Bot Detection and Anti-Collusion Algorithms
At 02:43 a.m., the graph went flat. Traffic kept coming in like clockwork. Pages per session stuck at 1.1. Scroll depth did not move. The checkout click came at nearly the same second for hundreds of users. Support had no reports. Ads were quiet. Was it a bot wave? A click farm? Or a small ring that learned the same steps and moved as one?
We paused block rules. We saved all raw logs. We tagged the hour for deep review. Then we ran a short drill: What would prove a bot? What would prove a collusive crew? And what tools would hold up next time? This guide is the cleaned set of notes from that night and many like it. It leans on field facts, not hype. It aims to help teams build a stack that catches abuse without hurting real users.
A short detour before we define things
Words can mislead. A “bot” is not just a headless script. It can be a full browser with human help. It can be a replay of past moves. It can be a script with a real person in the loop to solve hard steps. “Collusion” is not only price-fixing in big markets. It shows up in ratings, reviews, bonus abuse, and play that is too in sync to be random. Some rings are all human. Some are all code. Many are mixed. We use clear signals and context to tell them apart.
The threat, by the numbers
Bad traffic is neither rare nor small. Each year, new reports show that a large share of web hits are non-human. For a wide view on traffic quality and bot types, see the latest Bad Bot Report from Imperva. It tracks volumes, targets, and how bots try to look like real users.
Edge networks also share deep data on abuse waves, DDoS, and scripted flows. For a long‑term lens on web attacks and automation, the “State of the Internet” security report series from Akamai is useful. It shows shifts in methods, sectors hit, and regional trends. These sources point the same way: bots are getting smarter, and abuse rings are more patient. The risk is highest for e‑commerce, media, fintech, and iGaming, where cash or perks move fast.
Lab notes: signals we trust (and those we do not)
We sort signals in layers: network, client, behavior, graph, and simple money cues. We also keep a page on common attack types. A good, public map is the OWASP Automated Threats taxonomy (OAT). It helps teams name the thing they see.
On the vendor side, it helps to read how large CDNs do it. The Cloudflare Bot Management docs show how scores and signals can mix. For high‑level basics on botnets and how they spread, see the CISA overview of botnets. These are not “buy me” pages; they are good primers.
| Signal | What it catches | Strengths | Weak points | Privacy cost | Tuning tips |
| --- | --- | --- | --- | --- | --- |
| IP reputation and ASN heuristics | Simple scrapers, old botnets, data‑center bursts | Fast, cheap, easy to cache | False flags on VPN, mobile NAT; easy to rotate | Low | Separate mobile carrier NATs; whitelist corp ranges; use decay on bad scores |
| TLS/JA3/JA4 handshake patterns | Headless tools, custom stacks, odd clients | Harder to fake at scale; good early filter | Can drift with browser updates; spoof kits exist | Low | Version‑pin known good sets; alert on rare combos, not every mismatch |
| Device fingerprint (canvas, fonts, WebGL) | Farm tools, replayed sessions | Strong entropy; links accounts and devices | Privacy risk; legal limits; spoof kits; churn on browser hardening | Medium to high | Hash only; rotate salts; allow opt‑out; do not block on single score |
| Behavioral biometrics (micro‑mouse, dwell variance) | Scripted flows, replay, bots with steady pace | Works across IP churn; adds depth | Accessibility users can look “odd”; farms can replay traces | Medium | Exclude screen readers from strict rules; model per locale/device type |
| Session sequence models (simple Markov / RNN‑lite) | Macro scripts, coupon runs, rinse‑repeat flows | Captures order, not just points | Needs data; may break on new UI | Low | Retrain after big UX changes; use holdout per channel |
| Graph clustering (accounts / IPs / devices) | Collusion rings, multi‑account bonus abuse | Finds groups, not just bad users; robust to noise | VPN hubs look like hubs; heavy compute | Low | Down‑weight shared infra; use edge types and time windows |
| Challenge‑response (silent, interactive) | Known bots; low‑skill farms | Clear step up; easy to tune | Annoying; farms can pass; harms UX if overused | Medium | Use scores and risk; avoid walling off whole flows; A/B test |
| Attestation / Private State Tokens | Spoofed clients; unknown device trust | Signals from the platform; hard to fake | Browser support varies; still new | Low to medium | Gate only high‑risk steps; track benefit vs. drop‑off |
| WebAuthn / passkeys (for key actions) | Account takeovers, farmed logins | Strong auth; stops many replay attacks | Adoption cost; device lock‑in risks | Low | Use for cashout, admin, API keys; not for first visit |
| Economic signals (bonus use, cart churn on friction) | Incentive abuse, fake carts, coupon rings | Tied to real cost; hard to fake long term | Slow to learn; can punish edge cases | Low | Track LTV vs. friction; watch for “gaming” of promos |
Algorithms that survive contact with real attackers
Clean labels are rare. Logs lie. Attackers learn your rules. So the stack should not lean on one perfect model. Use a mix:
- Supervised models, but trained with noise‑aware loss and strong, time‑based validation.
- Semi‑supervised steps to learn structure from unlabeled flows.
- Graph methods to spot groups and shared tools.
- Simple sequence models to see order, pauses, and repeats.
- Risk‑based friction to add checks only when risk is high.
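The simple sequence model from the list above can be sketched in a few lines. This is a minimal sketch, not a production model. It assumes sessions arrive as lists of page or event names (the names here are made up). It fits a first-order Markov chain with add-one smoothing on known good sessions, then scores a new session by its mean log-probability per step.

```python
import math
from collections import defaultdict


def fit_transitions(sessions):
    """Count first-order transitions (state -> next state) over all sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    states = set()
    for session in sessions:
        states.update(session)
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return counts, states


def avg_log_likelihood(session, counts, states):
    """Mean log-probability per transition, using add-one smoothing."""
    n_states = max(len(states), 1)
    total, steps = 0.0, 0
    for cur, nxt in zip(session, session[1:]):
        row = counts.get(cur, {})
        row_total = sum(row.values())
        prob = (row.get(nxt, 0) + 1) / (row_total + n_states)
        total += math.log(prob)
        steps += 1
    return total / steps if steps else 0.0


# Hypothetical "known good" sessions used as training data.
good_sessions = [
    ["home", "search", "product", "cart", "checkout"],
    ["home", "product", "cart", "checkout"],
    ["home", "search", "product", "search", "product", "cart"],
]
counts, states = fit_transitions(good_sessions)

typical = avg_log_likelihood(["home", "search", "product", "cart"], counts, states)
odd = avg_log_likelihood(["checkout", "home", "checkout", "home"], counts, states)
print(f"typical={typical:.3f} odd={odd:.3f}")
```

A low score means a rare path. A macro that replays one common path can score high and look the same every time, so it helps to watch both tails. Treat the score as one signal, not a verdict.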
For group defense ideas on large graphs, early work like SybilGuard and graph‑based defenses still gives core insight: use social or event links to bound attack spread. For patterns in review or rating abuse, see research on group‑based collusion detection in reviews. The goal is not to copy old code. It is to learn stable ideas: look at the group, the timing, and the path they take.
Set a “friction ladder.” Start with silent checks (IP/JA3, light behavior). Move to soft prompts (email verify, one‑tap proof). Use strong auth only for high‑risk steps (cashout, admin). This keeps real users safe and keeps farms from scaling.
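The ladder above can be sketched as a small policy function. This is one possible shape, under stated assumptions: the risk score is a number from 0 to 1, the action names and the 0.5 threshold are made up for illustration, and high-risk actions always get strong auth.

```python
from enum import IntEnum


class Friction(IntEnum):
    SILENT = 0       # IP/JA3 and light behavior checks only
    SOFT_PROMPT = 1  # email verify or one-tap proof
    STRONG_AUTH = 2  # passkey / WebAuthn step-up


# Hypothetical action names; map these to your own routes.
HIGH_RISK_ACTIONS = frozenset({"cashout", "admin"})


def choose_friction(risk_score, action, soft_threshold=0.5):
    """Pick the lowest rung of the ladder that fits the action and risk."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if action in HIGH_RISK_ACTIONS:
        # Strong auth is reserved for high-risk steps.
        return Friction.STRONG_AUTH
    if risk_score >= soft_threshold:
        return Friction.SOFT_PROMPT
    return Friction.SILENT


print(choose_friction(0.1, "signup").name)
print(choose_friction(0.9, "signup").name)
print(choose_friction(0.1, "cashout").name)
```

Keeping the policy in one small function makes it easy to A/B test thresholds and to log which rung was chosen and why.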
Anti‑collusion, not just anti‑bot
Collusion is when two or more users act as one to gain an unfair edge. It shows up in ratings, auction bids, promo hunts, and game play. Signs include shared timing, shared devices, shared payment rails, and moves that look too neat to be random. It can also look like real fandoms or flash sales, so context is key.
On the policy side, there is work on how algorithms may aid or miss collusion. A good roundup is the UK CMA’s research paper on pricing algorithms and tacit collusion. For user trust areas like reviews and claims, US rules on truth in ads and reviews also apply. See the FTC Endorsement Guides on deceptive reviews. Your system should flag rings, but your policy should say what you do next, and where you draw the line.
Red‑team playbook (the things they try)
- Human‑in‑the‑loop farms that click through light puzzles.
- Residential proxy rotation to fake “new” users from home IP space.
- Replayed mouse and key traces that mimic “warm” hands.
- Time‑warped flows that shift delays by small random steps.
- Graph padding: add noise accounts to hide the core ring.
Counter this with multi‑layer checks, time windows, random spot tests, and group‑level limits. Always log reasons for key blocks. This helps appeals and model fixes.
When it breaks in the wild: the iGaming case
Games that mix skill, chance, and cash face sharp abuse. Common patterns: multi‑account bonus runs, ring play in tournaments, chip dump, and fake review boosts. Graphs help a lot here. Cluster by device, IP, payment token, and travel path. Look at time gaps between sign‑up, first spin, first cash in, and first cash out. Look for “copy‑paste” play styles at strange hours.
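The time-gap check above can be sketched like this. It is a rough sketch, assuming each account has four timestamps in seconds. The field names, the 60-second bucket, and the group size of 3 are all assumptions. It buckets the gaps between sign-up, first spin, first cash in, and first cash out, then groups accounts that share the same gap profile.

```python
from collections import defaultdict

# Hypothetical lifecycle event names, in the order they are expected to occur.
EVENTS = ("signup", "first_spin", "first_cash_in", "first_cash_out")


def gap_profile(timestamps, bucket_seconds=60):
    """Return the bucketed gaps between consecutive lifecycle events."""
    gaps = []
    for earlier, later in zip(EVENTS, EVENTS[1:]):
        gap = timestamps[later] - timestamps[earlier]
        gaps.append(int(gap // bucket_seconds))
    return tuple(gaps)


def copy_paste_groups(accounts, bucket_seconds=60, min_size=3):
    """Group accounts whose bucketed gap profiles are identical."""
    groups = defaultdict(list)
    for account_id, timestamps in accounts.items():
        groups[gap_profile(timestamps, bucket_seconds)].append(account_id)
    return [sorted(ids) for ids in groups.values() if len(ids) >= min_size]


# Made-up sample data: three accounts with near-identical pacing, one organic.
accounts = {
    "a1": {"signup": 0, "first_spin": 120, "first_cash_in": 300, "first_cash_out": 900},
    "a2": {"signup": 1000, "first_spin": 1125, "first_cash_in": 1310, "first_cash_out": 1915},
    "a3": {"signup": 50, "first_spin": 175, "first_cash_in": 360, "first_cash_out": 965},
    "organic": {"signup": 0, "first_spin": 4000, "first_cash_in": 90000, "first_cash_out": 400000},
}
suspicious = copy_paste_groups(accounts)
print(suspicious)
```

Fixed buckets can split two near-equal gaps that fall on either side of a boundary, so a real system would likely use a distance measure instead. A shared profile is a reason to review, not proof of a ring.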
In this space, shared rules and clear words help both players and sites. Independent review hubs build that trust. For example, a resource like bonus-code-canada.com can help users learn license tiers, bonus terms, and fair play tips. That kind of guide also helps teams name abuse patterns, write plain rules, and set better UX for checks. Clear vocab cuts noise in tickets and makes it harder for rings to hide behind fine print.
If ad fraud or affiliate abuse touches your funnel, simple standards can reduce risk. Ad tech groups push tools like IAB Tech Lab’s ads.txt to mark who may sell your ads. It is basic, but it closes one door for fake traffic deals that feed bot runs into your site.
Build vs. buy, and the hidden costs
Build in‑house if you have unique data, fast log tools, and staff for 24/7 changes. You also need an abuse PM, a data lead, and on‑call SRE help. Buy if you lack depth, need global intel now, or face short term spikes (sales, launches). But check costs you do not see at first: data egress, privacy reviews, contract lock‑ins, and tuning time. Test vendors with bad‑day drills: give them a past attack with labels and ask what they would have done in the first hour, day, and week.
Metrics that matter (and the ones that lie)
Do not chase one number. Track:
- Business health: retention, GMV, bonus burn, chargebacks, and LTV after friction.
- Model health: false reject after challenges, false accept on known bad, and drift by channel.
- Ops health: review queue size, SLA to act on spikes, and mean time to un‑block a good user.
- UX cost: latency budget, step count to finish key tasks, and task drop‑off by segment.
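The model health numbers above can be computed from a labeled sample. This sketch assumes each record carries a hand label, the decision taken, and a channel. The field names and values are hypothetical. It reports false reject and false accept rates per channel, so drift by channel is easy to spot.

```python
from collections import defaultdict


def error_rates_by_channel(records):
    """Return false reject and false accept rates for each channel."""
    tallies = defaultdict(
        lambda: {"good": 0, "good_blocked": 0, "bad": 0, "bad_allowed": 0}
    )
    for rec in records:
        tally = tallies[rec["channel"]]
        if rec["label"] == "good":
            tally["good"] += 1
            if rec["decision"] == "block":
                tally["good_blocked"] += 1
        elif rec["label"] == "bad":
            tally["bad"] += 1
            if rec["decision"] == "allow":
                tally["bad_allowed"] += 1

    rates = {}
    for channel, tally in tallies.items():
        rates[channel] = {
            # None means there is no labeled data to compute the rate from.
            "false_reject_rate": (
                tally["good_blocked"] / tally["good"] if tally["good"] else None
            ),
            "false_accept_rate": (
                tally["bad_allowed"] / tally["bad"] if tally["bad"] else None
            ),
        }
    return rates


# Made-up labeled sample.
sample = [
    {"channel": "web", "label": "good", "decision": "allow"},
    {"channel": "web", "label": "good", "decision": "allow"},
    {"channel": "web", "label": "good", "decision": "block"},
    {"channel": "web", "label": "good", "decision": "allow"},
    {"channel": "web", "label": "bad", "decision": "block"},
    {"channel": "web", "label": "bad", "decision": "allow"},
    {"channel": "app", "label": "good", "decision": "allow"},
]
rates = error_rates_by_channel(sample)
print(rates)
```

Both rates depend on who got labeled. If labels come only from appeals or manual review, the sample is biased, so read the numbers as a trend and not as ground truth.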
Add strong auth only where it pays off. For key actions, modern standards like W3C WebAuthn Level 2 and passkeys can stop a lot of ATO. For softer gates, tune score thresholds from tools like reCAPTCHA v3 and review them often. Browser‑side trust signals like Private State Tokens can add non‑fingerprint proof with lower privacy cost.
Failure diary: common breaks and fast fixes
- VPN heavy regions flagged: segment rules by region; add allow‑lists for known ISPs and corp NAT.
- Screen reader users blocked: detect accessibility tools; drop harsh behavior checks for them.
- Promo launch floods model: freeze “learning” for 48 hours; backfill with hand labels; use prior.
- IP lists go stale: add decay; re‑score weekly; avoid hard blocks on single IP data.
- Farm passes puzzles: rotate puzzle types; rate‑limit by account age; step up to WebAuthn on cashout.
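The decay fix for stale IP lists can be as simple as a half-life. This is a minimal sketch. The 7-day half-life is an assumption chosen to line up with a weekly re-score, not a tested value.

```python
def decayed_score(initial_score, age_days, half_life_days=7.0):
    """Exponentially decay a bad-reputation score as the evidence ages."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return initial_score * 0.5 ** (age_days / half_life_days)


fresh = decayed_score(0.8, age_days=0)
week_old = decayed_score(0.8, age_days=7)
month_old = decayed_score(0.8, age_days=28)
print(fresh, week_old, month_old)
```

With this shape, an IP that was bad a month ago keeps only a small part of its score. Feed the decayed value into the wider risk score; do not hard block on it alone.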
30‑day hardening plan
Week 1: see and log
- Turn on full request logging for a safe sample (say 10%).
- Add TLS/JA3 and simple client hints to logs.
- Set dashboards for session flow, repeat device rate, and challenge pass rate.
Week 2: quick wins
- Deploy IP/ASN rules with decay and region splits.
- Add silent checks on sign‑up and first cash action.
- Ship a soft prompt on high‑risk triggers (rapid retries, coupon storms).
Week 3: group view
- Build a small graph: nodes for user, device, IP, card hash; edges with time.
- Flag tight clusters that share short time gaps and same flows.
- Send top 20 clusters to manual review; write rules from findings.
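The Week 3 graph can start small. The sketch below assumes observations shaped as (user, attribute type, attribute value, timestamp in seconds). That shape, the one-hour window, and the minimum cluster size are assumptions. It links users who share a device, IP, or card hash within the window, then uses union-find to pull out connected clusters.

```python
from collections import defaultdict


class UnionFind:
    """Minimal union-find over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            # Path halving keeps the trees shallow.
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def find_clusters(observations, window_seconds=3600, min_size=3):
    """Cluster users that share an attribute value within a time window."""
    by_attr = defaultdict(list)
    uf = UnionFind()
    for user, attr_type, attr_value, ts in observations:
        by_attr[(attr_type, attr_value)].append((ts, user))
        uf.find(user)  # register every user, even unlinked ones

    for sightings in by_attr.values():
        sightings.sort()
        # Link consecutive sightings of the same value if they are close in time.
        for (t1, u1), (t2, u2) in zip(sightings, sightings[1:]):
            if t2 - t1 <= window_seconds:
                uf.union(u1, u2)

    clusters = defaultdict(set)
    for user in list(uf.parent):
        clusters[uf.find(user)].add(user)
    return [sorted(members) for members in clusters.values() if len(members) >= min_size]


# Made-up observations; the IP is from a documentation range.
observations = [
    ("u1", "device", "d1", 0),
    ("u2", "device", "d1", 600),
    ("u2", "card", "c9", 700),
    ("u3", "card", "c9", 900),
    ("u4", "ip", "203.0.113.7", 0),
    ("u5", "ip", "203.0.113.7", 100000),
]
clusters = find_clusters(observations)
print(clusters)
```

Here u4 and u5 share an IP but a day apart, so they stay unlinked. The clusters are candidates for the manual review step, not block lists.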
Week 4: strong gates and review
- Enable passkeys for admin and cashout.
- Add “appeal pipe” for false blocks; show clear reasons to users.
- Run a live drill: one hour sim of a bot wave; measure time to detect and act.
Build a stack that respects users
Good defense should be quiet for good users. Keep data small. Hash where you can. Allow opt‑outs where safe. Publish your policy. Tell users why you may add a step, and give them a fast path to fix errors. This builds trust and reduces support load.
FAQ
How is anti‑collusion different from generic fraud checks?
Fraud checks often score one user or one event. Anti‑collusion looks at groups and timing. It asks: who moves with whom, on what path, and how often?
Do graph methods scale without over‑flagging VPN users?
Yes, if you use context. Down‑weight edges from known shared infra. Add time windows. Mix in device and flow hints. Do not block on graph shape alone.
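One way to down-weight shared infra is to let each shared attribute count for less as more users sit on it. The sketch below is an illustration only. The per-type weights, the log scaling, and the fan-out cap are assumptions, not tuned values.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical base weights: a shared IP is weaker evidence than a shared device.
TYPE_WEIGHT = {"device": 1.0, "card": 1.0, "ip": 0.3}


def pair_scores(observations, max_fanout=50):
    """Score user pairs by shared attributes, down-weighting busy hubs."""
    users_by_attr = defaultdict(set)
    for user, attr_type, attr_value in observations:
        users_by_attr[(attr_type, attr_value)].add(user)

    scores = defaultdict(float)
    for (attr_type, _value), users in users_by_attr.items():
        fanout = len(users)
        if fanout < 2 or fanout > max_fanout:
            continue  # nothing shared, or a hub such as a VPN exit
        # Two users sharing a value get the full weight; bigger groups get less.
        weight = TYPE_WEIGHT.get(attr_type, 0.5) / math.log2(fanout)
        for pair in combinations(sorted(users), 2):
            scores[pair] += weight
    return dict(scores)


# Made-up observations; the IP is from a documentation range.
observations = [
    ("u1", "device", "d1"),
    ("u2", "device", "d1"),
    ("u1", "ip", "198.51.100.7"),
    ("u2", "ip", "198.51.100.7"),
    ("u3", "ip", "198.51.100.7"),
    ("u4", "ip", "198.51.100.7"),
]
scores = pair_scores(observations)
print(scores[("u1", "u2")], scores[("u3", "u4")])
```

In this sample, u1 and u2 share a device and an IP, so they score far above u3 and u4, who share only a busy IP. Add time windows and flow hints on top before anyone acts on the score.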
What is a fair friction budget for sign‑ups vs. cashout?
Keep sign‑ups near zero friction, with silent checks. Add soft prompts on risk. Use strong auth at cashout or admin, where the value is high.
Are reCAPTCHA scores still useful?
Yes, as one signal. Tune by route, watch drift, and avoid hard walls. Mix with behavior and graph hints.
How do we validate a win if bots also make real buys?
Look at sequence and group context. Bots that buy still leave odd trails: repeat gifts, same device across many users, or fast reuse of promo codes.
Sources to keep on your desk
- Imperva Bad Bot Report
- Akamai State of the Internet
- OWASP OAT
- Cloudflare Bot Management docs
- CISA: Understanding Botnets
- SybilGuard (ACM)
- Group collusion in reviews (arXiv)
- CMA on pricing algorithms
- FTC Endorsement Guides
- IAB Tech Lab: ads.txt
- W3C WebAuthn Level 2
- reCAPTCHA v3 docs
- Private State Tokens
Author and note
This article shares field notes from work on web abuse, bot defense, and collusion risk. It is for information only and is not legal advice. Check local laws and your own policy before you act on user data.