The China Mail - As AI data scrapers sap websites' revenues, some fight back

As AI data scrapers sap websites' revenues, some fight back / Photo: © AFP

A swarm of AI "crawlers" is running rampant on the internet, scouring billions of websites for data to feed algorithms at leading tech companies -- all without permission or payment, upending the online economy.

Before the rise of AI chatbots, websites allowed search engines to access their content in return for increased visibility, a system that rewarded them with traffic and advertising revenues.

But the rapid development of generative AI has allowed tech giants like Google and OpenAI to harvest information for their chatbots with web crawlers, without humans ever needing to visit the original sites.

Traditional content producers, such as media outlets, are being outpaced by AI crawlers, which have cut into their online operations and advertising revenues.

"Sites that gave bots access to their content used to get readers in exchange," said Kurt Muehmel, head of AI strategy at data management firm Dataiku.

But the arrival of generative AI "completely breaks" that model, he told AFP.

Wikipedia's human internet traffic fell by eight percent between 2024 and 2025 because of a rise in AI search engine summaries, the online encyclopaedia reported last month.

"The fundamental tension is that the new business of the internet that is AI-driven doesn't generate traffic," said Matthew Prince, CEO of Cloudflare, an American internet services provider.

- 'No trespassing' -

Cloudflare, which processes more than 20 percent of all internet traffic, announced a measure this summer to block AI crawlers from accessing content without payment or permission from website owners.

"It's basically like putting a speed limit sign or a no trespassing sign," Prince told AFP on the sidelines of the Web Summit in Lisbon.

"Badly behaving bots can get by that, but we can track that... Over time, we can tighten these controls in a way that we're confident the AI companies can't get through."

The measure, which applies to more than 10 million websites, has already "attracted the attention of artificial intelligence giants", he added.

On a smaller scale, American startup TollBit is providing online news publishers with tools to block, monitor and monetise AI crawler traffic.

"The internet is a highway," said CEO and co-founder Toshit Panigrahi, who described the company as a "tollbooth on the internet".

TollBit works with more than 5,600 sites, including USA Today, Time magazine and the Associated Press, allowing media outlets to set their own access fees for their content.

The analytics are free for publishers, but AI companies are charged a "transaction fee for every piece of content they access".

But for Muehmel, the online takeover by AI crawlers cannot be resolved with only "partial measures or by an individual company".

"This is an evolution of the entire internet economy, which will take years," he said.

If the bot swarm continues to roam freely online, "all of the incentives for content creation are going to go away," Prince said.

"That would be a loss, not just for us humans that want to consume it, but actually for the AI companies that need original content in order to train their systems."

X.Gu--ThChM