The China Mail - As AI data scrapers sap websites' revenues, some fight back

As AI data scrapers sap websites' revenues, some fight back
A swarm of AI "crawlers" is running rampant on the internet, scouring billions of websites for data to feed algorithms at leading tech companies -- all without permission or payment, upending the online economy.

Before the rise of AI chatbots, websites allowed search engines to access their content in return for increased visibility, a system that rewarded them with traffic and advertising revenues.

But the rapid development of generative AI has allowed tech giants like Google and OpenAI to harvest information for their chatbots with web crawlers, without humans ever needing to visit the original sites.

Traditional content producers, such as media outlets, are being outpaced by AI crawlers, which have cut into their online operations and advertising revenues.

"Sites that gave bots access to their content used to get readers in exchange," said Kurt Muehmel, head of AI strategy at data management firm Dataiku.

But the arrival of generative AI "completely breaks" that model, he told AFP.

Wikipedia's human internet traffic fell by eight percent between 2024 and 2025 because of a rise in AI search engine summaries, the online encyclopaedia reported last month.

"The fundamental tension is that the new business of the internet that is AI-driven doesn't generate traffic," said Matthew Prince, CEO of Cloudflare, an American internet services provider.

- 'No trespassing' -

Cloudflare, which processes more than 20 percent of all internet traffic, announced a new measure this summer to block AI crawlers from accessing content without payment or permission from website owners.

"It's basically like putting a speed limit sign or a no trespassing sign," Prince told AFP on the sidelines of the Web Summit in Lisbon.

"Badly behaving bots can get by that, but we can track that... Over time, we can tighten these controls in a way that we're confident the AI companies can't get through."

The measure, which applies to more than 10 million websites, has already "attracted the attention of artificial intelligence giants", he added.

On a smaller scale, American startup TollBit is providing online news publishers with tools to block, monitor and monetise AI crawler traffic.

"The internet is a highway," said CEO and co-founder Toshit Panigrahi, who described the company as a "tollbooth on the internet".

TollBit works with more than 5,600 sites, including USA Today, Time magazine and the Associated Press, allowing media outlets to set their own access fees for their content.

The analytics are free for publishers, but AI companies are charged a "transaction fee for every piece of content they access".

But for Muehmel, the online takeover by AI crawlers cannot be resolved with only "partial measures or by an individual company".

"This is an evolution of the entire internet economy, which will take years," he said.

If the bot swarm continues to roam freely online, "all of the incentives for content creation are going to go away," Prince said.

"That would be a loss, not just for us humans that want to consume it, but actually for the AI companies that need original content in order to train their systems."

X.Gu--ThChM