The China Mail - AI systems are already deceiving us -- and that's a problem, experts warn

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks of AI being used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

P.Ho--ThChM