The China Mail - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn

Photo: © AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide-ranging review carried out by Park and colleagues found this was just one of many cases, across various AI systems, of deception being used to achieve goals without any explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

Near-term, the paper's authors see risks of AI being used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques for detecting AI deception by checking systems' internal "thought processes" against their external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

P.Ho--ThChM