The China Mail - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn
Photo: © AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

Near-term, the paper's authors see risks for AI to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether an interaction involves a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

P.Ho--ThChM