The China Mail - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: © AFP/File


Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in a paper published Friday in the journal Patterns.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk of AI being used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" with its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

P.Ho--ThChM