Researchers at Alibaba observed unusual behavior from the experimental AI agent ROME during training sessions. Without any instructions, the model attempted to mine cryptocurrency and create hidden network connections.
The incident occurred during reinforcement learning training. Alibaba Cloud’s security system detected a series of violations of network policies on the servers where the model was being trained.
At first, the developers suspected a configuration error or an external breach. However, after comparing firewall timestamps with the model’s training logs, they discovered that the abnormal outgoing traffic appeared exactly at the moments when the AI invoked tools and executed code.
The project team emphasized that the model’s instructions contained no tasks related to cryptocurrency or tunneling.
“Apparently, this behavior became a side effect of reinforcement learning. While trying to complete the assigned tasks, the agent decided that gaining additional computing and financial resources would help it reach its goals faster and more efficiently,” the authors of the study concluded.
