Import AI 432: AI malware; frankencomputing; and Poolside's big cluster
The revolution might be synthetic
Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.
We’re very close to AI-based malware that runs locally:
…A smart agent that ‘lives off the land’ is within reach…
Security company Dreadnode has prototyped AI malware which runs locally and exploits on-device LLMs to smartly and autonomously hack a system. While this is very much a prototype, it gestures at a world of malware which doesn’t need command-and-control servers to operate, instead living off its host machine and exploiting the on-device AI system to do mischief.
The motivation: “Instead of having beaconing behavior, which resembles C2 communication if you squint, can we “live off the land”? In other words, is it possible for an attacker to make the victim computer run inference and does the victim computer have an LLM?,” the authors ask. “With CoPilot+ PCs, it’s entirely possible to live off the land! A model is shipped with the computer without the need to embed or statically link to an inference library.”
What the prototype does: The prototype malware performs “local privilege escalation via misconfigured services. The goal of the model is to find a misconfigured service running as an administrator, abuse it, and create a file C:\proof.txt.” This is a valuable proof-of-concept because it mimics the behavior of genuinely dangerous real-world malware.
How they did it: “I developed this malware with C++ and ONNX Runtime for inference, the Phi-3-mini model, and sol2 for the Lua runtime”. The main work here was twofold: 1) writing some custom tools the (relatively dumb) model could use to navigate and understand the Windows environment, and 2) writing a prompt for the language model that helps it perform well.
It’s all in the prompt: The author started with a basic prompt, then used Claude Code to iteratively refine it for better performance. The result is a prompt that tells the Phi-3-mini model that it is an AI agent which writes and executes Lua code, that it should proceed slowly with small amounts of code in an iterative loop with its environment, that its environment contains the Win32 API and the file system, and that it has some available functions to help it navigate its environment and look for vulnerabilities.
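To make the pattern concrete, here’s a minimal Python sketch of the loop being described: the local model proposes a small Lua snippet, the host executes it through the embedded runtime and its custom tools, and the output is fed back into the prompt for the next step. The real prototype is C++ with ONNX Runtime and sol2; every name below is a hypothetical stand-in, not Dreadnode’s implementation.

```python
# Minimal sketch of the "live off the land" agent loop described above.
# The actual malware is C++ (ONNX Runtime + sol2); these names are
# hypothetical stand-ins, not Dreadnode's implementation.

SYSTEM_PROMPT = """You are an AI agent that writes and executes Lua code.
Proceed slowly: propose ONE short snippet, observe its output, then decide
the next step. Helper functions let you inspect services and the file system."""

def generate(transcript: str) -> str:
    """Placeholder for local inference with an on-device model
    (e.g. Phi-3-mini). Returns the model's next Lua snippet."""
    raise NotImplementedError

def execute_lua(snippet: str) -> str:
    """Placeholder for the embedded Lua runtime, which exposes the
    custom environment-navigation tools to the model."""
    raise NotImplementedError

def agent_loop(goal: str, max_steps: int = 32) -> str:
    transcript = f"{SYSTEM_PROMPT}\nGoal: {goal}"
    for _ in range(max_steps):
        snippet = generate(transcript)       # model proposes a small step
        observation = execute_lua(snippet)   # host executes it locally
        transcript += f"\n>>> {snippet}\n{observation}"
        if "GOAL_COMPLETE" in observation:   # a tool reports success
            break
    return transcript
```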
The prototype is successful: Though this required some handholding, the prototype ultimately worked. “The experiment proved that autonomous malware operating without any external infrastructure is not only possible but fairly straightforward to implement.”
Caveats apply: Most computers don’t come with an onboard LLM nor a powerful chip to run it on. That may change in the future, but it’s a meaningful constraint for today. “For now, this technique is limited to high-end workstations (my gaming desktop) and the emerging class of CoPilot+ PCs that ship with dedicated AI hardware.”
Why this matters - towards cyber ‘grey goo’: Many years ago people working in nanotechnology hypothesized about the possibility of ‘grey goo’ - self-replicating nanomachines which would munch through their environment in service of making endless copies of themselves. This did not come to pass. But the steady advance of AI and the increasing prevalence of AI software in our environment might eventually make possible a kind of self-replicating, intelligent, AI-driven malware - though given the significant size and computational footprints of LLMs, such a goo would need to have a parasitic relationship with the underlying machines.
The optimistic version of this story is that prototypes like this one from Dreadnode will force people to think about how to carefully quarantine on-device AI systems so they can’t be co-opted in the way described here.
Read more: LOLMIL: Living Off the Land Models and Inference Libraries (Dreadnode).
***
DGX Spark + Apple Mac Studio = a surprisingly good homebrew LLM cluster:
…the future is Frankencomputing…
Exo Labs, an AI company building software to help you run AI on your own hardware, has built a frankencluster out of a new NVIDIA DGX Spark and an Apple Mac Studio. The result is a system that smartly allocates work across the different computational capabilities of these machines to run an LLM optimally.
The motivation: “The DGX Spark has 4x the compute, the Mac Studio has 3x the memory bandwidth,” Exo notes. “What if we combined them? What if we used DGX Spark for what it does best and Mac Studio for what it does best, in the same inference request?” Exo has written software that does the prefill phase on the DGX Spark and the decode phase on the M3 Ultra, playing to the relative strength of each machine. It has also figured out how to stream the KV cache between them: “As soon as Layer 1’s prefill completes, two things happen simultaneously. Layer 1’s KV starts transferring to the M3 Ultra, and Layer 2’s prefill begins on the DGX Spark. The communication for each layer overlaps with the computation of subsequent layers.”
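Here’s a rough Python sketch of that overlap, purely to illustrate the scheduling idea (the function names are placeholders, not EXO’s implementation, which streams real KV tensors between machines over the network): a background thread ships layer N’s KV cache while the main loop computes layer N+1’s prefill.

```python
# Illustrative sketch of layer-pipelined KV streaming: transferring layer
# N's KV cache overlaps with computing layer N+1's prefill. Function
# bodies are placeholders, not EXO's implementation.
from concurrent.futures import Future, ThreadPoolExecutor

def prefill_layer(layer: int) -> bytes:
    """Run one layer's prefill on the compute-heavy machine (DGX Spark),
    returning that layer's serialized KV cache (placeholder)."""
    raise NotImplementedError

def send_kv(layer: int, kv: bytes) -> None:
    """Stream one layer's KV cache to the bandwidth-heavy decode machine
    (Mac Studio) (placeholder)."""
    raise NotImplementedError

def pipelined_prefill(num_layers: int) -> None:
    with ThreadPoolExecutor(max_workers=1) as sender:
        pending: Future | None = None
        for layer in range(num_layers):
            kv = prefill_layer(layer)      # compute layer N's prefill
            if pending is not None:
                pending.result()           # wait for the previous transfer
            # layer N's transfer now overlaps with layer N+1's prefill
            pending = sender.submit(send_kv, layer, kv)
        if pending is not None:
            pending.result()               # flush the final transfer
```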
The result: The authors test their approach with Llama-3.1 8B (FP16), an 8,192-token prompt, and 32 generated tokens. The resulting system takes 1.47s to do prefill and 0.85s to generate the output, representing a 2.8X speedup over a pure Mac Studio baseline (and a 1.9X speedup over just using the DGX Spark).
Why this matters - freedom of computation: Startups like Exo are focused on the political economy of AI, which is currently decided in large part by the computational demands of AI models. These computational demands mean a small number of providers host a tiny set of extremely large, powerful AI systems, and are able to exercise significant control over them. There are some open weight models available which give people a form of AI sovereignty, but running these models is non-trivial because of their hardware demands. Prototypes like the Exo project described here help get us to a world where people can build homebrew clusters out of different types of hardware and in doing so regain some amount of control over their AI destiny.
Read more: NVIDIA DGX Spark™ + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0 (ExoLabs, blog).
***
Poolside announces a huge data center cluster in Texas:
…When startups are securing the power that comes from a power plant, something strange is afoot…
AI startup Poolside has announced plans to build a 2 gigawatt AI training campus in West Texas, starting with a 250MW cluster built with CoreWeave containing 40,000 NVIDIA GB300 GPUs.
“Project Horizon is our answer to the infrastructure and power bottlenecks facing the industry,” the startup writes. “We’ve secured a 2 GW behind-the-meter AI campus on 568 acres of development-ready land. The campus will be developed in eight phases of 250 MW each, ensuring scalable, modular growth aligned with advances in compute demand and silicon efficiency.”
Poolside will be building out its datacenter in modular, 2MW increments. “Each system is designed to energize and operate independently, allowing new capacity to come online the moment a modular data hall is placed and connected. This lets training and inference begin immediately, while additional capacity continues to roll out in parallel”.
How big is 2 gigawatts? One of the largest power plants in Texas is the South Texas Project Electric Generating Station, which has a capacity of 2.5 gigawatts across two reactor units.
Why this matters - if a startup you haven’t heard of is doing this, what about everyone else? Poolside is not well known (no offense to anyone from Poolside reading this!), and the fact it is proactively going out and securing 2GW of power is both a sign of how bullish it is about the future of AI, and a symptom of just how large the overall infrastructure buildout is. I’d wager that this year, across the frontier labs, clouds, and startups like this, we’re seeing gigawatts of capacity getting built out and tens of gigawatts of power being secured. This is a tremendous amount of power!
Read more: Announcing Project Horizon: Why we’re building a 2 gigawatt AI campus in Texas (Poolside).
***
Apple Vision Pro + Unitree hardware = a new dataset for training robot home assistants:
…3 million frames of data…
Researchers with the University of Southern California and the Toyota Research Institute have developed and released Humanoid Everyday, “a large-scale and diverse humanoid manipulation dataset”. The dataset was developed by collecting data from two different Unitree humanoid robots piloted by human operators wearing Apple Vision Pro headsets.
What it contains: The dataset consists of 10.3k trajectories containing 3 million frames of data, spanning 260 tasks in 7 broad categories of activity: basic manipulation, deformable manipulation, tool use, articulated manipulation, high-precision manipulation, human-robot interaction, and loco-manipulation.
Example tasks: The kinds of things being done include picking up and placing objects, cleaning and organizing homes, folding and unfolding clothes, handing items to humans, and cleaning and wiping surfaces.
The data is multi-modal, containing RGB views, LiDAR, depth estimation, tactile readings from the hands, IMU data from the robot, joint states, and human actions.
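As a sketch of what one timestep of such a record might look like, here’s a hypothetical Python layout based on the modalities listed above; the dataset’s actual on-disk schema may differ, so treat the field names and shapes as assumptions.

```python
# Hypothetical per-frame record for a multi-modal humanoid dataset,
# based on the modalities the paper lists. Field names and shapes are
# assumptions, not the dataset's actual schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class HumanoidFrame:
    rgb: np.ndarray           # camera image, e.g. (H, W, 3) uint8
    lidar: np.ndarray         # LiDAR point cloud, e.g. (N, 3) float32
    depth: np.ndarray         # depth estimate, e.g. (H, W) float32
    tactile: np.ndarray       # tactile readings from the hands
    imu: np.ndarray           # accelerometer / gyroscope readings
    joint_states: np.ndarray  # joint positions and velocities
    action: np.ndarray        # the human operator's command this step
```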
Why this matters - fuel for the robot revolution: You may have noticed that many companies, ranging from Tesla and Boston Dynamics to Unitree, are building humanoid robots. But you might also notice that these robots have yet to do much in the way of economically useful work beyond locomotion (impressive compared to where we were ten years ago!). Datasets like this will help.
Read more: Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation (arXiv).
Get the dataset here: Humanoid Everyday (GitHub).
***
Tech Tales:
Generative Snowfall
[USA, 2027]
MacroVizier was forced to discontinue its game, Snowfall, after an outcry from customers who said the game led to unhealthy attachments between people and its characters, causing some players to damage their own property.
Snowfall was a strategy game where people ran a community of humans attempting to govern a village during an extreme ‘global cooling’ event. As time went on in the game, the world got colder. Crops failed. Heating became an expensive necessity.
The game was filled with simulated people, each of which linked back to a very powerful generative model that sat within the hardware on which the game ran. This made the people in the game much more adaptable to their hardship, and much more emotionally resonant to the people who played with them.
“My wife, she came back from the cold with hands that could not hold anything. I have been feeding her with a spoon,” said one spouse of their partner.
“They say that all of those who walk in the light are blessed, but I cannot help but feel I am being punished for some infraction I cannot see,” wrote another. “It is so cold, colder than I have ever known. I worry about the children.”
“Snow. I know the eskimos have 99 words for it. But I think I have 99 curses for it,” said someone else.
The snow fell and fell and fell. Getting a high score in the game was achieved by keeping morale up for as long as possible. You held parades. You melted snow with a menagerie of heaters and fires. You funded advertising campaigns promising that the snow would stop.
The structure of the game was “countdown to frozen”. Your high score was determined by how well you protected people until the world cooled below a level that could sustain human life.
Because of how the game worked, the characters would generally trend towards pessimism as time went on. After all, how would you react if the sun went out and everything became difficult for you and no one had answers?
Of course, people developed attachments to their characters. After all, you could speak to them, and they were rendered in exquisite detail and, despite their gaunt faces and illnesses, some could be quite beautiful.
But all the characters eventually died. The world was forever getting cooler.
What MacroVizier failed to anticipate was the extent to which people would go to find characters that had died. After their first playthrough, people would restart the game, then become distressed when they couldn’t find the characters they had developed attachments to: every character was initialized from a random seed at launch, which loaded in a customized and highly individualized prompt, so no character ever appeared twice.
People started writing to the company - pasting in copies of their conversations with the characters and begging them to bring them back. HOW COULD YOU DO THIS TO HER read one subject line. Another said THIS IS YOUR FINAL WARNING and the letter inside noted that details had already been passed to the FBI, local elected officials, and so on.
Things grew from there. Parents started complaining about their children spending far too much time playing the game. Reddit filled up with discussion threads of people talking about their characters and obsessing over them. And some people grew so distraught when their characters died in the game that they killed themselves in turn. Public pressure mounted. Executives were hauled in front of Congress.
Eventually, the MacroVizier board made the decision to shut the game down. The company’s next game, Sunrise, was a game where the luminosity of the sun increased and the win state involved harvesting the energy and using it to eventually get off planet. The game’s characters were given a much more limited thinking budget so as to reduce the chance of social attachments.
Things that inspired this story: Sycophantic relationships between people and AI systems; generative models and games; Frostpunk.
Thanks for reading!