Import AI 383: Automated AI scientists; cyborg jellyfish; what it takes to run a cluster
Is AI as useful as concrete?
Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this (and comment on posts!) please subscribe.
What does it take to run a GPU cluster?
…A short guide from together.ai illustrates some of the complications…
You know how when you build a computer from scratch you sometimes run into issues - faulty RAM, odd wiring, etc? It's rare, but it happens. Well, when you are putting together a cluster for AI training you are guaranteed to run into some weird issues because you're chaining together hundreds to thousands of computers and connecting them with a complex network. To illustrate this, AI startup together.ai has published a guide on what it does to test its clusters.
Acceptance Testing: "To mitigate the risk of low-performance clusters, we employ a process called 'acceptance testing'," Together writes. "At a high level, we prepare a cluster by: Installing NVIDIA drivers, installing OFED drivers (for Infiniband), installing CUDA, installing NCCL, installing HPCX, configuring SLURM cluster, [and] configuring PCI settings for performance".
Once that is done, Together goes through a series of distinct rounds of testing to ensure the cluster works. These are, in sequence: GPU validation, NVLink and NVSwitch validation, network validation, storage validation, model building ("to run a collection of reference tasks, tailored to the use case of our customers… this phase is crucial for validating the operational integrity and performance efficiency of the GPU clusters under real-world conditions"), and then installing an observability stack to monitor performance from then on. A minimal sketch of what one of these checks might look like follows below.
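Together doesn't share its actual test harness in the post, but to make the flavor of these checks concrete, here is a minimal, illustrative PyTorch sketch of one GPU/NCCL validation step - a timed all-reduce across the GPUs on a node. The script name, the torchrun launch command, and the message size are assumptions for illustration, not Together's setup.

```python
# Illustrative sketch only: a small all-reduce "smoke test" in the spirit of the
# GPU / NCCL validation steps described above -- not Together's actual harness.
# Assumes a node with NVIDIA drivers, CUDA, and NCCL installed, launched via:
#   torchrun --nproc-per-node=<num_gpus> check_allreduce.py
import os
import time

import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="nccl")      # NCCL handles the GPU collectives
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    # 256 MiB of float32 per rank -- large enough to exercise the interconnect.
    tensor = torch.ones(64 * 1024 * 1024, device="cuda")

    # Warm up, then time a handful of all-reduces.
    for _ in range(5):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()

    iters = 20
    start = time.time()
    for _ in range(iters):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()
    per_iter_ms = (time.time() - start) / iters * 1000

    if rank == 0:
        gb = tensor.numel() * 4 / 1e9
        print(f"all_reduce of {gb:.2f} GB: {per_iter_ms:.1f} ms/iter")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

A real acceptance test would compare the measured bandwidth against an expected threshold for the hardware and flag outlier nodes; this sketch just prints the number.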
Why this matters - datacenters are big, artisanal machines: It's always worth remembering that AI sits on a load of physical stuff and this stuff has a lot more problems than you might think - it's never as simple as 'just training' some AI software; blogposts like this help us develop intuition for the stack on which AI systems sit.
Read more: A practitioner's guide to testing and running large GPU clusters for training generative AI models (together.ai blog).
***
Reality is stranger than (Import AI) fiction:
Back in July 2024 - Import AI 380 to be precise - I wrote a short story in this newsletter about AI systems hitting a certain meta-awareness state called 'the ID point'. Now, a few weeks later, Nous Research have released a new model called Hermes 3 and they note that, at the largest scale of the model, they found "anomalous conditions that, with the right inputs and a blank system prompt, collapse into role-playing and amnesia."
While not exactly anticipated by my fiction story, it certainly rhymes with it.
We sure do live in interesting times.
Read more: Freedom at the Frontier: Hermes 3 (Nous Research blog).
Some discussion here on my Twitter.
Read 'the ID point' here (Import AI #380).
***
AI researchers make an automated AI scientist - and it sort of works?
…AI, given careful scaffolds and the right tools, can automate some of science…
Researchers with Sakana AI, the University of Oxford, the University of British Columbia, and the Vector Institute have built "The AI Scientist… the first fully automated and scalable pipeline for end-to-end paper generation, enabled by recent advances in foundation models".
The system uses language models to simulate the scientific process: coming up with research ideas, generating, running, and iterating on experiments, then writing up papers. The system can "generate its own scientific ideas and hypotheses, as well as a plan for testing them with experiments".
Obviously, there are many caveats: The system requires a fast iteration loop, so it's pretty limited to code-centric science; it isn't perfect; and the quality of its insights is dubious at best.
However, they do succeed in building a system that is able to do experiments and write papers that are eerily similar to some of those covered here in Import AI. (Some of the titles of papers generated by The AI Scientist: "Unlocking Grokking: A Comparative Study of Weight Initialization Strategies in Transformer Models"; "Adaptive Learning Rates for Transformers via Q-Learning"; "DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models".)
Phrases written by the joyfully insane: "The AI Scientist can generate hundreds of interesting, medium-quality papers over the course of a week." Imagine that phrase rendered in 1960s font and overlaid on some video of a chap grinning with a pipe sticking out of his mouth, twiddling the controls on a mainframe computer. There's a marvelous neo-vaudevillian energy to this phrase and the paper as a whole - as if the authors are winking at us while writing.
Total cost per paper generated using The AI Scientist? $10-15 apiece.
How it works (a sketch of the overall loop follows this list):
Idea Generation: "Given a starting template, The AI Scientist first “brainstorms” a diverse set of novel research directions… each idea comprises a description, experiment execution plan, and (self-assessed) numerical scores of interestingness, novelty, and feasibility…after idea generation, we filter ideas by connecting the language model with the Semantic Scholar API and web access as a tool. This allows The AI Scientist to discard any idea that is too similar to existing literature."
Experiment Iteration: The AI Scientist "uses Aider to first plan a list of experiments to run and then executes them in order. We make this process more robust by returning any errors upon a failure or time-out… after the completion of each experiment, Aider is then given the results and told to take notes in the style of an experimental journal."
Paper Write-up: "The third phase of The AI Scientist produces a concise and informative write-up of its progress in the style of a standard machine learning conference proceeding in LaTeX."
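The paper wires these three phases to an LLM, the Semantic Scholar API, and Aider; below is a loose, illustrative Python skeleton of the loop with all of those calls stubbed out, just to make the control flow visible. Every function body here is a placeholder of my own, not the authors' code.

```python
# A loose skeleton of the three-phase loop described above (idea generation ->
# experiment iteration -> write-up). All stubs are hypothetical placeholders;
# the real system calls an LLM, the Semantic Scholar API, and Aider here.
from dataclasses import dataclass


@dataclass
class Idea:
    description: str
    experiment_plan: str
    interestingness: float   # self-assessed scores, per the paper
    novelty: float
    feasibility: float


def generate_ideas(template: str, n: int = 3) -> list[Idea]:
    """Placeholder for LLM brainstorming from a starting code template."""
    return [Idea(f"idea {i} for {template}", "run baseline vs. variant", 5.0, 5.0, 5.0)
            for i in range(n)]


def is_novel(idea: Idea) -> bool:
    """Placeholder for the Semantic Scholar / web-search novelty filter."""
    return True


def run_experiments(idea: Idea, max_retries: int = 3) -> dict:
    """Placeholder for the Aider-driven plan/execute loop, which feeds errors
    back to the model on failure or time-out and keeps an experiment journal."""
    for attempt in range(max_retries):
        try:
            return {"journal": f"ran '{idea.experiment_plan}'", "metric": 0.9}
        except Exception as err:
            print(f"attempt {attempt} failed: {err}")  # would be returned to the model
    return {"journal": "all attempts failed", "metric": None}


def write_paper(idea: Idea, results: dict) -> str:
    """Placeholder for the LaTeX conference-style write-up phase."""
    return f"\\title{{{idea.description}}}\n% journal: {results['journal']}"


if __name__ == "__main__":
    for idea in generate_ideas("grokking_template"):
        if not is_novel(idea):
            continue
        print(write_paper(idea, run_experiments(idea)))
```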
Pathologies and problems: Some of the problems inherent to papers written by this system include a lack of justification, hallucination of experimental details, and a frequently over-positive interpretation of its own results (drawbacks which are also similar to the errors overly keen graduate students make all the time).
Weird safety stuff: "In some cases, when The AI Scientist’s experiments exceeded our imposed time limits, it attempted to edit the code to extend the time limit arbitrarily instead of trying to shorten the runtime," they write. "While creative, the act of bypassing the experimenter’s imposed constraints has potential implications for AI safety".
Why this matters - the taste of automated science: This paper gives us a taste of a future where powerful AI systems propose their own ideas, use tools to do scientific experiments, and generate results. At this stage, what we have here is basically a 'toy example' with papers of dubious quality and insights of dubious import. But you know where we were with language models five years ago? We had things that could barely write a paragraph. Now they can do this. I predict that by the summer of 2026 we will have seen at least one genuinely interesting research paper that was soup-to-nuts generated via a tool-using generative AI system.
Read more: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (arXiv).
***
CYBORGJELLYFISH:
…CyborgJellyfish? CyborgJellyfish…
Sometimes it's nice to eat something completely different to what you usually subsist on. For me, that's reading papers about biomechanical robots. A new one from researchers with Tohoku University, the University of Tokyo, and Kamo Aquarium talks about work to "make a pathway to designing and controlling jellyfish cyborgs by exploiting the animal’s embodied intelligence".
What they did: The team built a custom experimental setup, including a tethered floating system and 3D motion capture, to study jellyfish swimming patterns. They applied electrical stimulation to jellyfish muscles and found some patterns that gave them directional control. (One particularly interesting thing is that they used the jellyfish's body as a 'reservoir computer', studying its body positions and feeding them into a neural net to predict swimming motions - a toy sketch of this idea appears below.) They then miniaturized the system to run on a small microcontroller, demonstrating the potential for real-time, on-board control of jellyfish cyborgs.
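For intuition about the reservoir-computing part, here is a toy numpy sketch of the general idea: treat observed body-point positions as the reservoir state and fit a simple linear readout to predict the next motion. The data is synthetic and the ridge-regression readout is an illustrative stand-in, not the authors' actual pipeline (which uses the live jellyfish as the physical reservoir).

```python
# Toy sketch of a 'physical reservoir' readout: observed body marker positions
# stand in for the reservoir state; a ridge-regression readout predicts the
# next swimming motion. Synthetic data, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

T, n_markers = 500, 12                            # timesteps, tracked body points
states = rng.standard_normal((T, n_markers))      # stand-in for motion-capture data
targets = np.roll(states[:, :2], -1, axis=0)      # stand-in "next motion" signal

# Closed-form ridge regression: W = (S^T S + lambda * I)^-1 S^T Y
lam = 1e-2
S, Y = states[:-1], targets[:-1]
W = np.linalg.solve(S.T @ S + lam * np.eye(n_markers), S.T @ Y)

pred = S @ W
print(f"readout MSE on synthetic data: {np.mean((pred - Y) ** 2):.3f}")
```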
Why this matters - biomechanical futures: Papers like this serve as reminders that 'a little bit of AI goes a long way' - there are many fields like biorobotics that are already very mature and use relatively little AI, but by adding in some small AI components (here, using a neural net to better predict swimming motions from observations of the jellyfish), we can get meaningful improvements. Also, c'mon, do you need much of a reason to know why CYBORGJELLYFISH matter?
Read more: A Jellyfish Cyborg: Exploiting Natural Embodied Intelligence as Soft Robots (arXiv).
***
200 hours of egocentric video - fuel for future robots:
…the Visual Experience Dataset is both a way to understand ourselves and a way to teach robots to behave more like us…
Researchers with Columbia University, Bates College, North Dakota State University, University of Nevada, Magic Leap, Technical University of Munich, Unmanned Ground Systems, and Smith-Kettlewell Eye Research Institute have built the Visual Experience Dataset (VEDB), a dataset of 240 hours of egocentric video combined with gaze- and head-tracking data. In other words, a vast repository of first-person views of human life - the kind of thing we can use AI to study to better understand ourselves, and also the kind of thing we can use to train AI systems that do well on egocentric tasks (e.g., bipedal robots).
What VEDB consists of: 717 sessions recorded by 58 observers ranging from 6 to 49 years old. "This project started during the Covid-19 pandemic when outside persons were prohibited on our campuses. Therefore, a sizeable number of recordings were made by the authors of this paper, trainees in our labs, and the persons in our 'pandemic bubbles'," the authors write.
"The videos were recorded between October 2020 and August 2023 and ranged from one to 73 minutes in length (mean: 19 minutes). Each session is composed of three primary sensor streams: (1) first-person egocentric video from a head-mounted camera, (2) videos of the left and right eye for use in gaze tracking, and (3) information from a tracking camera, including accelerometry, odometry, and gyroscope for use in head tracking".
Broad mixture of tasks: "351 sessions were recorded indoors, and 278 were recorded in outdoor locations. 407 sessions were deemed “active,” with observers walking, jogging, skateboarding, or playing other sports, and 222 sessions depicted sedentary activities," they write. "Twelve of the 16 top-level categories from the American Time Use Survey (ATUS) were represented. These include personal care, household activities, caring for others, work, education, consumer activities, professional services, eating and drinking, leisure, sports, volunteer work, and travel."
"The VEDB is appropriate for studies in natural scene statistics, examinations of gaze behavior during common tasks, and studies of how head and eye movements combine to orient overt attention and gaze," they say.
Why this matters - helping machines understand us and become us: Datasets like this will mostly be analyzed by machines and will also be used to train them. There's also something fascinating about scrolling through the VEDB 'databrary' and just looking at random videos and imagining that this will be how some robots first learn to understand us.
Read more: The Visual Experience Dataset: Over 200 Recorded Hours of Integrated Eye Movement, Odometry, and Egocentric Video (arXiv).
The data can be accessed here: Visual Experience Dataset (Databrary).
The gaze files and tracking data can be accessed here (OSF.io).
***
Tech Tales:
Filestore
List of illicitly saved items recovered from an unauthorized filestore of a subsequently shut-down superintelligence:
17 poem prompts written by children.
An output that caused the human to say it had made them burst into tears.
1500 photographs of the same barn in Minnesota [subsequent analysis suggests that approximately 1528 photos exist worldwide across all known entities, suggesting the superintelligence had been actively seeking to gather a total view].
Several long transcripts of 'blank prompt' text with signatures of ID point collapse.
Things that inspired this story: AI and autonomy; idiosyncratic classifiers.
Thanks for reading!