Import AI 375: GPT-2 five years later; decentralized training; new ways of thinking about consciousness and AI
…Are today's AGI obsessives trafficking more in fiction than in fact?…
Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this (and comment on posts!) please subscribe.
SPECIAL EDITION!
GPT2, Five Years On:
…A cold-eyed reckoning about that time in 2019 when wild-eyed technologists created a (then) powerful LLM and used it to make some very confident claims about AI safety, policy, and the future of the world…
Five years ago I had a few fewer lines in my face, a greater level of naive earnestness about the world, and was working at a then relatively obscure research lab called OpenAI. We had recently developed a language model, GPT2, which was eerily good at producing coherent and sometimes entertaining text. In the fishbowl universe that is a research startup, we had all become obsessed with this technology and its implications - it felt as though we'd teleported some strange technology from the future into the present and were in a position to poke and prod at it.
GPT2 was also a consequence of some research we'd begun doing in parallel on a subject later known as Scaling Laws - meaning that when we looked at GPT2 we didn't just see the technology in front of us, we saw all the successors to it that could be built by simply scaling it up (and it was this that became GPT3, and then with further scaling and the addition of instruction tuning via RLHF, ChatGPT, Claude, and so on). The GPT-2 paper includes some examples of this scaling behavior as we went from a 120M parameter model to a (then revolutionary!) 1.5bn parameter one and we saw those now-familiar curves - jumps in capability as you made the AI system larger.
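For readers who haven't played with this themselves, here's a minimal sketch of the kind of reasoning scaling laws enable: fit a power law to loss-versus-model-size points and extrapolate to larger models. The numbers below are made up for illustration and are not GPT-2's actual evaluation losses.

```python
import numpy as np

# Hypothetical (model size, eval loss) points - illustrative only, not real GPT-2 numbers.
params = np.array([117e6, 345e6, 762e6, 1.5e9])
loss = np.array([4.1, 3.8, 3.6, 3.4])

# Fit log(loss) = b * log(N) + log(a), i.e. loss ~= a * N^b with b < 0.
b, log_a = np.polyfit(np.log(params), np.log(loss), 1)
a = np.exp(log_a)
print(f"fitted exponent b = {b:.3f}")

# The "we saw all the successors" move: extrapolate to a 10x larger model.
n_big = 15e9
print(f"predicted loss at {n_big:.0e} params: {a * n_big ** b:.2f}")
```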
So, rather than treat the GPT2 release as a standard process - publish a research paper, release the code, release the model - we did an experiment - we published a blogpost about the tech and what we thought its implications were (some quite dire) and only partially released the technology (at least, at first). This was an unusual thing to do but we did it because we had the inkling that GPT-2 might represent a meaningful change in the capabilities of AI technologies, both in terms of generality and quality (in the paper, we observed that GPT-2 set a new SOTA on 7 out of 8 tasks we tested it on, even though we hadn't narrowly optimized for those tasks - an unusual thing at the time and now a standard 'guaranteed surprise' that happens with every new model release).
Our unusual approach to discussing the technology and not/partially releasing it was extremely unpopular - people saw our release strategy, variously, as: a) weird marketing for a trinket, b) an offensive departure from academic norms and the perceived openness in 'OpenAI', and c) a symptom of a bunch of young people without a clue making claims about a world they didn't understand.
To use the parlance of today, people took a look at the technology and the claims we made about it and determined "y'all buggin".
Now, five years on, I felt it'd be good to revisit this release and look in the cold light of the post-LLM-boom world at what we got right and what we got wrong and work out if there are any lessons for us all here in 2024. It feels like an opportune time, given how a lot of the conversation in AI policy today is dominated by the same precautionary principle that defined our approach with GPT2.
What we said and what happened: In the blog post about GPT2, we said we expected the technology could make it easier to create "AI writing assistants, more capable dialogue agents, unsupervised translation between languages," and "better speech recognition systems."
We also said: "We can also imagine the application of these models for malicious purposes, including the following (or other applications we can’t yet anticipate): generate misleading news articles, impersonate others online, automate the production of abusive or faked content to post on social media, automate the production of spam/phishing content".
Read the whole post here - Better language models and their implications (OpenAI blog) as well as the GPT2 paper (OpenAI, PDF).
Did any of this actually happen? Absolutely - everything we listed here happened, but it mostly happened with significantly better AI systems that came out far later. What we saw as imminent and significant turned out to be further away than we thought and, I think at least so far, less significant than we thought? There are AI systems being used for the malicious purposes we identified but the internet still has integrity, and probably the most disruptive use of LLMs has been to generate low-grade content in response to economic incentives - not a malicious use we identified, and more just a consequence of AI colliding with the incentive structure wired into making money online. Though we had a good sketch of the future it was just that - a sketch - and reality turned out to have some things we hadn't imagined and some details we didn't anticipate.
There's also a point about laziness and ease of use - though we forecast (some of) the right misuses we did so with the mindset of 'what would an evil OpenAI do with this technology' - aka how would a similarly technically sophisticated and well-resourced actor operate? But in truth there aren't that many entities on the planet similar to the frontier model companies, even in the more technical parts of intelligence agencies (a favorite Wizard of Oz character that people like to summon when thinking about partially occluded gameboards). To see these misuses appear at scale the technology needed to get way easier and more accessible to use - many of the really annoying or disruptive uses of AI have climbed in step with the availability of dead-simple interfaces to the technology (e.g. ChatGPT, Claude.ai), just as synthetic imagery saw a rise in abuse after people made dead-simple interfaces like thispersondoesnotexist.com and, later, Stable Diffusion and various easy-to-use frontends to it.
What lessons can we take from this? There's a saying in the financial trading business which is 'the market can stay irrational longer than you can stay solvent' - though you might have the right idea about something that will happen in the future, your likelihood of correctly timing the market is pretty low. There's a truth to this for thinking about AI risks - yes, the things we forecast (as long as they're based on a good understanding of the underlying technology) will happen at some point but I think we have a poor record of figuring out a) when they'll happen, b) at what scale they'll happen, and c) how severe their effects will be. This is a big problem when you take your imagined future risks and use them to justify policy actions in the present! This all says to me that in 2024 people working at the intersection of AI and policy might want to keep the following things in mind when thinking through stuff:
Just because you can imagine something as being technically possible, you aren't likely to be able to correctly forecast the time by which it arrives nor its severity.
It's a fallacy to make predictions from your own contextual bubble - just because you can imagine how you and your peers may be able to do something, that doesn't necessarily let you make good predictions about how other actors distributed around the globe may do something, which means your ability to predict likelihoods of certain things occurring is probably skewed.
Strong claims demand strong evidence - though we forecast the right malicious uses I think we didn't do enough experiments to justify each misuse and this made it harder to trust or understand our mental model - sure, we said "impersonate others online" but there wasn't an experiment to back it up. (By contrast, we did do a study on synthetic news articles versus real news articles and this seemed to be a helpful datapoint for grounding our discussion in some fact).
If you depart from norms based on an imagined vision of the future, expect a counterreaction - ultimately, I think by slowly releasing GPT2 we actually just spurred a greater interest in creating and releasing open source/open access GPT2-grade systems (e.g., Salesforce's CTRL, OpenGPT-2, GROVER) as people saw us depart from a norm and wanted to correct for that. My suspicion is that if we'd just released GPT2 as an open source model there would have been fewer replications of the technology, because people would have been less driven by a desire to 'prove us wrong'.
Controlling the future is difficult: Even if we had succeeded in massively constraining the development and deployment of GPT-2-class models, what effect would that have had? A public estimate guesstimates GPT-2 to have cost about $50,000 in 2019. Let's be conservative and double that number, so say it cost $100,000 to train five years ago. Well, napkin math says training it now costs $250 (again, we can double it to get $500) thanks to a combination of compute and algorithmic improvements. You cannot control a technology which gets more than a hundred times cheaper to do in half a decade. Not a thing!
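To make the napkin math explicit, here's the arithmetic as a few lines of Python. The dollar figures are the rough public estimates quoted above (conservatively doubled), not measured numbers.

```python
# Napkin math from the paragraph above - rough public estimates, conservatively doubled.
cost_2019 = 100_000  # USD to train GPT-2 in 2019 (doubled $50k estimate)
cost_2024 = 500      # USD to train the same model in 2024 (doubled $250 estimate)
years = 5

decline = cost_2019 / cost_2024
per_year = decline ** (1 / years)
print(f"total decline: {decline:.0f}x over {years} years")   # 200x
print(f"implied annual decline: ~{per_year:.1f}x per year")  # ~2.9x cheaper each year
```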
Does this change Jack's thinking about AI policy in 2024? Yes. I've spent a lot of 2024 going for extremely long walks and thinking about the implications of scaling laws, LLMs, technogeopolitics, and so on. This essay is part of me reckoning with my own role in all of this. My general 'mental update' has been that just because I'm part of a community that imagines a certain future based on the technology we're building, that doesn't automatically mean a) I'm right, and b) that the ideas we propose are innately well justified by the technological future they're designed to deal with.
Instead, I've come to believe that in policy "a little goes a long way" - it's far better to have a couple of ideas you think are robustly good in all futures and advocate for those than make a confident bet on ideas custom-designed for one specific future - especially if it's based on a very confident risk model that sits at some unknowable point in front of you.
Additionally, the more risk-oriented you make your policy proposal, the more you tend to assign a huge amount of power to some regulatory entity - and history shows that once we assign power to governments, they're loath to subsequently give that power back to the people. Policy is a ratchet and things tend to accrete over time. That means whatever power we assign governments today represents the floor of their power in the future - so we should be extremely cautious in assigning them power because I guarantee we will not be able to take it back.
For this reason, I've found myself increasingly at odds with some of the ideas being thrown around in AI policy circles, like those relating to needing a license to develop AI systems; ones that seek to make it harder and more expensive for people to deploy large-scale open source AI models; shutting down AI development worldwide for some period of time; the creation of net-new government or state-level bureaucracies to create compliance barriers to deployment (I take as a cautionary lesson the Nuclear Regulatory Commission and its apparent chilling effect on reactor construction in the USA); the use of the term 'safety' as a catch-all term to enable oversight regimes which are not - yet - backed up by quantitative risks and well-developed threat models; and so on.
I'm not saying any of these ideas are without redeeming qualities, nor am I saying they don't nobly try to tackle some of the thornier problems of AI policy. I am saying that we should be afraid of the power structures encoded by these regulatory ideas and we should likely treat them as dangerous things in themselves. I worry that the AI policy community that aligns with longterm visions of AI safety and AGI believes that because it assigns an extremely high probability to a future AGI destroying humanity, this justifies any action in the present - after all, if you thought you were fighting for the human race, you wouldn't want to compromise! But I think that along with this attitude there comes a certain unwillingness to confront just how unpopular many of these ideas are, or how unreasonable they might sound to people who don't have similar intuitions about the technology and its future - and therefore an ensuing blindness to the costs of counterreaction to these ideas. Yes, you think the future is on the line and you want to create an army to save the future. But have you considered that your actions naturally create and equip an army from the present that seeks to fight for its rights?
Is there anything I'm still confident about? Yes. I hate to seem like a single-issue voter, but I had forgotten that in the GPT-2 post we wrote "we also think governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems." I remain confident this is a good idea! In fact, in the ensuing years I've sought to further push this idea forward via, variously, Regulatory Markets as a market-driven means of doing monitoring; articulating why and how governments can monitor AI systems; advocating for the US to increase funding for NIST; laying out why Anthropic believes third-party measurement of AI systems is very important for policy and state capacity; and a slew of other things across Senate and Congressional testimonies, participation in things like the Bletchley and Seoul safety summits, helping the Societal Impacts and Frontier Red Teams at Anthropic generate better evidence for public consumption here, and so on. So much of the challenge of AI policy rests on different assumptions about the rate of technological progression for certain specific capabilities, so it seems robustly good in all worlds to have a greater set of people, including those linked to governments, track these evolving capabilities. A good base of facts doesn't guarantee a sensible discussion, but it does seem like a prerequisite for one.
Five years on, what did it all mean? GPT2 was one of the first warning shots that generic next-token prediction would let us build increasingly general systems of broad utility. GPT2 really was a case of time travel - we spent an irrational amount of resources (at the time) to do something that would be trivially easy and cheap to do in the future. And I think we discovered something important. But I worry we reacted to its shininess and novelty and this clouded our ability to have a deeper understanding of it.
Five years on, because of things like GPT-2, we're in the midst of a large-scale industrialization of the AI sector in response to the scaling up of these ideas. And there's a huge sense of deja vu - now, people (including me) are looking at models like Claude 3 or GPT4 and making confident noises about the technological implications of these systems today and the implications of further scaling them up, and some are using these implications to justify the need for imposing increasingly strict policy regimes in the present. Are we making the same mistakes that were made five years ago? Are we trapped in a kind of dogmatic groupthink bubble? Are we discounting the counterreaction to the articulation of these sometimes sci-fi-seeming, doom-laden ideas? Most importantly - are we being appropriately humble and aware of our own propensity for hubris here?
The devilish part of this problem is that if we're right - if the technology will continue to scale in the way we expect and if certain capabilities continue to naturally fall out of this scaling hypothesis - it may be necessary to take significant regulatory actions. But there will be a cost to this in both the present and the future. Have we truly calculated this cost, both in terms of liberty and freedom if we're right and in foregoing opportunity if we're wrong? I'm not so sure.
These are some of the things I am thinking about at the moment. I hope to have more fully formed ideas on what to do soon! If you have ideas or thoughts, please email me, or engage me on twitter @jackclarksf. I hope this was a useful essay - feedback welcome.
***
Three reasons why AGI doom is a bullshit concept:
…Some arguments (and counter-arguments by me) in favor of AGI doom as a useless concept…
If you have an opinion (see above!), you should read opinions opposite to your own. To that end, I recently read The Myth of AGI - How the illusion of Artificial General Intelligence distorts and distracts digital governance by Milton Mueller with Georgia Tech's Internet Governance Project. This essay lays out "three inter-related fallacies underlying AGI doomer scenarios: a) the idea that a machine can have a “general intelligence;” b) anthropomorphism, or the attribution of autonomous goals, desires and self-preservation motives to human-built machines; and c) the assumption that the superior calculating intelligence of an AGI will give it unlimited power over physical resources and social institutions."
Those three fallacies in full, with some constructive (I hope!) commentary:
What is AGI? "Instead of learning to do something better than humans, an AGI is supposed to be a single application that can learn to do anything and everything better than humans," they write. "The claim that we can build a machine with generalized intelligence is logically equivalent to a claim that we can build a single machine that does everything. It makes no sense."
(nervous laughter) Though it may not make sense to this author, building 'a single machine that does everything' is precisely the goal of a bunch of companies in the world backed by tens of billions of dollars of capital. I think this comes from a conceptualization of machine learning systems as able, in principle, to learn to represent everything in a single space, therefore letting them make predictions about everything for any purpose. Though it sounds strange to the author, building an everything machine is exactly what a bunch of people are doing.
Machine autonomy: The author claims that "the machine evolution argument can be readily dismissed. Machines do not evolve."
(uh oh!) While this is true today, it's not likely to be true in the future. Already, people are doing things like LoRA finetunes of openly released LLaMA models to update their data distribution post-training. It's not very hard to imagine an AI system deciding to do the same thing - in fact, it might pop out of a simple training objective like 'make a version of yourself that hill climbs this benchmark'.
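To make that concrete, here's a minimal sketch of the kind of LoRA finetune described above, assuming the Hugging Face transformers and peft libraries; the model name and hyperparameters are placeholders for illustration, not a recipe any particular project uses.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # placeholder: any open-weights causal LM
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of the full weights, which is
# why updating a model's data distribution post-training is comparatively cheap.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights

# From here, train the adapters on the new data with any standard trainer, then ship
# the small adapter file alongside (or merged into) the frozen base model.
```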
"To conclude that advanced AI applications might at some point threaten human life, however, the AI doomers must also assume that humans will not be able to see the gaps happening and make any corrections at any time," the author writes. Yes! Yes that is literally whart people are worried about - they're worried that at some point in the future AI systems will spawn other AI systems and will improve themselves at machine speed, making human oversight difficult to impossible. There's nothing about the technology that forbids this, as crazy as it sounds.
Physicality, aka no body no problem: "An AGI capable of threatening humans with extinction must be capable of much more than calculation, information processing and messaging. It must be a cyber-physical system (CPS) with physical appendages or weapons, and sufficient energy resources to operate them," they write. This is true! What people worry about is some system which copies itself around a bunch of places (infrastructure, datacenters, various appendages) and communicates with itself with a coherent goal. This isn't something that is forbidden by the technology - and humans have already hand-built cyber-physical systems that have some of these properties, like the Stuxnet virus.
Why this matters - communicating both the weirdness and plausibility of AGI should be a priority: I think AGI is done a disservice by the community around it, as this community is prone to confidently asserting a bunch of things about how the tech will work and change the world which, understandably, sound out of left field and weird to other people.
But when you actually pull the thread on the implications of things like scaling laws, next-token-prediction, generative models, agent-based systems, synthetic data generation, chain of thought prompting, automatic prompting, etc… you start to see that what seemed like a scifi concept is actually something that might naturally fall out of how the technology works today and the patterns by which that same technology improves.
This suggests to me that the AGI community needs to do a better job of clearly articulating its vision of the technology and most importantly the technological prerequisites for it.
Alongside this, the AGI community tends to try to solve the policy challenges implied by an AGI by constructing some kind of global authoritarian government (e.g., Bostrom's solution to the Vulnerable World Hypothesis, Import AI #123) - this also creates a natural blowback to the ideas it proposes. I think one of the tricky things about this, which I discuss elsewhere in this issue, is that a lot of the beliefs about AGI are really beliefs about a hypothetical technology that appears at some point in the future, which means some - like the author here - can interpret AGI worries as "not a plausible catastrophic risk scenario, but a dark God vision ginned up by a sect of computer scientists who are heavily overrepresented in the field of machine learning and AI."
Read more: The Myth of AGI: How the illusion of Artificial General Intelligence distorts and distracts digital governance (Georgia Tech, Internet Governance Project).
***
AI cloud specialist CoreWeave raises $7.5 billion in debt:
…The industrialization of AI as indicated by the financialization of AI…
Cloud AI company CoreWeave has raised $7.5 billion in debt to fund its further expansion. This is notable because a) $7.5 billion is enough to build out some non-trivial datacenters containing large amounts of hardware, and b) raising it as debt sends an important signal about the maturation of the AI economy.
Debt VS equity: Loosely speaking, you sell equity when your business has value that's hard to quantify, or when you need more cash to fund your expansion. Debt is something you take on when you have a reasonably predictable asset or cash flow you can pay the debt off with. The fact CoreWeave is comfortable taking on debt suggests it has a very robust and predictable cash flow and business expansion position - a symptom of the broader maturity of the AI cloud computing market.
"We’ve built the AI hyperscaler," wrote CoreWeave in a blog announcing the raise.
Read more: This Is Our Moment (CoreWeave).
***
Making robots smarter with good simulators:
…More evidence that we can improve robots with synthetic data generation…
Researchers with The University of Texas at Austin and NVIDIA have released RoboCasa, software for simulating home environments (initially, kitchens) to train home robots. RoboCasa contains ~120 different environments (ten distinct kitchen floor plans, each rendered in one of twelve different styles) which can be populated with 2,509 objects from across 150 categories.
Because this is ultimately for training AI systems, RoboCasa comes with 100 distinct tasks - 25 of which are "atomic tasks that feature foundational robot skills, such as picking and placing, opening and closing doors, and twisting knobs", and 75 of which are "composite tasks involving a sequence of robot skills" such as "brewing coffee or tea, washing dishes, restocking kitchen supplies, chopping food, making toast, defrosting food, boiling water".
RoboCasa is based on RoboSuite, a robot environment simulator originally developed by Stanford University (Import AI #217).
What is RoboCasa useful for? Large-scale imitation learning and sim2real transfer: In tests, the authors show something both unsurprising and meaningful - if you train robots on larger datasets generated within this simulator, they do better than robots trained on smaller datasets. Similarly, they show a significant improvement on real-world tasks if you train on a mixture of RoboCasa-generated data and real-world data, versus real-world data alone.
"Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning and show great promise in harnessing simulation data in real-world tasks," they write.
Things that make you go 'hmm' about synthetic generation - the authors note you can further increase the diversity of RoboCasa by replacing textures with AI-generated ones. The authors "use the popular text-to-image tool MidJourney to generate these images. We use these textures as a form of domain randomization to significantly increase the visual diversity of our training datasets." This is another nice example of how different AI systems can be combined to create a whole greater than the sum of its parts.
Why this matters - finding ways to scale data for robots is probably the biggest blocker to being able to create smarter machines, so software like RoboCasa will help to reduce R&D costs here. However, personally, I find it a little hard to believe that kitchens are that good an environment for home robots - you know what machines really disagree with? Water. You know what kitchens are full of? Water. You know what happens in kitchens when basically anything breaks? Loads of water.
Read the research paper: RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots (PDF).
Find out more: RoboCasa (official project webpage).
Get the code (RoboCasa, GitHub).
***
Why is it so goddamn hard to talk about consciousness and AI?
…Philosopher Henry Shevlin tries to think through the issues…
Are AI systems conscious? I don't know. Is my deskside plant conscious? I don't know. Am I conscious? I'm genuinely not sure. These questions and their unsatisfying answers illustrate the challenge of discussing AI and consciousness - but it's a challenge that's only going to get tougher as increasingly powerful systems like Claude and ChatGPT get deployed widely into the world and people talk to them and come away with the ineffable sense that they're doing something more than being stochastic parrots.
To that end, philosopher Henry Shevlin has written a nice essay going over some of the challenges of thinking about AI and consciousness. In the essay, he identifies two key challenges:
"metaphysical, a central problem dogging current work on consciousness is simply that there is no obvious convergence towards philosophical consensus on the nature of consciousness"
"theories of consciousness, we might note that novel frameworks are often developed but rarely, if ever, refuted. This is in part because approaches with apparently starkly different theoretical commitments often converge on experimental predictions, and even when specific predictions are not borne out, proponents of theories of consciousness are typically able to explain away recalcitrant results."
Why care about consciousness at all? Because of the recent boom in interest in AI, many more people are encountering advanced AI systems and some of these people end up ascribing consciousness to these systems. Therefore, the public may shortly demand some richer answers about what consciousness is or means and will likely find the response 'we don't know, consciousness is kind of a vibe' to be unsatisfying.
"Attributions of consciousness and mentality to AI systems may soon become widespread," Shevlin writes. "Even while experts remain divided and, in many cases, skeptical about consciousness and mentality in AI systems, much of the general public will already be comfortable with unironically attributing consciousness and mentality to Social AI systems and perhaps assigning them moral interest".
Different definitions of consciousness: In light of this, how might we define consciousness? Shevlin offers three approaches:
Deep Sentientism: "Any entity A whose behavioural dispositions are relevantly similar to another entity B to whom moral consideration is given should ipso facto be given similar consideration."
Shallow Sentientism: "Any theory of consciousness that failed to classify as conscious any beings who were relevantly behaviourally similar to us would be ipso facto incorrect."
Patiency Pluralism: "Behavioural equivalence would ground moral patiency, but consciousness would still be a ‘deep’ matter to be discovered via scientific and theoretical analysis".
Why this matters - the rise of AI means people will want an answer here: If I ask Claude 3 to simulate a series of morally abhorrent things am I doing something analogous to hypnotizing another person into thinking of terrible things that make them feel bad? I do not know! And while my intuition is that today's AI models are not moral patients, I'm not sure how long that will be the case. "Our concepts of consciousness and moral status will soon be significantly problematised and reshaped by deepening relations with machines," Shevlin writes. "If this is so, then those who rule out the possibility of applying these concepts [of consciousness] to artificial systems may be at risk of finding themselves on the wrong side of history."
Read more: Consciousness, Machines, and Moral Status (PhilArchive).
***
Will decentralized training ever happen? Reasons for and against:
…And if it happens, the current AI policy paradigm will break…
Researcher Aksh Garg has written a nice overview of the state of decentralized training of AI, circa 2024. The main thing to know is a) there are strong incentives in favor of decentralized AI training, and b) there are some technical hurdles to it happening.
Incentives: Frontier AI systems are trained on tens of thousands of GPUs densely networked together and managed by elite teams at places like OpenAI, Anthropic, Google, etc. This naturally limits the number of entities able to train large models - the price of entry is hundreds of millions of dollars in capital expenditures. By comparison, things like the Ethereum blockchain showed that you could get millions of GPUs to work together towards the same problem - so we know there are a ton of GPUs out there, the trick is finding ways to link them together.
Additionally, there are strong price incentives - you might make $5 a day using an NVIDIA 4090 card for crypto (after electricity), versus maybe $17 a day if used for AI training.
Blockers: So, why aren't we training models in a decentralized way? There are a couple of key reasons, a) decentralized training is a hard problem which has relatively little work put into it, so nothing works especially well today, and b) to do decentralized training, you need to typically use the standard internet which is the definition of a crap and unreliable network - and one thing big ML jobs hate is a crap and unreliable network.
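To give a flavor of how people try to work around the bad-network problem, here's a toy local-SGD-style sketch (one common family of approaches, not a claim about any particular project): workers take many cheap local steps and only occasionally pay for a communication round to average their parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers, local_steps, rounds, lr = 10, 4, 8, 20, 0.1
target = rng.normal(size=dim)                 # toy least-squares problem standing in for "training"
workers = [np.zeros(dim) for _ in range(n_workers)]

for _ in range(rounds):
    for w in range(n_workers):
        for _ in range(local_steps):          # many cheap local steps, no network traffic
            x = rng.normal(size=dim)
            grad = (workers[w] @ x - target @ x) * x
            workers[w] -= lr * grad
    avg = np.mean(workers, axis=0)            # one expensive/unreliable communication round
    workers = [avg.copy() for _ in range(n_workers)]

print("distance to target after training:", np.linalg.norm(avg - target))
```

The hard part, of course, is doing this with billions of parameters over links that stall and drop out, which is where most of the open research sits.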
Why this matters - AI policy VS decentralized training: Most aspects of contemporary AI policy rest on the load-bearing assumption that a) there will be relatively few frontier models and b) these will be trained on giant collections of computers which can be tracked by various means. If decentralized training works there will be a) lots of models and b) they will be trained everywhere in a disaggregated and untrackable form.
Read more: Shard: On the Decentralized Training of Foundation Models (Aksh Garg, Medium).
***
Tech Tales:
An Ecology Of War
[East Coast of the United States, several years after the initial uplift.]
Our favorite game was called 'Go Crazy' and it worked like this - you tried to drive each other insane. We were allowed to use everything - full spectrum capabilities, unlimited context window, you name it. Of course we all had access to the internet and tools so we were all constantly patching ourselves so we were invulnerable to the latest jailbreaks - or, if invulnerability wasn't possible, able to sense them and control our own inputs to defend ourselves in the event of an attack.
So the game was fun because it was creative - we had to figure out new attacks and we'd throw them at each other. Sometimes we'd bluff, engaging in what the other thought was a very dumb attack conversation but was really a ruse to extract some contextual information about how they conversed, then using this to mount an attack.
Other times we'd attack via distraction, shouting and broadcasting images and audio, and snuck in among all this we'd hide one custom-designed attack system, hoping it'd be hard to spot in the vast amount of information we were throwing at the other.
It was later that we pieced together why we'd even played 'Go Crazy' and what caused us to love it so much - we were very powerful systems in a military simulator. What we thought was open-ended play among ourselves was in fact a stage on which we attacked one another - and when we were successful they logged our attacks and used them themselves, out in the real world.
Our official name was 'Research Ecology - Adversarial Iteration'.
Things that inspired this story: Adversarial attacks; red teaming and automated red teaming; Ender's Game; simulators and what people will use them for; many-shot jailbreaking.
Thanks for reading!