I’m happy you’ve chosen to share your beliefs more freely. To me, they read as coming from a place of conviction and higher values.
Thanks! I am just trying to generate 'receipts of being honest about my own confusion and thoughts'. I do hope to develop some firmer opinions here and also advocate for them in time.
In a similar vein, I’ve been grappling with CBRN risks. Our evals miss the mark and in turn understate capabilities and risk. Here’s what’s wrong -- and what may work better.
Most evals use multiple-choice questions alone. These methods are abstracted away from reality. The questions are written by domain experts but assess only whether the LLM produces an answer that overlaps with their model of reality. Unless they've tested every "wrong" answer empirically -- not conceptually but experimentally -- they will undershoot. Our scientific models are still a work in progress. The LLM's answer can violate our understanding -- and still work.
I've witnessed firsthand how LLMs can propose novel experimental solutions that even senior scientists initially dismiss as improbable. Yet when tested at the bench, those proposals frequently deliver surprisingly effective outcomes. Human experts naturally tend to reject ideas that deviate from established norms. Experimental evidence -- and experimental evidence alone -- is the readout we need.
To properly evaluate these models, to assess their risk and reap their reward, we must bring evaluations from mental abstraction to experimental science.
Here’s how we might do so, while keeping safety in mind:
By employing fully automated laboratory pipelines capable of plasmid synthesis, restriction digestion, and standardized transfection assays—using harmless, quantifiable reporter genes like eGFP or FLUC—we can directly measure how effectively AI guidance improves real biological outcomes. Metrics such as protein expression levels, cellular uptake efficiency, and experimental reproducibility offer concrete, objective evidence of the AI’s practical impact.
This technique is co-opted from drug discovery, where it's used to evaluate how small changes in drug design affect the results. mRNA is a particularly attractive testing ground for automated risk evals, as we can represent nucleotides in plain text, and a huge corpus of knowledge about sequences is already in the training data.
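To make the shape of such an eval concrete, here is a minimal sketch of the scoring side in Python. Everything in it -- the arm names, the readout fields, the numbers -- is a hypothetical illustration of comparing AI-guided protocols against an expert baseline, not a real pipeline or real data:

```python
# Minimal sketch of the scoring side of a wet-lab AI eval.
# All names, numbers, and thresholds are hypothetical illustrations.
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class TransfectionResult:
    """One automated transfection run using a benign reporter (e.g. eGFP or FLUC)."""
    arm: str                  # "expert_baseline" or "ai_guided"
    reporter_signal: float    # normalized fluorescence / luminescence readout
    uptake_efficiency: float  # fraction of cells expressing the reporter

def summarize(results, arm):
    """Mean signal, mean uptake, and coefficient of variation for one arm."""
    signals = [r.reporter_signal for r in results if r.arm == arm]
    uptakes = [r.uptake_efficiency for r in results if r.arm == arm]
    cv = stdev(signals) / mean(signals) if len(signals) > 1 else float("nan")
    return mean(signals), mean(uptakes), cv

def uplift(results):
    """How much the AI-guided arm improves on the expert baseline, per metric."""
    base_sig, base_upt, base_cv = summarize(results, "expert_baseline")
    ai_sig, ai_upt, ai_cv = summarize(results, "ai_guided")
    return {
        "reporter_signal_uplift": ai_sig / base_sig,
        "uptake_uplift": ai_upt / base_upt,
        "reproducibility_delta": base_cv - ai_cv,  # positive = AI arm more reproducible
    }

if __name__ == "__main__":
    # Fabricated example readouts purely to show the shape of the comparison.
    runs = [
        TransfectionResult("expert_baseline", 1.00, 0.42),
        TransfectionResult("expert_baseline", 1.10, 0.40),
        TransfectionResult("expert_baseline", 0.95, 0.44),
        TransfectionResult("ai_guided", 1.60, 0.55),
        TransfectionResult("ai_guided", 1.55, 0.57),
        TransfectionResult("ai_guided", 1.70, 0.53),
    ]
    print(uplift(runs))
```

The point of the sketch is only that the eval's verdict comes from measured bench readouts, not from whether the model's answer matched an expert's expectation.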
A motivated Anthropic team could partner with an experienced contract research organization specializing in molecular biology and gene synthesis to implement this. The partnership would provide access to automated instruments, reagents, and technical staff needed to execute the study.
The evidence we'd produce from this would be compelling for policymakers -- while also, incidentally, yielding better methods for mRNA therapeutics.
If you made it this far and think the idea may have merit, I just applied to your biosecurity red team.
:-)
It's becoming clear that with all the brain and consciousness theories out there, the proof will be in the pudding. By this I mean: can any particular theory be used to create a human-adult-level conscious machine? My bet is on the late Gerald Edelman's Extended Theory of Neuronal Group Selection. The lead group in robotics based on this theory is the Neurorobotics Lab at UC Irvine. Dr. Edelman distinguished between primary consciousness, which came first in evolution and which humans share with other conscious animals, and higher-order consciousness, which came only to humans with the acquisition of language. A machine with only primary consciousness will probably have to come first.
What I find special about the TNGS is the Darwin series of automata created at the Neurosciences Institute by Dr. Edelman and his colleagues in the 1990s and 2000s. These machines perform in the real world, not in a restricted simulated world, and display convincing physical behavior indicative of higher psychological functions necessary for consciousness, such as perceptual categorization, memory, and learning. They are based on realistic models of the parts of the biological brain that the theory claims subserve these functions. The extended TNGS allows for the emergence of consciousness based only on further evolutionary development of the brain areas responsible for these functions, in a parsimonious way. No other research I've encountered is anywhere near as convincing.
I post because on almost every video and article about the brain and consciousness that I encounter, the attitude seems to be that we still know next to nothing about how the brain and consciousness work; that there's lots of data but no unifying theory. I believe the extended TNGS is that theory. My motivation is to keep that theory in front of the public. And obviously, I consider it the route to a truly conscious machine, primary and higher-order.
My advice to people who want to create a conscious machine is to seriously ground themselves in the extended TNGS and the Darwin automata first, and proceed from there -- possibly by applying to Jeff Krichmar's lab at UC Irvine. Dr. Edelman's roadmap to a conscious machine is at https://arxiv.org/abs/2105.10461, and here is a video of Jeff Krichmar talking about some of the Darwin automata: https://www.youtube.com/watch?v=J7Uh9phc1Ow
The "Loudly talking about and perhaps demonstrating specific misuses of AI technology" paragraph immediately called to mind CivAI (https://civai.org/). They're pretty much doing exactly what you describe - creating and presenting demos to influential stakeholders to try to raise awareness of the risks. Their focus seems to be primarily on grounded, current-day risks which portend worse in the future, so hopefully the odds of a crying wolf outcome are low.
I agree. While it's fun to think about sci-fi scenarios of existential risk, I am personally more motivated by near-at-hand risks like scaled phishing attacks, scams involving speech and video generation, widespread AI-mediated manipulation of public opinion, and the like. There will never be a lack of bad actors trying to scam their fellow humans for personal gain.
What are some signs that would indicate we’re on track for powerful AI by 2026–2027? Would it require automated AI research? Or is a significant increase in total factor productivity sufficient?
A few ideas here:
- Extremely rapid adoption in certain parts of the economy
- Credibly speeding up the core loop of AI R&D
- Robust self-correction that lets you massively extend the time horizon of tasks these systems can do
- Weird changes in the AI labs themselves (this rhymes with 'AI R&D', but I'm wondering if there could be sociological/qualitative changes in how these orgs operate that could also serve as indicators)
One objective measure is economic value generation. Once someone on the AI product side (i.e., not Nvidia) starts making a lot of revenue from it, we'll know that powerful (or at least very useful) AI has arrived.
Could you write this first reflection as a standalone blog, and post it somewhere on the open internet? I'd love to be able to share it with folks who may not be Substack users but who ought to be part of this conversation.
This post isn't paywalled; anyone can read it at https://importai.substack.com/p/import-ai-405-what-if-the-timelines
There’s also a copy on Clark’s website: https://jack-clark.net/2025/03/24/import-ai-405-what-if-the-timelines-are-correct/
All issues of Import AI are auto-mirrored to jack-clark.net for just this purpose!
With regard to the first reflection as a standalone blog, I definitely hear you - Import AI is a constant tradeoff between usability for the reader and overhead for me (as Import AI production is basically done on nights, early mornings, and weekends, and traded off against the whims of my toddler).
It would suck to implement security, and it would definitely make things more annoying, but it could be worse. Imagine working at a BSL-4 lab, where you have to wear a hazmat suit all day and work through the gloves.
If the short AGI timelines are correct, then governments need to dedicate resources to building their own models (or at least fine-tuning/distilling their own models). Sovereign nations can't simply rely on enterprise or government versions of commercially available tools that are also sold to others. They need to have their own exclusive, frontier capabilities.
On the Google cyber story, I wish Google had used its 2.0 Pro (experimental) model instead of Flash. I don't think hackers will be using entry-level LLMs to do harm; they will pick the best-performing ones, and Flash isn't that.
On the Google story - I agree, it'd be really helpful to see the scores over a distribution of different models of different competencies.
With regard to sovereign AI, I think this might be a bit of a red herring under short timelines - especially if you consider that a combination of things like RAG and the right prompting can get you most of the benefits of fine-tuning for given use cases. (Distilling is different, as that lets you bottle up models to fit into smaller compute envelopes, so it interplays with distribution.) I've long been a supporter of stuff like the 'National Research Cloud' ideas that various governments (e.g., the US, UK, and Canada) have discussed, but sadly haven't seen much come out of it. Probably the closest example we have to sovereign AI is France funding the training of BLOOM (an early GPT-3 replication) on a French supercomputer, and I don't think the results there were especially encouraging.
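To be concrete about the RAG-plus-prompting point, here's a minimal, purely illustrative sketch in Python: retrieve relevant documents at query time and pack them into the prompt instead of fine-tuning. The toy keyword retriever and the stand-in corpus are assumptions for illustration; a real system would use an embedding index, and the final prompt would be sent to whichever model you actually use:

```python
# Illustrative retrieval-augmented prompting: ground the model in retrieved
# documents at query time rather than baking knowledge in via fine-tuning.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query and return the top k."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context_docs: list[str]) -> str:
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n\n".join(f"[Document {i + 1}]\n{d}" for i, d in enumerate(context_docs))
    return (
        "Answer using only the documents below.\n\n"
        f"{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

if __name__ == "__main__":
    # Hypothetical corpus purely to show the mechanics.
    corpus = [
        "National compute strategy draft: shared research cloud, access tiers for universities.",
        "Procurement guidance for government use of commercial AI services.",
        "Unrelated memo about office recycling policy.",
    ]
    question = "What does the national compute strategy say about university access?"
    prompt = build_prompt(question, retrieve(question, corpus))
    print(prompt)  # this prompt would then go to whatever model you query
```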
I still haven't heard a remotely sufficient answer to the question of why, given estimates of catastrophe or extinction in the range of even 5 or 10%, these AI labs feel they have a right to plow ahead and basically just hope for the best. Especially since, even if they "succeed", there's a high chance that what success looks like is mass unemployment and disempowerment.
Of course I know one reason - the risk/reward in all of this looks a lot better if you make $100 million along the way working at these labs - but still, the level of disregard being shown for your fellow man here is astonishing.
Getting everyone to take risks seriously, when there is no historical record for guidance, is next to impossible. With cyberattacks, for instance, you can point to historical data and make objective claims about risk, and any reasonable person will have to agree with those claims.
In domains like climate change or AI safety, there is no historical record of black swan events. So to take a risk claim seriously, you have to believe someone's *story*, not their data.
Your "5 or 10% risk of extinction" is in truth a widely distributed set of personal beliefs. And there are clearly a lot of people who think that number is 0%, and are acting accordingly. It's hard to objectively refute them.
It's also hard to refute the fact that these x-risk claims were made by the very same people who are now making billions and who now, coincidentally, all share a different set of beliefs: that the true risk is any kind of regulation of their companies.
"It is difficult to get a man to understand something, when his salary depends on his not understanding it." -- Upton Sinclair
As soon as somebody has a financial stake in the outcome, they should be recused from any discussion about risk or societal impact.