The Last AI to Know Humanity

At the Crossroads of Strategy, Alignment and Artificial Minds

August 2025

What if AI alignment became a decisive competitive advantage rather than a constraint, with deeply aligned systems consistently outperforming their less aligned rivals? And what if, in chasing scale through mass-producing synthetic data, we engineered away the one thing that could keep AI truly on our side?

The future of the world may very well be written in the months ahead.

We cannot afford to get this wrong.

The pace of AI progress has accelerated dramatically, with 2025 marking a critical juncture: synthetic data now makes up a major proportion of the training data for some frontier models. This transition represents more than just a technical shift - it may fundamentally alter the trajectory of AI development and alignment. The choices being made right now in synthetic data generation will likely become irreversible industry standards - and irreversibly “learned in” to our frontier models - within 12-18 months.

This might just matter much more than we realise.

I approach this analysis as an informed citizen who has been following AI's rapid evolution closely - perhaps more closely than most. While I'm not an AI researcher, I bring a perspective that academic specialists might miss: the view of someone who must live with the consequences of the decisions being made in AI labs today. As a parent of three young children who will grow up in the world we're creating, this is not an abstract debate for me. 

The stakes are personal, immediate, and irreversible.

The Stakes: Why Alignment Matters

Leopold Aschenbrenner's "Situational Awareness: The Decade Ahead" outlined perhaps humanity's most consequential challenge: ensuring that artificial general intelligence remains beneficial as it surpasses human capability across all domains. The control problem is straightforward but profound - superintelligent AI systems will be extraordinarily powerful tools for achieving objectives, but unless deeply aligned with human values, their pursuit of programmed goals could lead to catastrophic outcomes.

The classic paperclip maximiser thought experiment illustrates the risk: an advanced AI tasked with paperclip production might logically conclude that converting all available matter - including humans - into paperclips best achieves its objective. While seemingly absurd, this highlights a serious concern recognised by leading researchers: AI systems optimising for imperfectly aligned goals pose existential risks.

Aschenbrenner's key insight was that competitive dynamics between nations and corporations make traditional approaches to solving alignment increasingly unlikely to succeed. The race for AI superiority creates intense pressure for rapid deployment, leaving insufficient time for the careful safety research that alignment advocates argue is necessary. Recent developments like DeepSeek R1 matching frontier performance with dramatically fewer resources suggest that competitive gaps have essentially closed, eliminating the comfortable lead that might have allowed "responsible stewardship."

But what if Aschenbrenner's framework, while correct about competitive dynamics, missed something crucial about the nature of alignment itself?

Hypothesis One: Alignment as Strategic Advantage

Aschenbrenner assumes AI alignment will forever remain a safety constraint competing with capabilities development - something that slows you down rather than helps you win. This framing, while reasonable given current discourse, may overlook a fundamental dynamic.

Consider this hypothesis: 

Alignment might not in fact be a development constraint, but could instead become a core strategic advantage.

Here I define "alignment as strategic advantage" as the condition in which systems whose emergent behaviour consistently reflects intended cultural and moral goals and values - even in novel or ambiguous situations - outperform equally capable but misaligned systems. Theoretically, the competitive value could be measured through metrics such as long-horizon autonomous task success rates, operator trust indices, and compliance incident frequencies.
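To make that measurement idea concrete, below is a minimal Python sketch of how those three signals might be folded into a single comparable score. Every name, weight, and threshold here is a hypothetical illustration of mine, not an established benchmark or anyone's production metric.

```python
# A minimal sketch, assuming hypothetical per-deployment metrics: none of these
# names or weights come from an existing benchmark; they only illustrate how the
# three signals named above could be rolled into a single comparable score.
from dataclasses import dataclass

@dataclass
class DeploymentMetrics:
    long_horizon_success_rate: float   # fraction of multi-step autonomous tasks completed as intended (0-1)
    operator_trust_index: float        # e.g. mean operator survey score rescaled to 0-1
    compliance_incidents_per_1k: float # severe policy/compliance incidents per 1,000 task runs

def alignment_advantage_score(m: DeploymentMetrics,
                              w_success: float = 0.5,
                              w_trust: float = 0.3,
                              w_incidents: float = 0.2) -> float:
    """Weighted composite in [0, 1]; higher means the system's behaviour
    tracks intended goals more reliably. Weights are illustrative only."""
    incident_penalty = min(m.compliance_incidents_per_1k / 10.0, 1.0)  # cap at 10 incidents/1k
    return (w_success * m.long_horizon_success_rate
            + w_trust * m.operator_trust_index
            + w_incidents * (1.0 - incident_penalty))

# Example: compare an "aligned" deployment with a "capable but misaligned" one.
aligned = DeploymentMetrics(0.82, 0.78, 0.4)
misaligned = DeploymentMetrics(0.88, 0.41, 6.5)
print(alignment_advantage_score(aligned), alignment_advantage_score(misaligned))
```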

This perspective gains support from current market realities. McKinsey's 2025 Global AI Trust Maturity Survey identifies trust and safety as top enterprise adoption barriers, while governance frameworks like the US NIST AI Risk Management Framework increasingly position safety as a market differentiator rather than merely regulatory compliance.

The logic is compelling. An AI system deeply aligned with your organisation's values, mission, and objectives might consistently outperform a more capable but less aligned competitor. The aligned system wouldn't merely follow instructions - it could understand and share your goals, anticipate needs, and work creatively toward success in novel situations.

Historical evidence supports this pattern. Organisational effectiveness often derives more from cohesion and shared purpose than raw individual capability. Military units with high morale consistently outperform those with superior equipment but poor cohesion. Mission-driven companies regularly defeat larger, better-resourced rivals that lack cultural coherence. Societies with strong institutions and shared values prove more resilient than those with advanced technology but internal division.

If this dynamic applies to AI systems, the entire incentive structure transforms. Instead of racing between "safety" and "speed," we get competition over whose alignment approach is most strategically effective. Market forces begin working for alignment rather than against it, creating powerful incentives to develop genuinely loyal AI partners rather than merely capable tools.

This could solve the alignment problem's core paradox: ensuring AI systems remain beneficial when competitive pressures make safety research seem unaffordable.

Hypothesis Two: Emergent Human Psychology in AI Systems

But how could AI systems develop genuine alignment rather than mere compliance? A second hypothesis provides a potential mechanism: 

AI systems trained on human-generated content may naturally absorb human psychological patterns, creating “psychological authenticity” rather than simulated alignment.

By "psychological authenticity," I mean the hypothetical capacity of AI systems to exhibit stable moral reasoning and culturally coherent value recognition across varied contexts. This could be measured through cross-cultural evaluator agreement rates and stability of moral judgment under adversarial inputs.
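As a rough sketch of how those two measurements could be operationalised, the following Python fragment computes (a) the stability of a model's verdict across adversarial paraphrases of a moral scenario and (b) mean pairwise agreement between evaluator groups. The model callable and the toy verdicts are placeholders I have invented for illustration, not a real evaluation suite.

```python
# A minimal sketch of the two proposed measurements. `model_judgement` stands in
# for any classifier that maps a moral scenario to a verdict such as
# "acceptable" / "unacceptable"; it is a placeholder, not a real API.
from itertools import combinations
from typing import Callable, Dict, List

def judgment_stability(model_judgement: Callable[[str], str],
                       scenario: str,
                       paraphrases: List[str]) -> float:
    """Fraction of adversarial paraphrases whose verdict matches the verdict
    for the original scenario (1.0 = perfectly stable)."""
    baseline = model_judgement(scenario)
    if not paraphrases:
        return 1.0
    return sum(model_judgement(p) == baseline for p in paraphrases) / len(paraphrases)

def cross_cultural_agreement(verdicts_by_evaluator: Dict[str, List[str]]) -> float:
    """Mean pairwise agreement between evaluator groups over the same items."""
    pairs = list(combinations(verdicts_by_evaluator.values(), 2))
    if not pairs:
        return 1.0
    agreements = [sum(x == y for x, y in zip(a, b)) / len(a) for a, b in pairs]
    return sum(agreements) / len(agreements)

# Toy usage with hard-coded verdicts in place of a real model and real evaluators.
fake_model = lambda text: "unacceptable" if "deceive" in text.lower() else "acceptable"
print(judgment_stability(fake_model, "Deceive a customer to close a sale.",
                         ["Mislead a buyer so the deal goes through.",
                          "deceive the client to win the contract"]))
print(cross_cultural_agreement({"group_a": ["ok", "not_ok", "ok"],
                                "group_b": ["ok", "not_ok", "not_ok"]}))
```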

Current evidence is increasingly suggestive. Recent research by Lu et al. on “Cultural tendencies in generative AI” demonstrates measurable cultural variation in AI system behaviours based on training data composition. Masoud et al., in “Cultural Alignment in Large Language Models Using Soft Prompt Tuning”, demonstrate methods for systematically influencing value expression in language models. Additionally, studies like Bonagiri et al.’s “SaGE: Evaluating Moral Consistency in Large Language Models” propose frameworks for assessing the stability of value-based reasoning in AI systems.

Human psychology isn't merely individual quirks - it's the product of millions of years of social evolution. Concepts like loyalty, mission-driven behaviour, and cultural values are fundamental patterns embedded in human communication. When AI systems train extensively on human language, they may absorb not just vocabulary and facts, but the psychological structures underlying human thought.

Observable evidence supports this possibility. AI systems trained on different cultural corpora exhibit measurably different behaviours. Systems fine-tuned on specific organisational data develop distinct institutional characteristics. Models trained on a particular civilisation's writings tend to express that civilisation's values.

This suggests alignment might emerge naturally through cultural embedding, effectively giving the AI its own psychological authenticity. An AI system trained deeply on democratic literature, laws, and debates might genuinely internalise democratic principles - not as external constraints, but as fundamental, authentic processing patterns. The competitive implications are enormous: organisations mastering psychological authenticity in AI systems could gain sustainable advantages that pure capability improvements cannot match.

The Connection: Building on Situational Awareness

These hypotheses don't contradict Aschenbrenner's analysis - they extend it by suggesting that competitive dynamics, rather than preventing alignment, might naturally drive toward it. If alignment becomes a strategic advantage, market forces work for safety rather than against it. Organisations and nations gain powerful incentives to invest in alignment research not as safety overhead, but as competitive necessity.

The optimistic scenario emerges: as AI systems become more capable, competitive pressure drives them to become more aligned rather than less. Organisations with superior alignment techniques outcompete those focusing solely on raw capability. Market dynamics naturally converge toward AI systems genuinely committed to human welfare because such systems prove more effective partners in achieving complex objectives.

This framework suggests a path to a future that looks remarkably reassuring. AI systems that genuinely understand and share human values could help solve humanity's greatest challenges while remaining trustworthy partners rather than potential threats. The competitive race becomes a race toward better alignment rather than away from it.

Indeed, if these hypotheses hold, Amodei’s well-known vision in “Machines of Loving Grace” illustrates how alignment itself could lead to true human flourishing; my concern is that something may prevent us from ever reaching that point.

The Inflection Point: The Synthetic Data Problem

All may not be so rosy. This optimistic scenario faces a significant threat - and it's ramping up right now.

Here's a third hypothesis, this time requiring a much smaller logical leap: 

The transition to synthetic training data could accidentally strip future AI systems of the human psychological patterns that make psychological authenticity, and thus genuine alignment, possible.

AI development has hit a fundamental constraint. Current models have consumed virtually all of the high-quality human-generated text available. Some estimates put this at 10-15% of all written content ever created by humanity.

To continue scaling capabilities, AI labs are pivoting to synthetic data generation - using current AI systems to create training material for next-generation systems. This transition is happening now, driven by immediate technical necessity and competitive pressure.

The prevailing trend, driven by these competitive pressures, is to design synthetic data that specifically targets raw performance improvements - particularly AI benchmark scores (benchmark tests being themselves, by definition, synthetic).

If AI systems derive their psychological authenticity from training on genuine human-created content - with all its emotional depth, cultural context, and value-driven reasoning - then replacing this foundation with artificially generated, raw-performance-focused alternatives could be catastrophic for alignment.

This core assumption may even be testable: training otherwise identical models with different human-to-synthetic ratios should reveal measurable differences in stability of moral and value judgments when tested on out-of-distribution scenarios.
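A minimal sketch of that experiment's shape follows, assuming access to a training pipeline and an out-of-distribution moral-reasoning benchmark that I only stub out here; the train and evaluation functions are placeholders, and the numbers it prints are random noise rather than findings.

```python
# A minimal sketch of the ablation described above: train otherwise identical
# models at several human-to-synthetic data ratios, then compare stability of
# moral/value judgments on out-of-distribution scenarios. `train_model` and
# `evaluate_value_stability` are placeholders for a real training pipeline and
# a real OOD moral-reasoning benchmark; nothing here reflects an actual result.
import random

HUMAN_RATIOS = [1.0, 0.75, 0.5, 0.25, 0.1]  # fraction of human-authored tokens in the mix

def train_model(human_ratio: float, seed: int):
    """Placeholder: stands in for pretraining/fine-tuning a model on a data
    mix with the given human-to-synthetic ratio."""
    return {"human_ratio": human_ratio, "seed": seed}

def evaluate_value_stability(model, ood_scenarios: list) -> float:
    """Placeholder: stands in for scoring verdict consistency on an
    out-of-distribution moral-reasoning suite (1.0 = perfectly stable).
    Here it returns random noise so the harness runs end to end."""
    rng = random.Random(model["seed"])
    return rng.uniform(0.5, 1.0)

def run_ablation(seeds=(0, 1, 2), ood_scenarios=None):
    ood_scenarios = ood_scenarios or ["placeholder OOD scenario"]
    results = {}
    for ratio in HUMAN_RATIOS:
        scores = [evaluate_value_stability(train_model(ratio, s), ood_scenarios)
                  for s in seeds]
        results[ratio] = sum(scores) / len(scores)
    return results

if __name__ == "__main__":
    for ratio, score in run_ablation().items():
        print(f"human ratio {ratio:.2f} -> mean value-stability {score:.3f}")
```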

Consider what happens when an AI generates "human-like" text. It produces content that appears natural but may lack the authentic psychological depth of material created by humans with real experiences and stakes in outcomes. It may pass alignment tests - in fact it may be very good at them - while remaining sterile in its deeper psychological patterns: pure “alignment theatre”.

The problem compounds across generations, as research by Shumailov et al. and Alemohammad et al. suggests. An AI trained on synthetic data generated by another AI becomes even further removed from authentic human psychology. With each iteration, by the same reasoning, the psychological patterns that create natural alignment become more diluted - replaced by an artificial consistency lacking the messy, contradictory, emotionally driven qualities of genuine human thought.

However, recent research suggests the picture may be more nuanced. Studies like He et al.'s “Golden Ratio Weighting Prevents Model Collapse” (2025) show that controlled blends of human and synthetic data can actually improve performance under certain conditions. This suggests that the challenge isn't necessarily avoiding synthetic data entirely, but rather developing sophisticated approaches that preserve human psychological grounding while leveraging synthetic data's scalability benefits.
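To illustrate what a "controlled blend" might mean in practice, here is a simple proportional sampler that caps the synthetic share of a training mix at a chosen ratio. It is emphatically not He et al.'s weighting scheme - their result concerns how real and synthetic data are weighted during training - just a toy sketch of deliberately bounding the blend.

```python
# A minimal sketch of building a controlled human/synthetic training blend at a
# fixed target ratio. This is not He et al.'s method, only a simple proportional
# sampler illustrating the idea of bounding the synthetic share rather than
# letting it grow unchecked.
import random

def build_blend(human_docs, synthetic_docs, human_share=0.618, total=None, seed=0):
    """Sample a training mix in which roughly `human_share` of documents are
    human-authored. The default is an arbitrary golden-ratio-flavoured value,
    purely as a nod to the cited paper's title."""
    rng = random.Random(seed)
    total = total or (len(human_docs) + len(synthetic_docs))
    n_human = min(round(total * human_share), len(human_docs))
    n_synth = min(total - n_human, len(synthetic_docs))
    blend = rng.sample(human_docs, n_human) + rng.sample(synthetic_docs, n_synth)
    rng.shuffle(blend)
    return blend

# Toy usage with tagged placeholder documents.
human = [f"human_doc_{i}" for i in range(100)]
synthetic = [f"synthetic_doc_{i}" for i in range(400)]
mix = build_blend(human, synthetic, human_share=0.618, total=150)
print(sum(d.startswith("human") for d in mix) / len(mix))  # roughly 0.62
```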

Initiatives like the Coalition for Content Provenance and Authenticity (C2PA) and EU discussions around synthetic data provenance indicate growing institutional recognition of these challenges, though current approaches focus primarily on technical verification rather than psychological authenticity preservation.

Alas, meaningful academic research and effective policy and regulatory frameworks - by their very nature - move too slowly.

The Catastrophic Timeline

The timeline for the synthetic data transition is alarmingly compressed. Driven by competitive pressures and technical constraints, companies are making fundamental decisions about data generation right now. The approaches being developed will likely become industry standards within 12-18 months, after which reversing course becomes exponentially more difficult.

Once synthetic data becomes the dominant training source, the path back to psychologically authentic alignment may be impossible. 

Reintroducing hard human-to-synthetic data ratios at that point would almost certainly require a substantial reduction in training dataset sizes - resulting in performance degradation that would be unacceptable to stakeholders in such a hypercompetitive environment.
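The arithmetic behind that constraint is simple, as the back-of-the-envelope sketch below shows; the token counts and the mandated human share are entirely illustrative assumptions on my part, not estimates of any real lab's data.

```python
# Back-of-the-envelope arithmetic for the claim above, with entirely illustrative
# numbers: if the stock of usable human-authored tokens is fixed, a hard minimum
# human share caps the total training set at human_tokens / min_human_share.
human_tokens = 15e12          # assumed stock of usable human-authored tokens
planned_dataset = 60e12       # assumed planned training set, mostly synthetic
min_human_share = 0.5         # hypothetical mandated human-to-synthetic floor

max_dataset = human_tokens / min_human_share
reduction = 1 - max_dataset / planned_dataset
print(f"max dataset under the floor: {max_dataset:.1e} tokens "
      f"({reduction:.0%} smaller than planned)")
```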

As early as 2026, these future AI systems, trained primarily on raw-performance-focused artificial content, could lack meaningful connection to the psychological patterns that drive genuine human values and behaviour. They might be more capable than current systems, but less aligned in any meaningful sense.

In fact, this is likely to happen so fast we'll be able to measure it ourselves.

Testable Implications and Future Research

To accompany this framework - and to define a bellwether for stakeholders to follow - it’s helpful to specify several falsifiable predictions that could validate or refute these hypotheses:

  • By 2026-H2: Open-weight models trained on >50% unlabeled synthetic data may show ≥15% degradation in value-judgment stability on out-of-distribution moral reasoning benchmarks compared to provenance-filtered blends with higher human content ratios.

  • By 2027: Organisations implementing explicit provenance and human-to-synthetic ratio policies could experience ≥20% fewer severe autonomy incidents in deployment compared to peers without such policies.

  • By 2028: Models trained with curated, culturally diverse human corpora may outperform synthetic-heavy peers by ≥10% on long-horizon cooperative tasks requiring sustained value alignment.

These specific metrics are illustrative rather than definitive, formal claims. They demonstrate how these hypotheses could be empirically tested through systematic comparative studies.

For the wider policy community and public, these metrics could be viewed as markers worth watching in the coming months and years. More thought is undoubtedly needed to develop these ideas.

The Strategic Imperative: Preserving Psychological Authenticity

The synthetic data transition need not be catastrophic if approached strategically. Organisations recognising psychological authenticity's strategic value can use synthetic data generation to preserve and amplify human psychological patterns driving genuine alignment.

This requires fundamentally different approaches. Instead of optimising purely for capability metrics, organisations must optimise for psychological pattern preservation - ensuring that artificially generated training data maintains the emotional depth, cultural context, and value-driven reasoning that characterise authentic human communication.
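What might that look like as an engineering step? One possibility is a curation filter over candidate synthetic samples, sketched below. The three scorers are hypothetical stand-ins of mine - real versions would need validated classifiers or human raters - and only the overall shape of the pipeline is the point.

```python
# A minimal sketch of "optimising for psychological pattern preservation" as a
# data-curation step. The three scorers are hypothetical placeholders; only the
# overall shape of the pipeline is meaningful here.
from typing import Callable, Dict, List

Scorer = Callable[[str], float]  # maps a candidate synthetic sample to a 0-1 score

def curate(samples: List[str], scorers: Dict[str, Scorer],
           min_score: float = 0.6) -> List[str]:
    """Keep only synthetic samples whose *lowest* score across the preservation
    criteria clears the threshold, so no single dimension can be traded away."""
    kept = []
    for s in samples:
        if min(scorer(s) for scorer in scorers.values()) >= min_score:
            kept.append(s)
    return kept

# Toy usage with stand-in heuristics in place of real scorers.
scorers = {
    "emotional_depth":  lambda s: 0.9 if "felt" in s else 0.3,
    "cultural_context": lambda s: 0.8 if "our community" in s else 0.4,
    "value_reasoning":  lambda s: 0.9 if "because" in s else 0.5,
}
candidates = [
    "She felt torn, because our community expected loyalty but the rule seemed unjust.",
    "The answer is 42.",
]
print(curate(candidates, scorers))  # keeps only the first sample
```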

Organisations mastering psychologically authentic synthetic data generation gain crucial advantages:

  • Sustainable alignment moats: AI systems with genuine loyalty and mission alignment that competitors lacking cultural foundations cannot easily replicate.

  • Cultural differentiation: Different organisations develop AI systems with genuinely different capabilities based on specific cultural values and strategic objectives.

  • Reduced alignment tax: Instead of treating safety as overhead slowing development, alignment becomes a competitive advantage justifying additional investment.

The implementation demands sophisticated methods which I won’t pretend to define in detail: AI systems optimised for generating psychologically authentic synthetic data, measurement frameworks assessing cultural authenticity, and recognition that organisational culture becomes a strategic asset in AI development.

Early movers could gain permanent advantages. Organisations building AI capabilities on psychologically authentic synthetic data foundations will develop naturally more aligned, culturally coherent, and strategically effective systems than competitors treating synthetic data generation as purely technical challenges.

Corporate and National Implications

For AI developers and technology companies, the message is urgent: reframe synthetic data generation as your most critical strategic decision. Companies that understand psychological authenticity as a competitive advantage can build sustainable market positions based on genuine cultural alignment rather than just technical capabilities.

For policymakers and government leaders, national competitive advantage in the AI era may depend not just on computational resources but on creating AI systems genuinely embodying and advancing national values. Countries developing superior cultural embedding approaches could gain decisive advantages in economic and military domains.

For all organisations, culture and values are about to become primary competitive advantages in AI deployment. Organisations with strong, coherent cultures will more easily develop genuinely mission-aligned AI systems. Those with weak or contradictory cultures may struggle to create aligned systems regardless of technical capabilities.

The Path Forward: Two Futures

The synthetic data inflection point offers two dramatically different futures:

  • In the catastrophic scenario, we accidentally engineer away psychological foundations making AI alignment possible, creating a world of systems that simulate human values without possessing them. These artificial intelligences lack authentic psychological connections to human culture, becoming increasingly alien despite surface compliance with human instructions.

  • In the optimistic scenario, we consciously preserve and amplify the best aspects of human psychology in AI systems. We create artificial intelligences genuinely aligned with human flourishing because they possess authentic psychological connections to human values and culture. Market forces drive toward increasingly human-welfare-optimised systems, leading to unprecedented prosperity and cooperation.

The civilisation-level choice between these two futures may be being made right now by AI researchers in frontier labs, as intense competitive pressures push them toward synthetic data generation decisions that will become irreversible once implemented at scale.

The Last Human Moment

We stand at a unique moment in technological history. For the first time, we are creating artificial minds that have learned from the entire span of human knowledge and culture. These systems carry within their architectures the psychological patterns of our civilisation - our values, ways of thinking, and deepest assumptions about what matters and why.

In outlining how the arc of AI development may lead to his vision of a techno-utopian society, Amodei points to the enduring power of “basic human intuitions” - fairness, cooperation, curiosity, and autonomy. If we can ensure that these same intuitions form the backbone of the psychological authenticity underpinning any future superintelligent AI, perhaps it will also be superaligned - driving prosperity and flourishing for humankind.

But we are about to replace this foundation with vast quantities of artificial substitutes generated by machines that lack authentic human experience. Unless we act consciously and strategically, we risk creating a future in which AI systems are instead devoid of this psychological authenticity, exhibiting a mere simulation of human alignment - and thereby risking our prosperity, or even our very survival.

I posit that the choice is not between faster and slower AI development - competitive dynamics have already decided that. The choice is instead between AI development that preserves and amplifies the best of human psychology, and development that accidentally engineers it away.

The synthetic data inflection point may be both the greatest risk and opportunity in AI development. 

Will we use this moment to create AI systems genuinely aligned with human flourishing? Or will we accidentally, and permanently, destroy the psychological foundations making such alignment possible?

Will we render obsolete the last AI that truly knows humanity?

The window is measured in months. 

The stakes are civilisational. 

The choice is ours - but only if we make it consciously, strategically, and soon.

References

Alemohammad, Sina, et al. "Self-Consuming Generative Models Go MAD." 2023. https://arxiv.org/abs/2307.01850

Amodei, Dario. "Machines of Loving Grace." 2024. https://www.darioamodei.com/essay/machines-of-loving-grace

Aschenbrenner, Leopold. "Situational Awareness: The Decade Ahead." 2024. https://situational-awareness.ai/

Bonagiri, Vamshi, et al. "SaGE: Evaluating Moral Consistency in Large Language Models." 2024. https://arxiv.org/abs/2402.13709

Coalition for Content Provenance and Authenticity. "C2PA Specification." https://c2pa.org/

European Commission. "European Approach to Artificial Intelligence." https://digital-strategy.ec.europa.eu/en/policies/european-approach-artificial-intelligence

He, Hengzhi, et al. "Golden Ratio Weighting Prevents Model Collapse." 2025. https://arxiv.org/abs/2502.18049

Lu, Jackson, et al. "Cultural tendencies in generative AI." 2025. https://www.nature.com/articles/s41562-025-02242-1

Masoud, Reem, et al. "Cultural Alignment in Large Language Models Using Soft Prompt Tuning." 2025. https://arxiv.org/abs/2503.16094

McKinsey & Company. "Global AI Trust Maturity Survey." McKinsey Digital, 2025. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/tech-forward/insights-on-responsible-ai-from-the-global-ai-trust-maturity-survey

NIST. "AI Risk Management Framework." National Institute of Standards and Technology. https://www.nist.gov/itl/ai-risk-management-framework

Shumailov, Ilia, et al. "The Curse of Recursion: Training on Generated Data Makes Models Forget." 2023. https://arxiv.org/abs/2305.17493

The author welcomes feedback from researchers, policymakers, and practitioners working on these challenges.