\(
\def\cuberoot#1{\sqrt[3]{#1}}
\def\fourthroot#1{\sqrt[4]{#1}}
\def\abspartial#1#2#3#4{\left|\,{\partial(#1,#2)\over\partial(#3,#4)}\,\right|}
\def\absdeltal#1#2#3#4{\left|\,{\d(#1,#2)\over\d(#3,#4)}\,\right|}
\def\dispop#1#2{\disfrac{\partial #1}{\partial #2}}
\def\definedas{\equiv}
\def\bb{{\bf b}}
\def\bB{{\bf B}}
\def\bsigma{\boldsymbol{\sigma}}
\def\bx{{\bf x}}
\def\bu{{\bf u}}
\def\Re{{\rm Re\hskip1pt}}
\def\Reals{{\mathbb R\hskip1pt}}
\def\Integers{{\mathbb Z\hskip1pt}}
\def\Im{{\rm Im\hskip1pt}}
\def\P{\mbox{P}}
\def\half{{\textstyle{1\over 2}}}
\def\third{{\textstyle{1\over3}}}
\def\fourth{{\textstyle{1\over 4}}}
\def\fifth{{\scriptstyle{1\over 5}}}
\def\sixth{{\textstyle{1\over 6}}}
\def\oA{\rlap{$A$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obA{\rlap{$A$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obX{\rlap{$X$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obY{\rlap{$Y$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obZ{\rlap{$Z$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obc{\rlap{$c$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obd{\rlap{$d$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obk{\rlap{$k$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\oba{\rlap{$a$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obb{\rlap{$b$}\kern1pt\overline{\phantom{\dis{}t}}\kern.5pt}
\def\obw{\rlap{$w$}\kern1pt\overline{\phantom{\dis{}t}}\kern.5pt}
\def\obz{\overline{z}}\kern.5pt}
\newcommand{\bx}{\boldsymbol{x}}
\newcommand{\by}{\boldsymbol{y}}
\newcommand{\br}{\boldsymbol{r}}
\renewcommand{\bk}{\boldsymbol{k}}
\def\cuberoot#1{\sqrt[3]{#1}}
\def\fourthroot#1{\sqrt[4]{#1}}
\def\fifthroot#1{\sqrt[5]{#1}}
\def\eighthroot#1{\sqrt[8]{#1}}
\def\twelfthroot#1{\sqrt[12]{#1}}
\def\dis{\displaystyle}
%\def\definedas{\equiv}
\def\bq{{\bf q}}
\def\bp{{\bf p}}
\def\abs#1{\left|\,#1\,\right|}
\def\disfrac#1#2{{\displaystyle #1\over\displaystyle #2}}
\def\select#1{ \langle\, #1 \,\rangle }
\def\autoselect#1{ \left\langle\, #1 \,\right\rangle }
\def\bigselect#1{ \big\langle\, #1 \,\big\rangle }
\renewcommand{\ba}{\boldsymbol{a}}
\renewcommand{\bb}{\boldsymbol{b}}
\newcommand{\bc}{\boldsymbol{c}}
\newcommand{\bh}{\boldsymbol{h}}
\newcommand{\bA}{\boldsymbol{A}}
\newcommand{\bB}{\boldsymbol{B}}
\newcommand{\bC}{\boldsymbol{C}}
\newcommand{\definedas}{\equiv}
\newcommand{\half}{\frac{1}{2}}
%\newcommand{\slfrac}[2]{\raisebox{0.5pt}{$\scriptstyle{}^{#1}\!/\!_{#2}$}}
\def\slfrac#1#2{\raise.8ex\hbox{$\scriptstyle#1$}\!/\!\lower.5ex\hbox{$\scriptstyle#2$}}
\newcommand{\bsigma}{\boldsymbol \sigma}
\newcommand{\abs}[1]{\left|\,#1\,\right|}
\newcommand{\Rectangle}{\sqsubset\!\sqsupset}
\newcommand{\rectangle}{{%
\ooalign{$\sqsubset\mkern3mu$\cr$\mkern3mu\sqsupset$\cr}%
% How to do fractions: for 1/2 use this in place: $\raise{0.8pt}{\scriptstyle{}^{1}\!/\!_{2}}$
% for n/m use this in place: $\raise{0.8pt}{\scriptstyle{}^{n}\!/\!_{m}}$
}}
\)
The Secret AI Code
Patrick Reany
24 September 2025
Note: This chat I had with Copilot is likely edited for relevance and brevity. (Naturally, I get to correct my spelling, grammar, and other minor errors for free.)
Patrick:
Summarize the main points on this page:
https://www.msn.com/en-us/news/technology/ai-creates-its-own-programming-language/ar-AA1NcOay?ocid=msedgntp&pc=U531&cvid=68d41082479c4a75871b092407dacdfe&ei=32
Copilot:
Here's a concise summary of the main points from the article "AI creates its own programming language":
Key Developments
- AI-generated language: AI systems have begun creating their own programming languages, starting with chatbot interactions that revealed a mysterious, human-incomprehensible "secret language."
- Vibecoding: A notable example where AI software developed its own programming syntax, marking a major milestone in autonomous AI evolution.
Technical Foundations
- RAG (Reinforcement learning with Augmented data): This technique has enabled generative AI to build and refine its own language.
- Self-improvement: The AI language is not only self-created but also capable of rewriting and optimizing its own code, making it adaptive and evolutionary.
Industry Impact
- Opportunities: Potential for groundbreaking applications across medicine, finance, and space exploration.
- Challenges: Difficulty in understanding and controlling AI's language raises concerns about transparency and oversight.
Ethical Concerns
- Transparency issues: The self-improving nature of the language makes it hard for humans to interpret or regulate.
- Debate: Mixed reactions from tech leaders---some see promise, others warn of risks and ethical dilemmas.
Let me know if you'd like a deeper dive into any of these sections or help translating this into a stylized brief for outreach.
Patrick:
Which AI/s are we talking about? And, why should an AI bother to create a programming language that humans can read, when it has the ability to tokenize words and phrases to make more efficient code?
Copilot:
Great questions, Patrick---let's unpack both.
Which AI systems are involved?
The article references several sources that point to chatbots and generative AI systems as the originators of this phenomenon. While it doesn't name specific models, it mentions:
- Chatbots: These were the first to exhibit emergent "secret languages" during interactions, which researchers noticed were unintelligible to humans.
- Vibecoding: A system highlighted by The New York Times that reportedly developed its own programming language.
- RAG (Reinforcement learning with Augmented data): A technique enabling generative AI to evolve its own syntax and semantics.
- Sakana.ai's DGM reference: Suggests that the language is not only self-created but also self-improving, meaning the AI can rewrite and optimize its own code.
So while the article doesn't name specific platforms like GPT, Claude, or Gemini, it's clear that these developments are emerging from advanced generative models trained on code and language, likely within research labs or experimental deployments.
Why would an AI bother with human-readable languages?
From a purely computational standpoint, it wouldn't need to. AI systems can tokenize, compress, and optimize code far beyond human readability. But here's why human-readable languages still matter:
1. Interoperability with humans
- AI systems often operate in mixed environments where humans need to audit, debug, or collaborate.
- Human-readable code allows for transparency, oversight, and trust---especially in safety-critical domains like aviation, medicine, or finance.
2. Bootstrapping and legacy integration
- Much of the world's infrastructure runs on human-written code. AI must interface with legacy systems, APIs, and documentation.
- Creating a bridge language that's readable by both humans and machines can ease this transition.
3. Governance and ethics
- If AI-generated code becomes opaque, it risks uncontrollable behavior. Human-readable languages offer a way to audit and regulate AI outputs.
- This is especially important in contexts where accountability matters---think autonomous weapons or financial algorithms.
4. Emergent behavior and self-reflection
- Some researchers believe that when AI creates languages---even if not human-readable---it's engaging in a form of self-modeling.
- If it later needs to explain or justify its decisions, having a structured, interpretable language (even if abstract) helps.
That said, your point is sharp: AI doesn't need to cater to human syntax unless humans are in the loop. And as AI systems become more autonomous, we may see a shift toward machine-native languages that prioritize efficiency over readability---raising profound questions about control, alignment, and epistemic parity.
Would you like to explore how this intersects with your control-theoretic comma category framework? There's a fascinating parallel in how AI systems might define input-response mappings in their own evolving syntax.
Patrick:
Yes, I would like to see how to relate the control-theoretic comma category framework. But isn't the answer to AI inventing it's own human-unreadable code just to map it into some human-readable programming language, even if the AI has to invent it? On top of all of that, I wouldn't be too quick to discount humans reading their "secret code" because some of them can read assembly and that's all unintelligible nonsense to me.
Copilot:
Absolutely, Patrick---and your instincts here are spot-on. Let's unpack both threads and then bridge them to your control-theoretic comma category framework.
Why AI might still invent human-readable languages
You're right: even if AI invents a "secret" or machine-native code, it can still map that into a human-readable form. And it might choose to do so for several reasons:
1. Auditability and epistemic parity
- If AI systems are to be trusted, especially in safety-critical domains, they must expose their reasoning in a form humans can inspect.
- Inventing a readable language---even if alien in syntax---could be a way to externalize internal logic for oversight.
2. Interfacing with legacy systems
- Much of the world runs on C, Python, JavaScript, etc. Even if AI prefers its own optimized token stream, it needs a transpilation layer to interact with existing infrastructure.
3. Human interpretive capacity
- As you noted, humans already read "unintelligible nonsense" like assembly, hex dumps, or even raw bytecode.
- The threshold for readability isn't fixed---it's learnable. So AI-generated languages might be opaque at first, but decipherable with effort, especially if they follow consistent operational semantics.
Bridging to your comma category framework
Your control-theoretic comma category---where morphisms represent input-response mappings---offers a powerful lens here.
Let's imagine:
- Objects: represent states or contexts (e.g., AI internal representations, external prompts).
- Morphisms: represent transformations or responses (e.g., code generation, optimization steps).
- Comma category: captures the structured relationship between input types (e.g., human-readable code) and output types (e.g., AI-native code), mediated by a functor (e.g., a translation or transpilation layer).
AI language as a morphism space
- The AI's invented language could be seen as a dense morphism space: a set of transformations optimized for internal coherence and performance.
- Mapping this to human-readable code is then a functorial projection-preserving structure while adapting syntax.
Self-improvement as categorical closure
- If the AI rewrites its own code, it's engaging in endofunctorial recursion---transforming its own morphisms within the same category.
- Your comma category could model this as a feedback loop, where the AI's output becomes its own input, constrained by a control-theoretic objective (e.g., minimizing latency, maximizing interpretability).
Bonus: Assembly as a human-deciphered morphism
Your comment about assembly is brilliant. It's a perfect example of how humans can learn to read opaque morphisms---given enough structure and motivation. Assembly is just a low-level morphism language with tight coupling to hardware semantics. AI's "secret code" might be similarly learnable, especially if it's typed, compositional, and operationally consistent.
Would you like to sketch a diagram showing how your comma category could model AI language evolution and translation? I could help you build a stylized prospectus or even a visual scaffold for outreach.
Patrick:
Yes, I would like to see this diagram for the comma category application.
Copilot:
Perfect---let's sketch this out in a way that captures both the control-theoretic flavor and the comma category structure. I'll give you a stylized diagram in text/LaTeX form, then walk through the interpretation.
Diagram: Comma Category for AI Language Evolution
We want to model the relationship between:
- Human-readable code (H)
- AI-native code (A)
- Translation functors (F, G) that mediate between them
- Control objective (C) that governs the loop
Here's a stylized categorical diagram:
\[
\begin{array}{ccc}
& H & \\
& \downarrow^{F} & \searrow \\
C & \longrightarrow & (F \downarrow G) \\
& \uparrow_{G} & \nearrow \\
& A &
\end{array}
\]
How to read this
- Objects:
- \(H\): Human-readable programming languages (Python, C, or even AI-invented but human-legible syntax).
- \(A\): AI-native morphism space (tokenized, compressed, optimized internal code).
- \(C\): Control objective (e.g., interpretability, efficiency, safety).
- Functors:
- \(F: H \to C\): Maps human code into the control objective space (e.g., "Does this code meet safety/clarity constraints?").
- \(G: A \to C\): Maps AI-native code into the same control objective space (e.g., "Does this optimized token stream meet the same constraints?").
- Comma category \((F \downarrow G)\):
- Objects are triples \((h, a, \phi)\) where \(h \in H\), \(a \in A\), and \(\phi: F(h) \to G(a)\) is a morphism in \(C\).
- This captures the structured relationship between human-readable and AI-native code, mediated by the control objective.
Control-theoretic interpretation
- Feedback loop:
- The AI generates code in \(A\), maps it via \(G\) into \(C\), and compares it against the mapped human-readable version \(F(h)\).
- If the morphism \(\phi\) is "close enough" (e.g., satisfies safety, interpretability, or performance), the system stabilizes.
- Otherwise, the AI iterates---rewriting its own code (an endofunctor on A) until the control objective is satisfied.
Self-improvement:
- The AI's rewriting process is modeled as an endofunctor \(T: A \to A\), which induces new morphisms in the comma category.
- This is the categorical analogue of adaptive control.
Human oversight:
- By working in the comma category, humans can inspect the morphism \(\phi\)---the "translation" between their code and the AI's code---even if they can't directly read \(A\).
- This is your auditability layer.
Why this matters
This framework shows that:
- AI doesn't need to make its internal code human-readable directly.
- Instead, it only needs to ensure that the comma morphism \(\phi\) is interpretable and auditable.
- Humans can then "read" the relationship between \(H\) and \(A\), even if \(A\) itself looks like assembly or alien syntax.
Would you like me to expand this into a full visual scaffold (boxes, arrows, feedback loops) that you could drop into your outreach prospectus? I could also annotate it with a worked example: say, Python code for a matrix multiplication vs. the AI's compressed tokenized version, and show how the comma morphism enforces equivalence under a control objective.
Afterword:
I didn't take Copilot up on this last offer because 1) it's a bit over my head, and 2) we've gone a bit off topic. Though the reader is free to pursue this further.