Developer Notes: How AudioNova Uses AI for Mixing
This page is intended for technical collaborators, contributors, or curious users who want to understand how AudioNova works behind the scenes.
How the Mixing Works
At a high level, AudioNova's architecture looks like this:
User Prompt → ChatGPT (LLM) → Internal Command Generation → DSP Engine → Audio Output
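To make that flow concrete, here is a minimal Python sketch of the pipeline. The function names and signatures are illustrative placeholders, not AudioNova's actual API; the two stages are sketched in more detail further down.

    def interpret_prompt(prompt: str) -> list[dict]:
        """Internal command generation: ask the LLM to turn a natural-language
        mixing request into structured instructions (see the JSON example below)."""
        raise NotImplementedError("LLM call goes here")

    def apply_instructions(instructions: list[dict], input_path: str, output_path: str) -> None:
        """DSP engine: apply the structured instructions to the audio and render it."""
        raise NotImplementedError("DSP backend goes here")

    def mix(prompt: str, input_path: str, output_path: str) -> None:
        # User Prompt -> LLM -> Internal Command Generation -> DSP Engine -> Audio Output
        instructions = interpret_prompt(prompt)
        apply_instructions(instructions, input_path, output_path)

One consequence of this split is that the LLM never touches audio and the DSP engine never sees free-form text; they only share the structured instructions.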
Language Model (LLM)
AudioNova uses ChatGPT or a similar LLM to interpret natural language prompts. It doesn't hardcode audio instructions; it interprets them dynamically.
- Prompts are parsed for:
  - Effects (reverb, compression, EQ)
  - Parameters (e.g. "50% wet", "tight low end")
  - Descriptors (e.g. "gritty", "warm", "club-ready")
- We rely on prompt patterns rather than rigid keyword parsing
- All prompt interpretation is context-aware and generative, not rule-based (see the sketch after this list)
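As an illustration of the generative, non-rule-based interpretation step, the sketch below asks an OpenAI chat model to return JSON instructions. The system prompt wording, model name, and instruction schema are assumptions made for this example, not AudioNova's actual implementation.

    import json
    from openai import OpenAI  # assumes the `openai` Python package is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You are a mixing assistant. Convert the user's request into a JSON object "
        'with an "instructions" array. Each instruction has: effect, target, amount '
        "(0 to 1), plus optional style parameters. Interpret descriptors like 'warm' "
        "or 'club-ready' rather than matching keywords."
    )

    def interpret_prompt(prompt: str) -> list[dict]:
        """Turn a natural-language mixing request into structured instructions."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            response_format={"type": "json_object"},  # ask for strict JSON back
        )
        return json.loads(response.choices[0].message.content)["instructions"]

For example, interpret_prompt("Add a touch of plate reverb to the vocals, about 25% wet") might yield an instruction like the JSON shown in the next section.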
Audio Processing Engine
Once the AI interprets a prompt, the system generates intermediate instructions like:
{
  "effect": "reverb",
  "target": "vocals",
  "amount": 0.25,
  "style": "plate",
  "preDelay": 10
}
These instructions are passed to a DSP backend (custom code, a plugin chain, or a render pipeline) depending on the implementation.
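As one possible backend of the plugin-chain kind, the sketch below maps instructions like the JSON above onto Spotify's open-source pedalboard library. The mapping is deliberately simplified and assumed for illustration: "style", "preDelay", and "target" have no direct equivalent here and are ignored, and the whole mix is processed as a single file.

    from pedalboard import Pedalboard, Reverb, Compressor
    from pedalboard.io import AudioFile

    def build_board(instructions: list[dict]) -> Pedalboard:
        """Tiny, illustrative mapping from instruction dicts to pedalboard plugins."""
        plugins = []
        for inst in instructions:
            if inst["effect"] == "reverb":
                # Map "amount" to wet level; "style" and "preDelay" are ignored in this sketch.
                plugins.append(Reverb(wet_level=inst.get("amount", 0.3)))
            elif inst["effect"] == "compression":
                # Fixed settings for the sketch; a real backend would derive these from the prompt.
                plugins.append(Compressor(threshold_db=-16, ratio=4))
        return Pedalboard(plugins)

    def apply_instructions(instructions: list[dict], input_path: str, output_path: str) -> None:
        board = build_board(instructions)
        with AudioFile(input_path) as f:
            audio = f.read(f.frames)
            samplerate = f.samplerate
        processed = board(audio, samplerate)
        with AudioFile(output_path, "w", samplerate, processed.shape[0]) as f:
            f.write(processed)

A real render pipeline would also route instructions to the right stem (the "target" field) rather than processing the full mix.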
Prompt Grammar Evolution:
We're continuously refining prompt formats that are:
- Musician-friendly
- Conversational
- Modular (e.g., chaining effects)
We're designing a prompt grammar that feels like texting your dream engineer, not learning a new syntax.
Future iterations may use (see the few-shot sketch after this list):
- Structured prompt tagging
- Reference-based intent matching
- Few-shot examples per prompt style
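"Few-shot examples per prompt style" could look like seeding the LLM conversation with a handful of prompt-to-instruction pairs. The pairs below are hypothetical and only meant to show the shape of the idea.

    # Hypothetical few-shot examples that could be prepended to the LLM conversation
    # so that each prompt style (terse, descriptive, genre-flavored) has a worked example.
    FEW_SHOT_EXAMPLES = [
        {
            "role": "user",
            "content": "Make the vocals sit in a big plate reverb, about 25% wet.",
        },
        {
            "role": "assistant",
            "content": '{"instructions": [{"effect": "reverb", "target": "vocals", '
                       '"amount": 0.25, "style": "plate"}]}',
        },
        {
            "role": "user",
            "content": "Tighten up the low end so it sounds club-ready.",
        },
        {
            "role": "assistant",
            "content": '{"instructions": [{"effect": "eq", "target": "bass", '
                       '"amount": 0.5, "style": "high-pass"}]}',
        },
    ]

    # These messages would sit between the system prompt and the user's actual
    # request in the chat completion call sketched earlier.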
Contributing:
Interested in helping improve prompt interpretation, grammar design, or backend DSP logic?
Join the dev team or open an issue. This is mixing by language, and we're just getting started.