
Developer Notes: How AudioNova Uses AI for Mixing

This page is intended for technical collaborators, contributors, or curious users who want to understand how AudioNova works behind the scenes.


βš™οΈ How the Mixing Works​

At a high level, AudioNova's architecture looks like this:

User Prompt → ChatGPT (LLM) → Internal Command Generation → DSP Engine → Audio Output
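
To make the data flow concrete, here is a minimal Python sketch of how the stages fit together. The function names and stub bodies are illustrative placeholders, not the actual AudioNova codebase.

from typing import Any

def interpret_prompt(prompt: str) -> dict[str, Any]:
    # An LLM (e.g. ChatGPT) turns the natural-language prompt into a
    # structured interpretation. Stubbed here for illustration.
    return {"effect": "reverb", "target": "vocals", "amount": 0.25}

def generate_commands(interpretation: dict[str, Any]) -> list[dict[str, Any]]:
    # The interpretation is expanded into one or more internal commands
    # for the DSP engine.
    return [interpretation]

def render_audio(commands: list[dict[str, Any]]) -> bytes:
    # The DSP engine applies the commands and renders audio.
    # A real implementation would read stems and write a mixdown.
    return b""

def mix(prompt: str) -> bytes:
    # User Prompt -> LLM -> Command Generation -> DSP Engine -> Audio Output
    return render_audio(generate_commands(interpret_prompt(prompt)))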


🧠 Language Model (LLM)

AudioNova uses ChatGPT or a similar LLM to interpret natural language prompts. It doesn't hardcode audio instructions; it interprets them dynamically (a sketch of this step follows the list below).

  • Prompts are parsed for:

    • Effects (reverb, compression, EQ)
    • Parameters (e.g. "50% wet", "tight low end")
    • Descriptors (e.g. "gritty", "warm", "club-ready")
  • We rely on prompt patterns rather than rigid keyword parsing

  • All prompt interpretation is context-aware and generative, not rule-based
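
As an illustration, the interpret_prompt stub from the pipeline sketch above could be expanded into an actual LLM call. The sketch below assumes the OpenAI Python client and an illustrative model name; the system prompt text is invented, and AudioNova's real prompt-handling code may differ.

import json
from openai import OpenAI

SYSTEM_PROMPT = (
    "You are a mixing assistant. Convert the user's request into a JSON "
    "object with keys: effect, target, amount, style, and any extra "
    "effect-specific parameters. Interpret descriptors like 'warm' or "
    "'gritty' in context rather than matching keywords."
)

def interpret_prompt(prompt: str) -> dict:
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Example: interpret_prompt("add a touch of plate reverb to the vocals")
# might return {"effect": "reverb", "target": "vocals", "amount": 0.25, ...}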


🎛 Audio Processing Engine

Once the AI interprets a prompt, the system generates intermediate instructions like:

{
  "effect": "reverb",
  "target": "vocals",
  "amount": 0.25,
  "style": "plate",
  "preDelay": 10
}


These instructions are passed to a DSP backend (custom code, a plugin chain, or a render pipeline), depending on the implementation.
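
For the plugin-chain flavor of backend, one way to consume these instructions is to map each effect name onto a plugin and apply the resulting chain to the target stem. The sketch below uses Spotify's open-source pedalboard library purely as an example; it is not necessarily what AudioNova's DSP engine does, the compression keys (thresholdDb, ratio) are invented for illustration, and parameters without a direct pedalboard equivalent (such as preDelay) are simply ignored.

import numpy as np
from pedalboard import Pedalboard, Reverb, Compressor

def build_plugin(instruction: dict):
    # Translate one AudioNova-style instruction into a pedalboard plugin.
    if instruction["effect"] == "reverb":
        # "amount" maps to wet level; "style" and "preDelay" have no
        # direct pedalboard equivalent, so they are ignored in this sketch.
        return Reverb(wet_level=instruction.get("amount", 0.33))
    if instruction["effect"] == "compression":
        return Compressor(threshold_db=instruction.get("thresholdDb", -12.0),
                          ratio=instruction.get("ratio", 4.0))
    raise ValueError(f"Unsupported effect: {instruction['effect']}")

def apply_instructions(audio: np.ndarray, sample_rate: float,
                       instructions: list[dict]) -> np.ndarray:
    # Chain the plugins in the order the instructions were generated.
    board = Pedalboard([build_plugin(i) for i in instructions])
    return board(audio, sample_rate)

# Example with the instruction shown above:
# wet_vocals = apply_instructions(vocal_stem, 44100,
#     [{"effect": "reverb", "target": "vocals", "amount": 0.25,
#       "style": "plate", "preDelay": 10}])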

💬 Prompt Grammar Evolution:

We're continuously refining prompt formats that are:

- Musician-friendly

- Conversational

- Modular (e.g., chaining effects)

We're designing a prompt grammar that feels like texting your dream engineer, not learning a new syntax.

Future iterations may use:

- Structured prompt tagging

- Reference-based intent matching

- Few-shot examples per prompt style (sketched below)
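
As one concrete direction, few-shot examples per prompt style could be supplied alongside the system prompt so the model sees how conversational, chained requests map to ordered instruction lists. The example pairs and helper below are invented for illustration, not AudioNova's actual grammar.

import json

# Hypothetical few-shot pairs: a conversational prompt and the ordered
# instruction list it should produce (chained effects stay in order).
FEW_SHOT_EXAMPLES = [
    {
        "prompt": "make the vocal a bit roomier, then tighten the low end",
        "instructions": [
            {"effect": "reverb", "target": "vocals", "amount": 0.2,
             "style": "room"},
            {"effect": "eq", "target": "bass", "lowCutHz": 40,
             "description": "tight low end"},
        ],
    },
    {
        "prompt": "give the drums a gritty, club-ready punch",
        "instructions": [
            {"effect": "saturation", "target": "drums", "amount": 0.4},
            {"effect": "compression", "target": "drums", "ratio": 4.0,
             "thresholdDb": -10.0},
        ],
    },
]

def build_messages(user_prompt: str, system_prompt: str) -> list[dict]:
    # Interleave the few-shot pairs as prior user/assistant turns so the
    # model can imitate the mapping when it sees the new prompt.
    messages = [{"role": "system", "content": system_prompt}]
    for example in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": example["prompt"]})
        messages.append({"role": "assistant",
                         "content": json.dumps(example["instructions"])})
    messages.append({"role": "user", "content": user_prompt})
    return messages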


🤝 Contributing:

Interested in helping improve prompt interpretation, grammar design, or backend DSP logic?

Join the dev team or open an issue. This is mixing by language, and we're just getting started.