Audio Input Node

The Audio Node lets you upload or record an audio clip as input. The audio is converted to text by an audio-to-text model, and the transcript is passed to your model.

Providers

The Audio Node lets you choose between two providers to transcribe your audio:

  • deepgram: Uses Deepgram's API for audio transcription. Supports multiple models and submodels.

  • whisper-1: Uses OpenAI's Whisper v1 model. Does not support model or submodel selection (uses a default configuration).
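For context, both providers expose public HTTP transcription APIs that the node calls on your behalf. A minimal sketch of how a request might be assembled for each provider, assuming the standard endpoint paths from the public Deepgram and OpenAI API docs (the node handles all of this internally):

```python
# Sketch: assemble a transcription request for each provider.
# Endpoint paths follow the public Deepgram and OpenAI API docs;
# the Audio Node performs these calls for you.

def build_request(provider: str, api_key: str, model: str = "nova-2") -> dict:
    """Return the URL and headers a transcription request would use."""
    if provider == "deepgram":
        # Deepgram takes the model as a query parameter on /v1/listen.
        return {
            "url": f"https://api.deepgram.com/v1/listen?model={model}",
            "headers": {
                "Authorization": f"Token {api_key}",
                "Content-Type": "audio/wav",
            },
        }
    if provider == "whisper-1":
        # Whisper v1 ignores model/submodel selection; the name is fixed.
        return {
            "url": "https://api.openai.com/v1/audio/transcriptions",
            "headers": {"Authorization": f"Bearer {api_key}"},
        }
    raise ValueError(f"unknown provider: {provider}")
```

For example, `build_request("deepgram", "KEY")` targets Deepgram's `/v1/listen` endpoint with the chosen model as a query parameter, while `build_request("whisper-1", "KEY")` targets OpenAI's fixed transcriptions endpoint.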

Model

Available only when using the deepgram provider. Defines the main model used for transcription.

  • nova: Legacy model, fast and lightweight.

  • nova-2: Latest generation with improved accuracy and speed.

  • enhanced: Optimized for high-quality audio and complex content.

This field is disabled for whisper-1.

Submodel

Further refines transcription behavior. Available only with deepgram.

  • general: Default submodel for general-purpose transcription.

  • Other submodels are available depending on the selected Deepgram model.

This field is disabled for whisper-1.
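Deepgram identifies a full configuration by combining the model and submodel into a single name (e.g. nova-2-general). As an illustrative sketch, assuming the hyphen-joined naming from Deepgram's public model list:

```python
# Sketch: combine a Deepgram model and submodel into the single
# `model` parameter its API expects, e.g. "nova-2" + "general"
# -> "nova-2-general". Naming follows Deepgram's public model list.

def deepgram_model_param(model: str, submodel: str = "general") -> str:
    """Join model and submodel into Deepgram's model identifier."""
    if not model:
        raise ValueError("a Deepgram model is required")
    return f"{model}-{submodel}"
```

So selecting the nova-2 model with the default general submodel resolves to `deepgram_model_param("nova-2")`, i.e. `"nova-2-general"`.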

Audio Node Settings

If you want to use your own audio-to-text provider account, you can add your API key here.

How to use it

  1. Add an Audio to Text node to your flow.

  2. Connect the Audio to Text node to an LLM node.

  3. Mention the Audio to Text node in the LLM node by pressing "/" and selecting the Audio to Text node.

Expose the Audio to Text node to your users

  1. Go to the Export tab.

  2. Enable the audio node in the Inputs section.

  3. Press Save Interface to save your changes.


  4. Add an Output node to your flow.

  5. Connect the Output node to the LLM node.

Your users should now see an upload button in the interface.