Data Loaders

Integrating an LLM with your own custom data is essential for building an application. To achieve this, we allow you to import data from any source seamlessly. Our data loaders do the following:

  • Load the data from the source.

  • Convert the data to text or arrays.

  • Split the data into smaller segments (with overlapping content).

  • Return the list of segments of the data.
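The splitting step above can be sketched as follows. The `chunk_size` and `overlap` values here are illustrative, not the platform's actual defaults:

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping segments of at most chunk_size characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    segments = []
    start = 0
    while start < len(text):
        segments.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` to share context
    return segments

segments = split_text("a" * 500, chunk_size=200, overlap=50)
print(len(segments))  # 4 overlapping segments
```

The overlap ensures that a sentence cut at a segment boundary still appears whole in one of the two neighboring segments.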

Because of this process, the data loaders cannot be connected directly to an LLM but rather to a vector database (see Vector Databases).

A few of our data loaders. Note that their output is a gray square showing the need to connect to a vector database.

We offer support for the following data loaders:

  1. String: loads a large body of text, too large to fit into an LLM prompt, as one of your inputs.

    1. Parameters: Text (exposed to the API)

    2. Outputs: List of text segments (connects to a vector database).

The inline document receives a query from the LLM and returns the most relevant section.

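In practice this lookup happens through vector embeddings in the database; the word-overlap scoring below is only a stand-in to illustrate the idea of returning the most relevant segment for a query:

```python
def most_relevant(segments, query):
    """Return the segment sharing the most words with the query.
    (The real lookup uses vector embeddings; word overlap is a stand-in.)"""
    query_words = set(query.lower().split())
    return max(segments, key=lambda s: len(query_words & set(s.lower().split())))

segments = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "Python is a popular programming language.",
]
print(most_relevant(segments, "What is the capital of France?"))
# → "Paris is the capital of France."
```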
  2. Upload: this allows you to upload files of various types (such as .txt, .csv, .html, .pdf, .py, .md, and others) and convert them into text data.

    1. Parameters: Files uploaded by the user.

    2. Outputs: List of text segments (connects to a vector database).

The document upload node receives a file, converts its content to text, and splits it into segments for a vector DB.

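A hypothetical sketch of the extension-based conversion such an upload node might perform. Only .txt/.md/.py and .csv are handled here; the actual node covers more formats, such as .html and .pdf:

```python
import csv
import io
import tempfile
from pathlib import Path

def file_to_text(path):
    """Convert an uploaded file to plain text based on its extension.
    Only a few extensions are handled in this sketch."""
    path = Path(path)
    suffix = path.suffix.lower()
    raw = path.read_text(encoding="utf-8")
    if suffix in {".txt", ".md", ".py"}:
        return raw  # already plain text
    if suffix == ".csv":
        rows = csv.reader(io.StringIO(raw))
        return "\n".join(", ".join(row) for row in rows)
    raise ValueError(f"unsupported file type: {suffix}")

# demo: convert a small CSV upload to text
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "upload.csv"
    demo.write_text("name,age\nAda,36\n", encoding="utf-8")
    converted = file_to_text(demo)
print(converted)
```

The resulting text would then go through the same splitting step as any other loader before reaching the vector database.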
  3. WebScrapper: loads a URL and scrapes its HTML into Markdown text.

    1. Parameters: URL, Modality (full HTML or meta-data) (exposed to the API)

    2. Outputs (in HTML mode): List of text segments (connects to a vector database).

    3. Outputs (in meta-data mode): Returns the meta-data of the website as text (can be connected to an LLM).

The WebScrapper node has two modalities: it can send the scraped HTML to a vector DB, or send the website meta-data as a string to an LLM or Output node.

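The meta-data modality can be illustrated with Python's standard-library `html.parser`; the node's actual implementation is not documented, so this is only a sketch:

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <title> and <meta name=... content=...> tags of a page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

html_doc = """<html><head><title>Example</title>
<meta name="description" content="A sample page."></head>
<body><p>Hello</p></body></html>"""

parser = MetaExtractor()
parser.feed(html_doc)
print(parser.title, parser.meta)
```

The extracted title and meta tags form a short string that fits directly in an LLM prompt, which is why this modality can skip the vector database.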
  4. Google Search: performs a Google search, scrapes the HTML of the top results, and returns the text segments of the most relevant ones.

    1. Parameters: API Key for SerpAPI.

    2. Inputs: search criteria (Text from user input or LLM output)

    3. Outputs: List of text segments (connects to a vector database).

The Google Search node uses SerpAPI to perform a Google search, web-scrapes the first few results, and sends the content to a vector database.

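The live search itself requires a SerpAPI key and a network call; the sketch below skips that and only shows the downstream step of chunking an already-fetched, SerpAPI-style `organic_results` list. The field names follow SerpAPI's documented JSON shape; `top_n` and `chunk_size` are illustrative:

```python
def results_to_segments(organic_results, top_n=3, chunk_size=80):
    """Take SerpAPI-style organic results and chunk their snippets
    into fixed-size text segments tagged with the source link."""
    segments = []
    for result in organic_results[:top_n]:
        text = result.get("snippet", "")
        link = result.get("link", "")
        for start in range(0, len(text), chunk_size):
            segments.append({"source": link, "text": text[start:start + chunk_size]})
    return segments

sample = [
    {"link": "https://example.com/a", "snippet": "x" * 100},
    {"link": "https://example.com/b", "snippet": "y" * 50},
]
segments = results_to_segments(sample)
print(len(segments))  # 3
```

Tagging each segment with its source link lets the downstream LLM cite which result a piece of text came from.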
  5. Notion: loads the pages and subpages of a Notion database as Markdown text.

    1. Parameters: Client secret and database ID. (See https://developers.notion.com/docs/create-a-notion-integration for how to obtain them.)

    2. Outputs: List of text segments (connects to a vector database).
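A sketch of the Markdown conversion, assuming block objects in the shape returned by Notion's public API; only headings and paragraphs are handled here:

```python
def blocks_to_markdown(blocks):
    """Render a list of Notion-style blocks as Markdown text.
    (Block shapes follow Notion's public API; only two types handled.)"""
    lines = []
    for block in blocks:
        kind = block["type"]
        rich = block.get(kind, {}).get("rich_text", [])
        text = "".join(part.get("plain_text", "") for part in rich)
        if kind == "heading_1":
            lines.append("# " + text)
        elif kind == "paragraph":
            lines.append(text)
    return "\n\n".join(lines)

sample = [
    {"type": "heading_1", "heading_1": {"rich_text": [{"plain_text": "Notes"}]}},
    {"type": "paragraph", "paragraph": {"rich_text": [{"plain_text": "Hello from Notion."}]}},
]
print(blocks_to_markdown(sample))
```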

  6. MongoDB: loads the documents of a MongoDB collection as a list of JSON objects.

    1. Parameters: database, collection, and URI.

    2. Inputs: MongoDB query in PyMongo format (Text from user input or LLM output).

    3. Outputs: List of text segments (connects to a vector database).

The MongoDB data loader receives a PyMongo-formatted query and outputs a list of text chunks.

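The query itself runs through PyMongo (e.g. `collection.find(...)`); the sketch below covers only the later step of serializing already-fetched documents and splitting them into segments (`chunk_size` is illustrative):

```python
import json

def documents_to_segments(documents, chunk_size=120):
    """Serialize each queried document to JSON, then split long
    documents into fixed-size text segments."""
    segments = []
    for doc in documents:
        text = json.dumps(doc, default=str)  # default=str handles ObjectId/dates
        segments.extend(text[i:i + chunk_size] for i in range(0, len(text), chunk_size))
    return segments

docs = [{"_id": 1, "title": "First"}, {"_id": 2, "title": "Second"}]
segments = documents_to_segments(docs)
print(segments)
```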
  7. Postgres: loads the rows of a Postgres database as a list of JSON objects.

    1. Parameters: database, username, password, host URL, port.

    2. Inputs: SQL query (Text from user input or LLM output)

    3. Outputs: List of text segments (connects to a vector database)

The Postgres data loader receives an SQL query and outputs a list of text chunks.

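A sketch of the rows-to-JSON step, using the standard library's sqlite3 as a stand-in for Postgres (a real deployment would use a Postgres driver such as psycopg2, but the row-to-JSON conversion is the same):

```python
import json
import sqlite3

# sqlite3 stands in for Postgres here; only the serialization step matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

cursor = conn.execute("SELECT id, name FROM users")
columns = [desc[0] for desc in cursor.description]
rows_as_json = [json.dumps(dict(zip(columns, row))) for row in cursor.fetchall()]
print(rows_as_json)
conn.close()
```

Each row becomes one JSON string, which then goes through the usual splitting step before the vector database.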
  8. Airtable: loads the rows of an Airtable database as a list of JSON objects.

    1. Outputs: List of text segments (connects to a vector database).

  9. YouTube: transcribes videos from YouTube. You can then send the transcript to LLM nodes for further processing and/or summarization. Here is a full video explaining how to use it (video).

  10. Slack (coming soon).

  11. Big Query (coming soon).

  12. Zoom (coming soon).

  13. Zendesk (coming soon).
