Data Loaders vs. Offline Data Loaders

This page outlines the differences between online and offline data loaders.

We offer two mechanisms for loading data into an LLM, each suited to different applications:

  1. Data Loaders: read documents or stream datasets online, every time you run your flow. If you want to retrieve only a segment of the data for the LLM, you can connect the data loader to a vector database (see Vector Databases).

  2. Offline Data Loaders: upload documents, URLs, or other data to a vector database offline, when you drop the data into the node. Then retrieve the most relevant data online, every time you run your flow.

The difference between these mechanisms lies in their online vs. offline nature.

  • A Data Loader + a Vector DB:

    • Offline: 1) set up the parameters or 2) upload the file

    • Online: 1) load the data, 2) chunk it, 3) compute embeddings, 4) upload the embeddings to the vector DB, and 5) make a query in the vector DB.

  • An Offline Data Loader:

    • Offline: 1) set up the parameters or 2) upload the file, 3) load the data, 4) chunk it, 5) compute embeddings, and 6) upload the embeddings to the vector DB

    • Online: 1) make a query in the vector DB.
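The difference can be made concrete with a minimal Python sketch of both pipelines. It uses a toy bag-of-words "embedding" and an in-memory store in place of a real embedding model and vector database. All names (`load`, `chunk`, `embed`, `VectorDB`, `run_data_loader_flow`, `OfflineDataLoader`) are hypothetical and are not the platform's API. The two pipelines differ only in where the load, chunk, embed, and upload steps run: inside every flow run, or once when the data is added.

```python
import math
from collections import Counter

# Hypothetical toy components for illustration only; not the platform's API.

def load(source):
    """Stand-in Data Loader: here the source is simply a list of strings."""
    return list(source)

def chunk(docs, size=8):
    """Split each document into chunks of at most `size` words."""
    chunks = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), size):
            chunks.append(" ".join(words[i:i + size]))
    return chunks

def embed(text):
    """Toy embedding: bag-of-words counts (a real flow would call an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorDB:
    """Minimal in-memory vector store holding (embedding, chunk) pairs."""

    def __init__(self):
        self.rows = []

    def upload(self, chunks):
        # "Compute embeddings" and "upload the embeddings to the vector DB".
        self.rows.extend((embed(c), c) for c in chunks)

    def query(self, question, k=1):
        # "Make a query in the vector DB": rank chunks by similarity.
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]

def run_data_loader_flow(source, question):
    """Data Loader + Vector DB: all five steps run on every flow execution."""
    db = VectorDB()
    db.upload(chunk(load(source)))  # load, chunk, embed, upload: online
    return db.query(question)       # query: online

class OfflineDataLoader:
    """Offline Data Loader: load, chunk, embed, and upload once, when data is added."""

    def __init__(self, source):
        self.db = VectorDB()
        self.db.upload(chunk(load(source)))  # offline steps

    def run(self, question):
        return self.db.query(question)       # the only online step

docs = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
]
print(run_data_loader_flow(docs, "how long do refunds take"))
# → ['Refunds are processed within five business days.']
offline = OfflineDataLoader(docs)                 # indexing cost paid once
print(offline.run("when is the office closed"))   # each run only queries
# → ['Our office is closed on public holidays.']
```

In a real deployment the offline steps would persist the embeddings in the vector database, so later runs would skip straight to the query.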

With an Offline Data Loader, document search is much faster when your flow runs, because only the query happens online. The trade-off is that the searchable data is static.
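Both sides of this trade-off show up in the toy sketch below, where a set of words stands in for an embedding. All names are hypothetical. An index built offline answers with a single embedding call per run, but it misses data added after it was built. The online flow re-embeds everything on each run and does see the update.

```python
# Hypothetical toy example; names are illustrative, not the platform's API.

calls = {"embed": 0}

def embed(text):
    """Toy 'embedding': the set of lowercase words. Counts calls as a proxy for cost."""
    calls["embed"] += 1
    return set(text.lower().split())

def build_index(docs):
    """Embed every document (the expensive part of loading data into a vector DB)."""
    return [(embed(doc), doc) for doc in docs]

def best_match(index, question):
    """Return the document sharing the most words with the question."""
    q = embed(question)
    return max(index, key=lambda row: len(q & row[0]))[1]

source = ["2023 price list basic plan costs 10 dollars"]

# Offline Data Loader: the index is built once, when the data is added to the node.
offline_index = build_index(source)

# Later, the underlying data changes.
source.append("2024 price list basic plan costs 12 dollars")

def online_flow(question):
    """Data Loader + Vector DB: re-read and re-embed the source on every run."""
    return best_match(build_index(source), question)

def offline_flow(question):
    """Offline Data Loader: only the question is embedded at run time."""
    return best_match(offline_index, question)

question = "basic plan price in 2024"
for flow in (online_flow, offline_flow):
    before = calls["embed"]
    answer = flow(question)
    print(f"{flow.__name__}: {answer!r} [embed calls this run: {calls['embed'] - before}]")
```

The online flow returns the 2024 entry after three embed calls. The offline flow returns the stale 2023 entry after one.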

(A Data Loader) + (A Vector Database) + (Offline upload) = (Offline Data Loader)

or

(A Data Loader) + (A Vector Database) = (Offline Data Loader) + (Online upload)

The following table outlines some practical examples:

| Use case | Architecture | Why? |
|---|---|---|
| Loading a knowledge base | Offline Data Loader | If the knowledge base is static (it won't change frequently), you can upload all the data offline and get a faster flow. |
| Loading data from a database | Data Loader + Vector DB | The database contents change frequently, so you need to read the data online on each run. |
| Performing a Google search | Data Loader + Vector DB | The results change with each Google search, so the data must always be read online. |
| Loading data from a URL that changes per user | Data Loader + Vector DB | The data changes with each user or API call, so you need to read it online. |
| Uploading thousands of documents | Offline Data Loader | Loading thousands of documents online on every run would be very slow, so it is better to upload them once, offline. |
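The rule of thumb behind the table can be written as a small decision function, as a rough illustration. `UseCase`, `data_changes_between_runs`, and `choose_architecture` are hypothetical names and are not a platform feature. Marking the thousands-of-documents case as static is an assumption, since the table's stated reason there is load time.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    description: str
    # Hypothetical flag summarizing the "Why?" column of the table.
    data_changes_between_runs: bool

def choose_architecture(use_case: UseCase) -> str:
    # Data that changes between runs must be read online; otherwise upload it once, offline.
    if use_case.data_changes_between_runs:
        return "Data Loader + Vector DB"
    return "Offline Data Loader"

USE_CASES = [
    UseCase("Loading a knowledge base", False),
    UseCase("Loading data from a database", True),
    UseCase("Performing a Google search", True),
    UseCase("Loading data from a URL that changes per user", True),
    UseCase("Uploading thousands of documents", False),  # assumed static
]

for case in USE_CASES:
    print(f"{case.description}: {choose_architecture(case)}")
```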
