Data Loaders vs. Offline Data Loaders
This page outlines the differences between online and offline data loaders.
We offer two mechanisms to load data to an LLM, tailored to different applications:
Data Loaders: read documents or stream datasets online, every time you run your flow. If you only want to retrieve a segment of the data for the LLM, you can connect the data loader to a Vector Database.
Offline Data Loaders: upload documents/URLs/data to a vector database offline, when you drop data into the node, and retrieve the most relevant data online, every time you run your flow.
The difference between these mechanisms lies in their online vs. offline nature.
A Data Loader + a Vector DB:
Offline: 1) set up the parameters or 2) upload the file
Online: 1) load the data, 2) chunk it, 3) compute embeddings, 4) upload the embeddings to the vector DB, and 5) make a query in the vector DB.
An Offline Data Loader:
Offline: 1) set up the parameters or 2) upload the file, 3) load the data, 4) chunk it, 5) compute embeddings, 6) upload the embeddings to the vector DB
Online: 1) make a query in the vector DB.
As a result, an Offline Data Loader makes document search much faster for your flow at inference, since only the query runs online, but the search data will be static.
(A Data Loader) + (A Vector Database) + (Offline upload) = (Offline Data Loader)
or
(A Data Loader) + (A Vector Database) = (Offline Data Loader) + (Online upload)
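The two pipelines described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's API: `ToyVectorDB`, `chunk`, `embed`, `index`, `run_with_data_loader`, and `OfflineDataLoaderSketch` are hypothetical names, the "embedding" is a plain bag-of-words count standing in for a real embedding model, and `load` is any function that returns the source text.

```python
import math
from collections import Counter


def chunk(text, size=4):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorDB:
    """In-memory stand-in for a vector database (hypothetical)."""

    def __init__(self):
        self.rows = []  # list of (embedding, chunk text)

    def upload(self, chunks):
        self.rows.extend((embed(c), c) for c in chunks)

    def query(self, question, top_k=1):
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


def index(load):
    """Load the data, chunk it, compute embeddings, upload them."""
    db = ToyVectorDB()
    db.upload(chunk(load()))
    return db


def run_with_data_loader(load, question):
    """Data Loader + Vector DB: the whole pipeline runs online, on every run."""
    return index(load).query(question)


class OfflineDataLoaderSketch:
    """Offline Data Loader: index once offline, only query online."""

    def __init__(self, load):
        self.db = index(load)  # offline step, executed a single time

    def run(self, question):
        return self.db.query(question)  # online step
```

In this sketch, calling `run_with_data_loader(load, question)` invokes `load` on every run, so it always sees fresh data, whereas `OfflineDataLoaderSketch(load)` invokes `load` once at construction and each later `run(question)` only queries the stored embeddings.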
The following table outlines some practical examples:
Use case | Architecture | Why? |
---|---|---|
Loading a knowledge base. | Offline Data Loader | If the knowledge base is static (it won't change frequently), you can upload all the data offline and get a faster flow at inference. |
Loading data from a database. | Data Loader + Vector DB | The database data will change frequently. Hence, you need to read the data online on every run so that queries use the current data. |
Performing a Google search. | Data Loader + Vector DB | The data will change depending on the Google search. Hence, you will always read data online. |
Loading data from a URL that changes per user. | Data Loader + Vector DB | The data will change based on the user or API call. Hence, you need to read data online. |
Uploading thousands of documents. | Offline Data Loader | Loading thousands of documents online would be very slow, so it is better to upload them offline. |