Firecrawl
Comprehensive guide to the Firecrawl node in StackAI: discover its most common actions, input requirements, configurations, and output examples for seamless web data extraction.
What is Firecrawl?
Firecrawl is a powerful integration within StackAI that enables automated web data extraction, web scraping, and content retrieval from websites. It is designed to help users gather structured or unstructured data from web pages, making it ideal for research, monitoring, and automation workflows.
How to use it?
To use the Firecrawl node in StackAI, simply add the node to your workflow and select the desired action. Configure the required inputs and settings based on your use case. Firecrawl supports a variety of actions, from scraping a single URL to crawling entire websites or searching for specific content. Connect the node to downstream nodes to process or analyze the extracted data.
Example of Usage
Suppose you want to extract the main content from a specific web page. You would use the "Scrape from URL" action, provide the target URL as input, and receive the extracted text and metadata as output. This data can then be used for further analysis, summarization, or storage.
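The input/output contract described above can be sketched as a plain function. Note that `scrape_from_url` below is a hypothetical stand-in for the StackAI node, not actual Firecrawl code: it only illustrates the URL-in, content-plus-metadata-out shape.

```python
def scrape_from_url(url: str) -> dict:
    """Stand-in for the Firecrawl node: URL in, content and metadata out."""
    # In a real workflow, Firecrawl fetches and parses the page here.
    return {
        "content": f"Main text extracted from {url}...",
        "metadata": {"title": "Example Article"},
    }

result = scrape_from_url("https://example.com/article")
print(result["metadata"]["title"])  # -> Example Article
```

Downstream nodes then receive `result` and can summarize, analyze, or store it.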
Firecrawl: Most Common Actions
Below are the most commonly used Firecrawl actions in StackAI, along with detailed explanations, input requirements, configurations, and output examples.
1. Scrape from URL
Description: Extracts the main content, metadata, and structure from a single web page.
Inputs:
url (Required): The full URL of the web page to scrape. Example:
"https://example.com/article"
Configurations:
None required for basic usage.
Outputs:
content (Always returned): The main text content of the page.
metadata (Always returned): Information such as title, description, and author.
structure (Optional): Structured representation of the page (e.g., headings, sections).
Example:
{
  "content": "This is the main article text...",
  "metadata": {
    "title": "Example Article",
    "description": "A sample article for demonstration.",
    "author": "Jane Doe"
  },
  "structure": {
    "headings": ["Introduction", "Main Content", "Conclusion"]
  }
}
2. Web Scrape
Description: Performs advanced scraping with options for custom selectors, extracting specific elements or data points from a web page.
Inputs:
url (Required): The target web page URL.
selectors (Optional): CSS selectors or XPath expressions to target specific elements. Example:
[".article-title", ".author-name"]
Configurations:
None required for basic usage.
Outputs:
results (Always returned): An array of extracted elements or data points.
Example:
{
  "results": [
    {"selector": ".article-title", "value": "Example Article"},
    {"selector": ".author-name", "value": "Jane Doe"}
  ]
}
3. Batch Scrape
Description: Scrapes multiple URLs in a single request, ideal for bulk data extraction.
Inputs:
urls (Required): An array of URLs to scrape. Example:
["https://site1.com", "https://site2.com"]
Configurations:
None required for basic usage.
Outputs:
results (Always returned): An array of objects, each containing the content and metadata for a URL.
Example:
{
  "results": [
    {
      "url": "https://site1.com",
      "content": "Content from site 1...",
      "metadata": {"title": "Site 1"}
    },
    {
      "url": "https://site2.com",
      "content": "Content from site 2...",
      "metadata": {"title": "Site 2"}
    }
  ]
}
4. Crawl Website
Description: Automatically crawls a website, following links to extract content from multiple pages.
Inputs:
start_url (Required): The starting URL for the crawl.
max_depth (Optional): How many link levels deep to crawl (default is 1). Example:
2
Configurations:
None required for basic usage.
Outputs:
pages (Always returned): An array of page objects, each with content and metadata.
Example:
{
  "pages": [
    {
      "url": "https://example.com/page1",
      "content": "Page 1 content...",
      "metadata": {"title": "Page 1"}
    },
    {
      "url": "https://example.com/page2",
      "content": "Page 2 content...",
      "metadata": {"title": "Page 2"}
    }
  ]
}
5. Search
Description: Searches a website or a set of pages for specific keywords or patterns.
Inputs:
url (Required): The base URL to search.
query (Required): The keyword or pattern to search for. Example:
"AI automation"
Configurations:
None required for basic usage.
Outputs:
matches (Always returned): An array of search results with context.
Example:
{
  "matches": [
    {
      "url": "https://example.com/page1",
      "snippet": "AI automation is transforming industries..."
    }
  ]
}
Summary Table: Firecrawl Actions
| Action | Inputs | Configurations | Outputs |
| --- | --- | --- | --- |
| Scrape from URL | url | None | content, metadata, structure |
| Web Scrape | url, selectors (opt.) | None | results |
| Batch Scrape | urls | None | results |
| Crawl Website | start_url, max_depth (opt.) | None | pages |
| Search | url, query | None | matches |