Firecrawl

Comprehensive guide to the Firecrawl node in StackAI: discover its most common actions, input requirements, configurations, and output examples for seamless web data extraction.

What is Firecrawl?

Firecrawl is a powerful integration within StackAI that enables automated web data extraction, web scraping, and content retrieval from websites. It is designed to help users gather structured or unstructured data from web pages, making it ideal for research, monitoring, and automation workflows.


How to use it?

To use the Firecrawl node in StackAI, simply add the node to your workflow and select the desired action. Configure the required inputs and settings based on your use case. Firecrawl supports a variety of actions, from scraping a single URL to crawling entire websites or searching for specific content. Connect the node to downstream nodes to process or analyze the extracted data.


Example of Usage

Suppose you want to extract the main content from a specific web page. You would use the "Scrape from URL" action, provide the target URL as input, and receive the extracted text and metadata as output. This data can then be used for further analysis, summarization, or storage.


Firecrawl: Most Common Actions

Below are the most commonly used Firecrawl actions in StackAI, along with detailed explanations, input requirements, configurations, and output examples.


1. Scrape from URL

Description: Extracts the main content, metadata, and structure from a single web page.

Inputs:

  • url (Required): The full URL of the web page to scrape. Example: "https://example.com/article"

Configurations:

  • None required for basic usage.

Outputs:

  • content (Always returned): The main text content of the page.

  • metadata (Always returned): Information such as title, description, and author.

  • structure (Optional): Structured representation of the page (e.g., headings, sections).

Example:

{
  "content": "This is the main article text...",
  "metadata": {
    "title": "Example Article",
    "description": "A sample article for demonstration.",
    "author": "Jane Doe"
  },
  "structure": {
    "headings": ["Introduction", "Main Content", "Conclusion"]
  }
}
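In a workflow, a downstream node receives this output as a JSON object. As an illustrative sketch (the `output` dict below is simply the example above; how you bind it in your own workflow will vary), here is how a Python step might pull out the fields needed for summarization or storage:

```python
# Example "Scrape from URL" output, copied from the sample above.
output = {
    "content": "This is the main article text...",
    "metadata": {
        "title": "Example Article",
        "description": "A sample article for demonstration.",
        "author": "Jane Doe",
    },
    "structure": {"headings": ["Introduction", "Main Content", "Conclusion"]},
}

# Pull out the fields a downstream summarization step would need.
title = output["metadata"].get("title", "")
body = output["content"]

# "structure" is optional, so fall back to an empty heading list if absent.
headings = output.get("structure", {}).get("headings", [])

print(title)     # Example Article
print(headings)  # ['Introduction', 'Main Content', 'Conclusion']
```

Using `.get()` with defaults keeps the step robust for pages where optional fields like `structure` are not returned.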

2. Web Scrape

Description: Performs advanced scraping with options for custom selectors, extracting specific elements or data points from a web page.

Inputs:

  • url (Required): The target web page URL.

  • selectors (Optional): CSS selectors or XPath expressions to target specific elements. Example: [".article-title", ".author-name"]

Configurations:

  • None required for basic usage.

Outputs:

  • results (Always returned): An array of extracted elements or data points.

Example:

{
  "results": [
    {"selector": ".article-title", "value": "Example Article"},
    {"selector": ".author-name", "value": "Jane Doe"}
  ]
}
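Because `results` is an array of selector/value pairs, a common downstream step is to index it by selector for direct lookup. A minimal Python sketch, using the example output above as input:

```python
# Example "Web Scrape" output, copied from the sample above.
output = {
    "results": [
        {"selector": ".article-title", "value": "Example Article"},
        {"selector": ".author-name", "value": "Jane Doe"},
    ]
}

# Index extracted values by their CSS selector for easy lookup downstream.
by_selector = {item["selector"]: item["value"] for item in output["results"]}

print(by_selector[".author-name"])  # Jane Doe
```

Note that if the same selector matches several elements, a dict keeps only the last value; keep the original array when you need every match.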

3. Batch Scrape

Description: Scrapes multiple URLs in a single request, ideal for bulk data extraction.

Inputs:

  • urls (Required): An array of URLs to scrape. Example: ["https://site1.com", "https://site2.com"]

Configurations:

  • None required for basic usage.

Outputs:

  • results (Always returned): An array of objects, each containing the content and metadata for a URL.

Example:

{
  "results": [
    {
      "url": "https://site1.com",
      "content": "Content from site 1...",
      "metadata": {"title": "Site 1"}
    },
    {
      "url": "https://site2.com",
      "content": "Content from site 2...",
      "metadata": {"title": "Site 2"}
    }
  ]
}

4. Crawl Website

Description: Automatically crawls a website, following links to extract content from multiple pages.

Inputs:

  • start_url (Required): The starting URL for the crawl.

  • max_depth (Optional): How many link levels deep to crawl (default is 1). Example: 2

Configurations:

  • None required for basic usage.

Outputs:

  • pages (Always returned): An array of page objects, each with content and metadata.

Example:

{
  "pages": [
    {
      "url": "https://example.com/page1",
      "content": "Page 1 content...",
      "metadata": {"title": "Page 1"}
    },
    {
      "url": "https://example.com/page2",
      "content": "Page 2 content...",
      "metadata": {"title": "Page 2"}
    }
  ]
}
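Since a crawl returns one object per page, downstream analysis often starts by merging the pages into a single labeled document. A minimal Python sketch over the example output above:

```python
# Example "Crawl Website" output, copied from the sample above.
output = {
    "pages": [
        {
            "url": "https://example.com/page1",
            "content": "Page 1 content...",
            "metadata": {"title": "Page 1"},
        },
        {
            "url": "https://example.com/page2",
            "content": "Page 2 content...",
            "metadata": {"title": "Page 2"},
        },
    ]
}

# Concatenate crawled pages into one labeled document, falling back to the
# URL when a page has no title, for downstream summarization or indexing.
combined = "\n\n".join(
    f"# {page['metadata'].get('title', page['url'])}\n{page['content']}"
    for page in output["pages"]
)

print(f"Crawled {len(output['pages'])} pages, {len(combined)} characters total.")
```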

5. Search

Description: Searches a website or a set of pages for specific keywords or patterns.

Inputs:

  • url (Required): The base URL to search.

  • query (Required): The keyword or pattern to search for. Example: "AI automation"

Configurations:

  • None required for basic usage.

Outputs:

  • matches (Always returned): An array of search results with context.

Example:

{
  "matches": [
    {
      "url": "https://example.com/page1",
      "snippet": "AI automation is transforming industries..."
    }
  ]
}
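A downstream node might filter or display the returned matches. As an illustrative Python sketch (the case-insensitive re-check of the query against each snippet is an assumed post-processing step, not part of the action itself):

```python
# Example "Search" output, copied from the sample above.
output = {
    "matches": [
        {
            "url": "https://example.com/page1",
            "snippet": "AI automation is transforming industries...",
        },
    ]
}

query = "AI automation"

# Keep only matches whose snippet actually contains the query,
# case-insensitively, then print each hit with its source URL.
hits = [m for m in output["matches"] if query.lower() in m["snippet"].lower()]
for match in hits:
    print(f"{match['url']}: {match['snippet']}")
```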

Summary Table: Firecrawl Actions

| Action          | Required Inputs              | Configurations | Outputs                      |
| --------------- | ---------------------------- | -------------- | ---------------------------- |
| Scrape from URL | url                          | None           | content, metadata, structure |
| Web Scrape      | url, selectors (opt.)        | None           | results                      |
| Batch Scrape    | urls                         | None           | results                      |
| Crawl Website   | start_url, max_depth (opt.)  | None           | pages                        |
| Search          | url, query                   | None           | matches                      |
