Step Guide: Web Scrape

Knowledge

Fetch a URL and return its content as Markdown, text, or HTML

Overview

The Web Scrape step fetches a single URL and extracts the page content along with metadata like title, description, author, and published date. Output format is configurable: Markdown (best for feeding into AI steps), plain text, or raw HTML.

When to Use

  • Fetch article or blog post content to summarize or analyze with AI
  • Pull product page details for competitive analysis
  • Extract page content from URLs found in an RSS feed or spreadsheet
  • Capture page metadata (title, description, author) for content cataloguing

How It Works

Configure a URL (static or referenced from an upstream step via {{stepId.fieldName}}). The step fetches the page, extracts the main content, and returns it in your chosen format along with page metadata. Output fields are locked: content, title, description, url (final URL after redirects), published_date, and author.
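
To illustrate the two ways of supplying the URL, here is a minimal Python sketch of how a {{stepId.fieldName}} reference could be resolved against upstream step outputs. It is not the product's implementation: the function name resolve_url, the step id rss_1, the field name link, and the exact reference grammar accepted by the regular expression are all assumptions made for this example.

```python
import re
from typing import Mapping

# Matches one {{stepId.fieldName}} reference. The characters allowed in
# step ids and field names are an assumption made for this sketch.
REFERENCE = re.compile(r"\{\{\s*([\w-]+)\.(\w+)\s*\}\}")


def resolve_url(configured: str, upstream: Mapping[str, Mapping[str, str]]) -> str:
    """Return the URL to fetch.

    A static URL contains no reference and passes through unchanged;
    a reference is replaced by the named field of the named upstream step.
    """
    def lookup(match: "re.Match[str]") -> str:
        step_id, field_name = match.group(1), match.group(2)
        return str(upstream[step_id][field_name])

    return REFERENCE.sub(lookup, configured)


# Hypothetical upstream output, e.g. from an RSS step with id "rss_1".
upstream_outputs = {"rss_1": {"link": "https://example.com/post"}}

static_url = resolve_url("https://example.com/about", upstream_outputs)
referenced_url = resolve_url("{{rss_1.link}}", upstream_outputs)
print(static_url)
print(referenced_url)
```

In this sketch a reference to an unknown step or field raises KeyError; how the real step reports an unresolved reference is not stated in this guide.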

Actions

Fetch
Fetch a URL and return its content plus page metadata
How it works
Resolves the URL from config (can be a static URL or a {{stepId.fieldName}} reference), fetches the page, and extracts content in the configured format. Outputs {content, title, description, url, published_date, author}. The url output is the final URL after any redirects.
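
The six output fields listed above can be pictured as a fixed record. The Python sketch below only illustrates that shape: the class name FetchOutput, the field types, the idea that missing metadata appears as None, and every sample value are assumptions, not documented behavior.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class FetchOutput:
    """Hypothetical record mirroring the six locked output fields."""
    content: str                   # page content in the configured format
    title: Optional[str]           # metadata may be absent on some pages (assumption)
    description: Optional[str]
    url: str                       # final URL after any redirects
    published_date: Optional[str]  # date format is not specified in this guide
    author: Optional[str]


# Illustrative values only; not real output from the step.
example = FetchOutput(
    content="# Hello\n\nBody text.",
    title="Hello",
    description=None,
    url="https://example.com/hello",
    published_date=None,
    author=None,
)

field_names = [f.name for f in fields(FetchOutput)]
print(field_names)
```

A downstream step would then read a single field, for example {{stepId.content}}, where stepId stands for the id of the Web Scrape step.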
Tips
  • Markdown is the default and the best format for AI processing, for example when feeding content into an AI Text Generate step: it preserves structure without HTML noise. Switch to Text or HTML only when a downstream step specifically needs that format
  • Enable "Include image URLs" if you need to capture images referenced on the page
  • Pair with a For Each step to scrape multiple URLs from a list
  • Some pages behind login walls or with aggressive anti-bot measures may not return content
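
The last two tips can be combined: when scraping a list of URLs, it helps to separate pages that returned content from those that did not. The Python sketch below shows that pattern outside the product. The names scrape_each and fake_fetch are hypothetical, fake_fetch is a stand-in for the Web Scrape step, and the sketch assumes a blocked page yields empty content rather than an error, which this guide does not confirm.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Output = Dict[str, Optional[str]]


def scrape_each(
    urls: Iterable[str], fetch: Callable[[str], Output]
) -> Tuple[List[Output], List[str]]:
    """Fetch every URL; return (outputs with content, URLs that came back empty)."""
    scraped: List[Output] = []
    skipped: List[str] = []
    for url in urls:
        output = fetch(url)
        # Treat missing, None, or whitespace-only content as "no content".
        if (output.get("content") or "").strip():
            scraped.append(output)
        else:
            skipped.append(url)
    return scraped, skipped


def fake_fetch(url: str) -> Output:
    """Stand-in for the Web Scrape step: pretends login pages return nothing."""
    blocked = "login" in url
    return {"content": "" if blocked else "# Page\n\nBody", "url": url}


scraped, skipped = scrape_each(
    ["https://example.com/a", "https://example.com/login"], fake_fetch
)
print(len(scraped), skipped)
```

Keeping the skipped URLs in their own list makes it easy to review or retry them later instead of passing empty content to an AI step.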