Web Scrape Step

Step Guide: Web Scrape

Knowledge

Extract content from any webpage for AI processing

Overview

The Web Scrape step fetches a single URL and extracts the page content along with metadata like title, description, author, and published date.

Fetch

Fetch a URL and extract its content in your chosen format (Markdown, plain text, or HTML), along with page metadata. Markdown is the default and usually the best choice for feeding content into AI steps.

Fetch article or blog post content to summarize or analyze with AI
Pull product page details for competitive analysis
Extract page content from URLs found in an RSS feed or spreadsheet
Capture page metadata (title, description, author) for content cataloguing
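
To make the metadata part of the description above more concrete, here is a minimal Python sketch of how a title, description, author, and published date could be read from a page's HTML. It is an illustration only, not the step's actual implementation: the class name MetadataParser, the function extract_metadata, and the mapping from meta tag names to field names are all assumptions made for this example.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and a few <meta> tags from an HTML page."""

    # Assumed mapping from common meta tag names to the field names used in this guide.
    META_FIELDS = {
        "description": "description",
        "author": "author",
        "article:published_time": "published_date",
    }

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Meta tags identify themselves with either name= or property=.
            key = attrs.get("name") or attrs.get("property")
            field = self.META_FIELDS.get(key)
            if field and attrs.get("content"):
                self.metadata[field] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so append rather than overwrite.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html: str) -> dict:
    """Return whichever of title, description, author, published_date were found."""
    parser = MetadataParser()
    parser.feed(html)
    return parser.metadata
```

Fields that are missing from the page are simply absent from the returned dictionary in this sketch; how the real step represents missing metadata is not specified here.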

Configure a URL, either as a static value or as a reference to upstream data using {{stepId.fieldName}}. The step fetches the page, extracts the main content, and returns it in your chosen format. The output fields are locked: content, title, description, url (the final URL after redirects), published_date, and author.
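
As a rough illustration of the {{stepId.fieldName}} reference described above, the Python sketch below substitutes upstream values into a URL field. The function resolve_references, the step ID rss_1, and the assumption that IDs and field names consist only of letters, digits, and underscores are hypothetical; the product's own resolver may behave differently.

```python
import re

# Output field names as listed in this guide.
OUTPUT_FIELDS = ("content", "title", "description", "url", "published_date", "author")

# Matches {{stepId.fieldName}}; assumes both parts are word characters only.
REFERENCE = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def resolve_references(template: str, upstream: dict) -> str:
    """Replace each {{stepId.fieldName}} with the matching upstream value."""

    def lookup(match):
        step_id, field_name = match.group(1), match.group(2)
        # Raises KeyError if the step or field does not exist upstream.
        return str(upstream[step_id][field_name])

    return REFERENCE.sub(lookup, template)


# Hypothetical upstream output from an RSS step with the ID "rss_1".
upstream = {"rss_1": {"link": "https://example.com/post"}}
url = resolve_references("{{rss_1.link}}", upstream)
```

A static URL contains no references, so it passes through this function unchanged.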

Tips

Use Markdown format when feeding content into an AI Generate step
Enable "Include image URLs" if you need to capture images referenced on the page
Pair with a For Each step to scrape multiple URLs from a list
Pages behind login walls or with aggressive anti-bot measures may not return content
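
The last two tips can be pictured together as a loop: scrape each URL in a list and set aside any page that comes back without content. The Python sketch below is only an analogy for what a For Each step wrapped around Web Scrape does. The function scrape_each and the scrape callable passed into it are hypothetical stand-ins, and the assumption that a blocked page yields an empty content field is made for this example.

```python
from typing import Callable, Iterable


def scrape_each(
    urls: Iterable[str], scrape: Callable[[str], dict]
) -> tuple[list[dict], list[str]]:
    """Run `scrape` once per URL and separate usable pages from empty ones."""
    results, skipped = [], []
    for url in urls:
        page = scrape(url)
        # Assumption: a blocked or login-gated page has empty or missing content.
        if page.get("content"):
            results.append(page)
        else:
            skipped.append(url)
    return results, skipped
```

Keeping the skipped URLs in a separate list makes it easy to review which pages returned nothing before passing the rest to an AI step.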

Frequently Asked Questions

What format should I use for AI processing? Markdown (the default). It preserves page structure without HTML noise, making it ideal for downstream AI steps.
Can I scrape pages behind login walls? No. The step only fetches publicly accessible URLs. Pages that require authentication or use aggressive anti-bot measures may not return content.

Related Steps

Step Guide: AI Generate

Generate text, emails, structured data, or images with AI

Step Guide: Research Agent

An AI agent that searches the web and writes briefs with sources

Step Guide: For Each

Process each item in a list individually