This is an HTTP API you can use to unfurl any web page and extract its content as JSON. You can get the title, description, Open Graph data, oEmbed content, or any other information available at a given public URL (see the examples below).
You can use this for building:
Rich text inputs (auto-linking)
Site monitoring & validation tools
Sales & marketing tools
Data scrapers for research
You will need an access token to use the Page.REST API. An access token costs $5 and is valid for one year.
Why use this?
You might be wondering why you should use the Page.REST API rather than coding it yourself.
Here are some reasons:
It handles the nitty-gritty edge cases (HTML parsing is 😤 😰 😩)
You save network bandwidth (only download what you need from a page)
It is hosted on Google Cloud Functions, so it should have high availability
You want to hack something quickly!
How to use
Try the examples to see what the API returns. You can edit the code to try different URLs. (Alternatively, you can run the requests in Postman.)
The default request grabs the site’s title, description, logo, favicons, canonical URL, status code, and Twitter handle.
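If you would rather call the API from your own code, a default request could look like the sketch below. The endpoint https://page.rest/fetch and the parameter names token and url are assumptions for illustration, so check them against the live examples before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the endpoint and the "token"/"url" parameter names are not
# stated in the text above; verify them against the live examples.
API_ENDPOINT = "https://page.rest/fetch"


def build_request_url(token: str, page_url: str) -> str:
    """Build the request URL for a default (no extras) lookup."""
    query = urllib.parse.urlencode({"token": token, "url": page_url})
    return f"{API_ENDPOINT}?{query}"


def fetch_page_info(token: str, page_url: str) -> dict:
    """Call the API and decode the JSON body into a dict."""
    request_url = build_request_url(token, page_url)
    with urllib.request.urlopen(request_url, timeout=10) as response:
        return json.load(response)
```

Calling fetch_page_info("YOUR_TOKEN", "https://example.com") should give you a dict with the fields listed above; the exact key names depend on what the API returns.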
CSS selectors are probably the most useful feature: you can use them to retrieve content from matching elements. In the example, we use selectors to retrieve the businesses and their founders featured on IndieHackers. (You can use up to 10 selector queries.)
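Here is a rough sketch of how a selector request could be put together. It assumes that each selector is passed as a repeated selector query parameter and that the endpoint is https://page.rest/fetch; neither is stated above. The class names in the usage note are made up, not the real IndieHackers markup.

```python
import urllib.parse

# Assumption: endpoint and the repeated "selector" parameter are illustrative.
API_ENDPOINT = "https://page.rest/fetch"
MAX_SELECTORS = 10  # limit stated in the text above


def build_selector_url(token: str, page_url: str, selectors: list[str]) -> str:
    """Build a request URL that asks for the content of matching elements."""
    if len(selectors) > MAX_SELECTORS:
        raise ValueError(f"at most {MAX_SELECTORS} selectors are allowed")
    params = [("token", token), ("url", page_url)]
    # One "selector" entry per CSS selector; urlencode escapes each value.
    params += [("selector", selector) for selector in selectors]
    return f"{API_ENDPOINT}?{urllib.parse.urlencode(params)}"
```

For example, build_selector_url("YOUR_TOKEN", "https://example.com", [".card-title", ".card-founder"]) yields a URL with two selector entries. Checking the limit on the client side saves a round trip when you pass too many.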
Append &embed=1 to the request URL to get the oEmbed content for the page as part of the response (only if available).
Append &og=1 to the request URL to get the OpenGraph content for the page as part of the response (only if available).
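The two flags above can be added with a small helper like this one. It only does the "append" step described above; the request URL in the usage note is a placeholder, and the helper name is made up for this sketch.

```python
def with_extras(request_url: str, embed: bool = False, og: bool = False) -> str:
    """Append &embed=1 and/or &og=1 to an existing request URL."""
    extras = []
    if embed:
        extras.append("embed=1")  # ask for oEmbed content
    if og:
        extras.append("og=1")  # ask for OpenGraph content
    # The request URL already has a query string, so join with "&".
    return "&".join([request_url, *extras])
```

For example, with_extras(request_url, embed=True, og=True) returns the same URL with &embed=1&og=1 on the end. Since both are only returned if available, treat the matching fields in the response as optional.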
You can get any HTTP headers set in the page’s response. In the example, we check the security headers of github.com.
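A header check along these lines could be sketched as follows. It assumes each header name is passed as a repeated header query parameter and that the endpoint is https://page.rest/fetch; both are guesses to verify against the live example. The three header names are common security headers, picked here for illustration.

```python
import urllib.parse

# Assumption: endpoint and the repeated "header" parameter are illustrative.
API_ENDPOINT = "https://page.rest/fetch"

# Common security headers, chosen for this sketch.
SECURITY_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Frame-Options",
]


def build_header_url(token: str, page_url: str, header_names: list[str]) -> str:
    """Build a request URL that asks for specific response headers."""
    params = [("token", token), ("url", page_url)]
    params += [("header", name) for name in header_names]
    return f"{API_ENDPOINT}?{urllib.parse.urlencode(params)}"
```

build_header_url("YOUR_TOKEN", "https://github.com", SECURITY_HEADERS) gives you the URL to request. A header that the site does not send may simply be missing from the response.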
Pay to get an access token (one-time fee of $5, valid for one year).