Crawler API

A Crawler is a service that takes a source URL, a depth (the number of levels of links to traverse), and a script for extracting data from a web page. The service builds a list of child links by following the HREFs from the source URL down to the given depth, retrieves each page, and applies the script to the HTML document to generate JSON documents that can be ingested into a search engine for full-text search.
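As a sketch of the idea above, a crawler job pairs a source URL and a traversal depth with an extraction script. The field names below (`source`, `depth`, `script`) are illustrative assumptions, not the documented schema:

```python
import json

# Hypothetical crawler job document; field names are assumptions for illustration.
job = {
    "source": "https://example.com",  # root URL the crawl starts from
    "depth": 2,                       # how many levels of HREFs to follow
    # script applied to each fetched HTML page to produce a JSON document
    "script": "({ title: document.title, body: document.body.innerText })",
}

print(json.dumps(job, indent=2))
```

Each page the crawler visits yields one such JSON document, ready for full-text indexing.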

Features

  • Asynchronous
  • Headless browser (crawls JavaScript-enabled sites)

API Commands

  • PUT /:app/crawler/:name - Create a Job
  • POST /:app/crawler/:name - Run a Job
  • GET /:app/crawler/:name - Get a Job Document
  • DELETE /:app/crawler/:name - Delete a Job
  • GET /:app/crawler - List Crawler Jobs
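A minimal sketch of calling these endpoints, assuming a hyper server at `localhost:6363`, an app named `myapp`, and a job named `news` (all hypothetical values). The requests are constructed but not sent:

```python
import urllib.request

base = "http://localhost:6363"  # assumed hyper server address
app, name = "myapp", "news"     # hypothetical app and job names

# PUT /:app/crawler/:name - create a job
create = urllib.request.Request(
    f"{base}/{app}/crawler/{name}",
    data=b'{"source": "https://example.com", "depth": 2}',
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# POST /:app/crawler/:name - run the job
run = urllib.request.Request(f"{base}/{app}/crawler/{name}", method="POST")

print(create.get_method(), create.full_url)
print(run.get_method(), run.full_url)
```

The same pattern applies to the GET and DELETE commands: only the HTTP method changes, while the `/:app/crawler/:name` path stays the same.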


Updated 09 Jul 2021