OpenCourtData Bot

Technical reference for webmasters. This page describes how the OpenCourtData web crawler operates, how to identify it, and how to control its behaviour on your site.

Purpose

This bot indexes publicly available UK court listing and judgment data to support open access to justice information on opencourtdata.uk. It is non-commercial and operated in the public interest.

User-Agent string

Every HTTP request made by this crawler carries the following User-Agent header:

OpenCourtDataBot/1.0 (+https://bot.opencourtdata.uk; [email protected])

Field | Value | Description
Product token | OpenCourtDataBot/1.0 | Name and version of the crawler; robots.txt rules match on the name OpenCourtDataBot
Info URL | https://bot.opencourtdata.uk | This page: crawler documentation for webmasters
Contact | [email protected] | Direct email for crawler-related issues

To confirm that a request originates from this crawler, see the Verification section below.
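
As an illustration of matching on the product token, here is a minimal Python sketch that extracts the version a User-Agent header claims. The function name `claimed_bot_version` is hypothetical, and the header alone is easy to spoof, so a match only shows what a request claims to be.

```python
import re
from typing import Optional

# Product token as documented above; the version part is captured separately.
_UA_PATTERN = re.compile(r"\bOpenCourtDataBot/(\d+(?:\.\d+)*)")


def claimed_bot_version(user_agent: str) -> Optional[str]:
    """Return the version the header claims, or None if the token is absent."""
    match = _UA_PATTERN.search(user_agent or "")
    return match.group(1) if match else None
```

A request whose header yields a version here still needs to pass one of the verification methods before it is trusted.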

Schedule & crawl rate

Property | Value
Crawl frequency | Once daily at 06:00 UTC, not continuous
Concurrency | Maximum 5 simultaneous Lambda workers
Minimum inter-request delay | 1 second per domain (honouring Crawl-delay if larger)
Response timeout | 20 seconds
Maximum body size | 10 MB per page
Protocols | HTTPS only
Infrastructure | AWS Lambda, region eu-west-2 (London)
IP range source | AWS published IP ranges (region eu-west-2, service LAMBDA)
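
If you want to compare a connecting address against the published ranges, the following Python sketch shows one way to do it. It assumes you have already downloaded and parsed the JSON document AWS publishes for its IP ranges (top-level `prefixes` and `ipv6_prefixes` lists). The function name `ip_in_published_ranges` is hypothetical, the region and service filters are taken from the table above, and `SAMPLE_RANGES` uses reserved documentation address blocks rather than real AWS ranges.

```python
import ipaddress


def ip_in_published_ranges(ip, ranges_doc, region="eu-west-2", service="LAMBDA"):
    """Return True if `ip` falls inside a prefix matching region and service."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        entries, key = ranges_doc.get("prefixes", []), "ip_prefix"
    else:
        entries, key = ranges_doc.get("ipv6_prefixes", []), "ipv6_prefix"
    for entry in entries:
        # Skip prefixes that belong to another region or service.
        if entry.get("region") != region or entry.get("service") != service:
            continue
        if addr in ipaddress.ip_network(entry[key]):
            return True
    return False


# Illustrative data only: documentation address blocks, not real AWS ranges.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "eu-west-2", "service": "LAMBDA"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "LAMBDA"},
    ],
    "ipv6_prefixes": [
        {"ipv6_prefix": "2001:db8::/32", "region": "eu-west-2", "service": "LAMBDA"},
    ],
}
```

Because these ranges are shared by other AWS customers, a match narrows down where a request came from but does not by itself prove it came from this crawler.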

Configuring robots.txt

This crawler checks your robots.txt file before every request and strictly honours Disallow rules and Crawl-delay directives. Changes to your robots.txt take effect within 24 hours.

Allow all crawling (default)

No configuration is necessary if you wish to allow this bot to crawl your site.

Disallow specific paths

User-agent: OpenCourtDataBot
Disallow: /private/
Disallow: /admin/
Crawl-delay: 5

Block all crawling

User-agent: OpenCourtDataBot
Disallow: /

The product token for robots.txt matching is OpenCourtDataBot. Rules addressed to the wildcard user-agent * are also respected.
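
To check how a rule set is likely to be read before you publish it, you can run it through a parser locally. The sketch below uses Python's standard `urllib.robotparser`, which is not necessarily the parser this crawler uses, so treat the result as an approximation. The helper names `load_rules` and `effective_delay` are hypothetical; the 1-second floor comes from the crawl rate table.

```python
from urllib.robotparser import RobotFileParser

# The "Disallow specific paths" example from this page.
ROBOTS_TXT = """\
User-agent: OpenCourtDataBot
Disallow: /private/
Disallow: /admin/
Crawl-delay: 5
"""

BOT = "OpenCourtDataBot"
MIN_DELAY_SECONDS = 1.0  # documented minimum inter-request delay


def load_rules(robots_txt):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def effective_delay(parser, agent=BOT):
    """Use the larger of the documented minimum and any Crawl-delay."""
    declared = parser.crawl_delay(agent)  # None when no Crawl-delay applies
    return max(MIN_DELAY_SECONDS, float(declared or 0))


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch(BOT, "https://example.gov.uk/private/list"))
print(effective_delay(rules))
```

With the rules above, paths under /private/ and /admin/ are refused and the delay between requests rises from 1 second to 5.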

Verifying crawler identity

Two methods are available to confirm that a request originates from this crawler:

Method 1 — Reverse DNS lookup

Perform a reverse DNS lookup on the connecting IP address and confirm that the hostname resolves back to the same IP (forward-confirmed reverse DNS). Requests originate from AWS Lambda in eu-west-2; the resolved hostname will be within the compute.amazonaws.com or eu-west-2.compute.internal domain space.
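
A forward-confirmed reverse DNS check can be sketched in Python as follows. This is an illustration under assumptions, not a reference implementation: the function name `forward_confirmed` is hypothetical, the hostname suffixes are derived from the domain spaces named above, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket

# Suffixes derived from the domain spaces named in the paragraph above.
EXPECTED_SUFFIXES = (".compute.amazonaws.com", ".eu-west-2.compute.internal")


def _reverse(ip):
    """PTR lookup: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    """Forward lookup: hostname to the set of addresses it resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def forward_confirmed(ip, reverse=_reverse, forward=_forward,
                      suffixes=EXPECTED_SUFFIXES):
    """Return True if the PTR name has an expected suffix and maps back to ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(suffixes):
            return False
        resolved = {ipaddress.ip_address(a) for a in forward(hostname)}
        return ipaddress.ip_address(ip) in resolved
    except (OSError, ValueError):
        # Lookup failures and unparsable addresses count as unverified.
        return False
```

A passing check shows that the address belongs to the named AWS domain space; combine it with the signature check in Method 2 for a stronger guarantee.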

Method 2 — Cryptographic signature (Web Bot Auth)

Every request carries an Ed25519 HTTP Message Signature header that can be verified using the public key published at the JWKS directory. See the Web Bot Auth section for details.

Simulating a request

You can reproduce a crawler request using curl:

curl -v \
  -H "User-Agent: OpenCourtDataBot/1.0 (+https://bot.opencourtdata.uk; [email protected])" \
  -H "Accept: text/html,application/xhtml+xml" \
  "https://example.gov.uk/page"

Web Bot Auth & cryptographic signing

This crawler implements Cloudflare Web Bot Auth, an open standard based on RFC 9421 HTTP Message Signatures. Every request is signed with an Ed25519 private key. Edge networks and WAFs that support this standard can verify the crawler identity in-band without IP allowlisting.

Signed components

The following components are included in each signature:

("@method" "@path" "@authority" "user-agent");tag="web-bot-auth";keyid="opencourtdata-bot-key-v1"

Headers present on every request

Header | Description
Signature-Agent | URL of this bot information page (https://bot.opencourtdata.uk)
Signature-Input | Signed component list, key ID, and creation timestamp
Signature | Base64-encoded Ed25519 signature over the listed components
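
For readers implementing verification themselves, the sketch below shows how the signature base defined by RFC 9421 could be rebuilt for the covered components listed above. It is a simplified illustration: the function names are hypothetical, only the derived components this crawler lists are handled, and the parameter string is passed in exactly as it appears after the label in Signature-Input. Checking the Ed25519 signature over the resulting string requires a cryptography library and is not shown.

```python
import re
from urllib.parse import urlsplit


def covered_components(signature_params):
    """Extract component names from the inner list, e.g. '("@method" ...)'."""
    inner = signature_params[signature_params.index("(") + 1:
                             signature_params.index(")")]
    return re.findall(r'"([^"]+)"', inner)


def _authority(parts):
    """Lowercased host, with the port kept only when it is not the default."""
    host = (parts.hostname or "").lower()
    default = {"http": 80, "https": 443}.get(parts.scheme)
    return host if parts.port in (None, default) else f"{host}:{parts.port}"


def signature_base(method, url, headers, signature_params):
    """Build the string that the signature is computed over."""
    parts = urlsplit(url)
    derived = {
        "@method": method.upper(),
        "@path": parts.path or "/",
        "@authority": _authority(parts),
    }
    fields = {name.lower(): value.strip() for name, value in headers.items()}
    lines = []
    for name in covered_components(signature_params):
        value = derived[name] if name.startswith("@") else fields[name]
        lines.append(f'"{name}": {value}')
    # The final line repeats the parameters exactly as received.
    lines.append(f'"@signature-params": {signature_params}')
    return "\n".join(lines)
```

The parameter string must be reused byte for byte as received, including any creation timestamp, because the signer computed the signature over that exact text.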

Public key directory

The public signing key is published in the standard key discovery path:

https://bot.opencourtdata.uk/.well-known/http-message-signatures-directory

This file is served with Content-Type: application/http-message-signatures-directory+json as required by the specification.

Key rotation

Signing keys carry a 12-month expiry (exp field in the JWKS). WAF configurations should check the nbf/exp fields and be updated when a new key is published.
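
The nbf/exp check can be sketched in Python as below. This assumes the directory is a JSON object with a `keys` list of Ed25519 JWKs and that each key's `kid` corresponds to the keyid used in signatures; both are assumptions for illustration. The function name `keys_valid_at`, the placeholder key material, and the timestamps in `SAMPLE_DIRECTORY` are invented for the example.

```python
import time


def keys_valid_at(directory, now=None):
    """Return {kid: jwk} for Ed25519 keys inside their nbf/exp window."""
    now = time.time() if now is None else now
    valid = {}
    for key in directory.get("keys", []):
        if key.get("kty") != "OKP" or key.get("crv") != "Ed25519":
            continue  # not an Ed25519 key
        if "nbf" in key and now < key["nbf"]:
            continue  # not yet valid
        if "exp" in key and now >= key["exp"]:
            continue  # expired
        valid[key.get("kid")] = key
    return valid


# Placeholder directory: "x" is not real key material, timestamps are made up.
SAMPLE_DIRECTORY = {
    "keys": [
        {
            "kty": "OKP",
            "crv": "Ed25519",
            "kid": "opencourtdata-bot-key-v1",
            "x": "<base64url-encoded public key>",
            "nbf": 1700000000,
            "exp": 1731536000,
        }
    ]
}
```

Re-fetching the directory periodically and re-running this filter lets a configuration pick up a newly published key and drop an expired one without manual edits.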

Data policy

What we index

Publicly accessible UK court hearing lists, judgment metadata, and court directory pages. We do not access areas requiring authentication.

What we store

Structured metadata only (page title, URL, links, HTTP status, SHA-256 content hash). Raw HTML is not retained beyond the duration of the crawl.

Personal data

We do not seek out or process personal data beyond what is already published in official public court records. If you believe personal data has been indexed in error, contact us at [email protected].

Right to erasure

We will action takedown requests for personal data within 72 hours. Block our crawler via robots.txt and we will not re-index that content.

Retention

Indexed data is retained for up to 12 months before being purged or archived.
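
To make the "What we store" entry concrete, the sketch below models a record holding only the listed metadata fields, with the page body reduced to a SHA-256 hash. The class and function names (`PageRecord`, `make_record`) are hypothetical and do not describe the crawler's actual schema.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PageRecord:
    """Metadata kept per page; the raw body is deliberately not a field."""
    url: str
    title: str
    status: int
    links: Tuple[str, ...]
    sha256: str


def make_record(url, title, status, links, body: bytes) -> PageRecord:
    """Hash the body, keep the metadata, and let the body go out of scope."""
    return PageRecord(
        url=url,
        title=title,
        status=status,
        links=tuple(links),
        sha256=hashlib.sha256(body).hexdigest(),
    )
```

Storing only the hash allows a later crawl to detect that a page changed without keeping a copy of its content.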

Contact

If you have questions or concerns about this crawler — including crawl rate complaints, data removal requests, or suspected misuse — contact us directly:

Channel | Address | Response time
Crawler issues & takedowns | [email protected] | Within 2 business days
Main site | opencourtdata.uk |