What Is robots.txt?

The robots.txt file is a simple text file that tells bots what they’re allowed (or not allowed) to access on a website.

It lives at the root of the domain — for example:

https://storephotos.ca/robots.txt

It’s not an authentication layer or a firewall. It’s a polite request, written in plain-ish English, that well-behaved bots are expected to follow.
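For example, here’s a minimal sketch of how a well-behaved crawler might check the file before fetching a page, using Python’s standard urllib.robotparser (the user-agent name is just a placeholder):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://storephotos.ca/robots.txt")
rp.read()  # downloads and parses the live robots.txt

# Ask before fetching: may "MyCrawler" request this URL?
if rp.can_fetch("MyCrawler", "https://storephotos.ca/admin/"):
    print("allowed to fetch")
else:
    print("robots.txt asks us not to fetch this")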


Basic Structure

A robots.txt file is made up of one or more rulesets, each beginning with a User-agent line and followed by one or more Disallow or Allow lines:

User-agent: *
Disallow: /admin/
Allow: /admin/public/

That example says:

  • All bots (*)
  • Are disallowed from accessing anything under /admin/
  • Except for anything under /admin/public/
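A file can hold more than one ruleset. For example (with hypothetical paths), you might give one specific bot its own rules and keep a catch-all group for everyone else:

User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /admin/

Most crawlers pick the most specific group that matches their user agent and follow only that group’s rules.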

The Three Most Important Directives

Here’s what you’ll see most often:

  1. User-agent

This identifies the bot the rule applies to. You can name a specific bot (like Googlebot):

User-agent: Googlebot

Or you can use * for all bots:

User-agent: *
  2. Disallow

This blocks access to a path:

Disallow: /private/

A blank Disallow means nothing is disallowed:

Disallow:
  3. Allow

This explicitly permits access to a path — even if a broader Disallow rule would normally block it.

Allow: /private/help-page/

How Precedence Works

Precedence trips a lot of people up, and it matters enough that I’ll keep coming back to it throughout these tutorials. Most bots follow the “longest match wins” rule: when more than one rule matches a URL, the more specific (longer) rule overrides the broader one.

User-agent: *
Disallow: /private/
Allow: /private/help-page/

This means:

/private/secrets/ is blocked
/private/help-page/ is allowed

But not all bots implement this correctly. That’s why it’s good to be explicit and conservative when writing robots.txt.
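If you want to sanity-check your own rules, here’s a rough sketch of the longest-match logic in Python. It’s deliberately simplified (plain prefix matching against a single ruleset, no wildcards, no percent-encoding), so treat it as an illustration rather than a real parser:

# rules is a list of ("allow" or "disallow", path_prefix) pairs
def is_allowed(path, rules):
    best_len = -1
    allowed = True  # if nothing matches, the path is allowed by default
    for directive, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_goes_to_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_goes_to_allow:
                best_len = len(prefix)
                allowed = (directive == "allow")
    return allowed

rules = [("disallow", "/private/"), ("allow", "/private/help-page/")]

print(is_allowed("/private/secrets/", rules))    # False: blocked
print(is_allowed("/private/help-page/", rules))  # True: the longer Allow wins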

What About Wildcards?

Officially, the standard only supports two wildcard-ish operators:

  • * matches any sequence of characters
  • $ marks the end of a URL
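For example, combining the two in a rule like this one (a hypothetical pattern, not from the file above):

Disallow: /*.pdf$

That asks bots not to fetch any URL ending in .pdf, anywhere on the site. Keep in mind that older or simpler bots may ignore wildcards entirely, which is another reason to keep patterns conservative.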