Why I Allow Scraping

Scraping sometimes gets a bad name — and not always unfairly. But it can also be a smart, hands-on way to learn how the web works and build tools that make life easier.

StorePhotos.ca is intentionally clean, consistent, and easy to scrape. That’s not an accident. It’s part of how I think good websites should behave.


Why You Might Want to Scrape This Site

There are lots of reasons to learn scraping — and StorePhotos is a friendly place to practice:

  • Scraping teaches how the web really works — from HTML and CSS to HTTP, headers, and structured data.
  • It’s fun, especially when you’re trying to extract something precise from messy markup or a poorly organized page.
  • It’s also a great way to test accessibility — scrapers and screen readers often face similar challenges.

You might also scrape this site to build an offline copy, feed a prototype, or generate your own dataset. That’s all fair game.
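If you want to try that, here’s a minimal sketch of the extraction step using only Python’s standard library. The markup and the /photos/ paths below are invented for illustration; real StorePhotos pages will look different.

```python
from html.parser import HTMLParser

# Stand-in markup for a gallery page; the real site's structure will differ.
SAMPLE = """
<ul class="gallery">
  <li><a href="/photos/storefront-01.jpg"><img src="/thumbs/storefront-01.jpg" alt="Storefront"></a></li>
  <li><a href="/photos/storefront-02.jpg"><img src="/thumbs/storefront-02.jpg" alt="Window display"></a></li>
</ul>
"""

class PhotoLinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

parser = PhotoLinkExtractor()
parser.feed(SAMPLE)
print(parser.links)  # the two /photos/... paths
```

The same pattern scales up to a real page: fetch the HTML with urllib.request, feed it to the parser, then download each link with a pause between requests.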

If your plan is to hoard the entire site in case it disappears… well, StorePhotos isn’t going anywhere. And the Internet Archive might be a better fit for that kind of thing.


Why I Make It Easy

There’s no JavaScript wall here. No obfuscated markup. No cookie prompts blocking you from loading content.

That’s because:

  • Useful content should be accessible. If someone finds value here, they shouldn’t need permission or tracking to use it — even offline.
  • Blocking scrapers often blocks screen readers and other assistive tools. I’d rather make things accessible to real people than shut everyone out just to control distribution.
  • As for copyright, LLMs, and data ownership: that battle was lost more than a decade ago. I publish, and ownership is part of why I built StorePhotos, so it’s dear to me. But we lost, and it’s time to move on and find new business models.

This site is open because I want it to be useful.


A Word of Caution

StorePhotos is scraping-friendly — but it still lives on shared infrastructure. That means I’m asking you to be thoughtful.

  • Please don’t hammer the server with thousands of requests per second.
  • Check and respect the robots.txt file. It’s short and clear.
  • Don’t mirror the site to rehost it — especially without attribution. I’ll probably find it, and if I do, I might turn it into a case study.
  • Keep in mind: the only difference between a scraper and a denial-of-service attack is volume.
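To put those points into practice, here is one way a considerate scraper might check robots.txt and pace its requests, using only Python’s standard library. The robots.txt body shown is a made-up example, not the site’s actual file; fetch and parse the real one before crawling.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /admin/
"""

def make_parser(robots_txt: str) -> RobotFileParser:
    """Load a robots.txt body into a RobotFileParser."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

def polite_fetch_order(rp: RobotFileParser, agent: str, urls):
    """Yield only URLs the agent may fetch, pausing between requests."""
    delay = rp.crawl_delay(agent) or 1  # fall back to 1 second if unset
    for url in urls:
        if rp.can_fetch(agent, url):
            yield url
            time.sleep(delay)  # one request at a time, never a flood

rp = make_parser(ROBOTS_TXT)
print(rp.can_fetch("MyBot", "/photos/1.jpg"))  # True: not disallowed
print(rp.can_fetch("MyBot", "/admin/stats"))   # False: disallowed
```

A scraper that respects Crawl-delay and Disallow rules stays firmly on the “scraper” side of the volume line mentioned above.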

Also, scraping can be legally messy depending on where you are or where your site is hosted. I’m not coming after you — but someone else might. So do your homework.


This page exists because I’d rather you build something than feel stuck, unsure, or shut out. If you’re curious, respectful, and a little careful, you’re more than welcome here.