A version of this post originally appeared in the October 22, 2020 issue with the email subject line "When 404s and 301s swarm" and a review of market-leading web crawling software Screaming Frog.
On the internet there are zombies everywhere. Zombies are different from bots; they represent what once was but is no longer, the remnants of our digital past, the bits that were once human.
Zombies live at suck.com and televisionwithoutpity.com and wherever Alex Balk’s byline used to appear.
Snow Fall, the interactive story from the New York Times, is a bit of a zombie these days. The video thumbnails load, but the videos don’t play.
Social network zombies are abundant: not only Friendster and MySpace but also the network hoppers who stop using their Facebook/Insta/Twitter for a year, then get a new outlook on life and make a different Facebook profile, quit after a few months, then repeat until they have 17 zombie social media profiles and none of them works. Digital zombies are part of our offline identity. As research for this newsletter (so committed), I watched Night of the Living Dead for the first time and totally loved it.
I care for a zombie, a defunct local news magazine website that hasn’t been updated since 2012. I should probably retire the whole thing, but I’m proud of the work our team did, so it sits, benign and probably begging to be hacked with malware. (I don’t know if its archival value is worth the cost of upgrading it to HTTPS. Probably not, but I can’t bear to let it go.)
If you run a website, you likely have some zombified parts too: the links that go nowhere, or that redirect to new links that differ from the original content. Those are zombies.
But with the right know-how and the right set of tools, we can fight zombies. Zombie fights are perfect for days when you want to get something done but don’t have the motivation to do anything that takes much thought.
How to fight zombie links (aka 404s and 301s)
First, we should understand what we’re dealing with. Two main types of zombies afflict the common website: 404 errors and 301 redirects.
The 404 errors are the most dangerous. Of course it’s bad for user experience if someone clicks on a link that doesn’t work; that’s a no-brainer. But most often 404 errors are hiding deep within the architecture of your website, and your users don’t find them.
What can find those 404 errors deep in your crypts and labyrinthine hallways to nowhere? Search engines.
A primer for the unfamiliar: Search engines scan all the text and metadata on your website, following every link to its ultimate end—the process is called a crawl. The bot in charge? A spider.* The spider will follow every link it can until you tell it not to, or it runs out of so-called crawl budget.**
Most websites don’t need to plan for crawl budget; if you have under 5,000 pages and everything’s working, search spiders are gonna crawl right through that buddy in no time.
But if the spider finds a dead link, or worse, a hallway full of 404 errors, it’s gonna call “done” way sooner. Spiders and zombies don’t mix. The spider sees one or two zombies and anticipates many more, which, if you’ve seen the movies, is a solid zombie prevention strategy.
If you want a search engine spider to see all the glorious content you have created and report back to Google, get your zombie 404s out of the way. They are undead, broken and going nowhere. There are always more where that came from.
*It’s my 7th year deep in SEO and I still think the name is cute.
**I generally prefer to link to news/non-content marketing sources, but in this case, the “news” or specialist blog has ads all over it and loads like garbage. I very much prefer the clean, comprehensive version that explains the concept. This is one example of “how to get a link from a specialist even if your website doesn’t rank as high and you’re not an ‘official’ source.”
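To make the crawl and crawl-budget idea concrete, here is a toy spider in Python. It’s a sketch under loud assumptions: the site is a hypothetical in-memory dictionary rather than real HTTP requests, and “budget” is simplified to a maximum number of pages fetched. Real search engine crawlers are far more sophisticated; this only shows the basic shape of following every link until the budget runs out.

```python
from collections import deque

# Hypothetical site: URL -> (HTTP status, internal links found on that page).
SITE = {
    "/": (200, ["/about", "/blog", "/old-page"]),
    "/about": (200, ["/"]),
    "/blog": (200, ["/blog/post-1", "/blog/post-2"]),
    "/blog/post-1": (200, ["/blog"]),
    "/blog/post-2": (200, ["/missing"]),
    "/old-page": (404, []),
    "/missing": (404, []),
}


def crawl(site, start="/", budget=100):
    """Breadth-first crawl that stops once `budget` pages have been fetched.

    Returns {url: status} for every URL the toy spider actually fetched.
    """
    seen = {start}
    queue = deque([start])
    fetched = {}
    while queue and len(fetched) < budget:
        url = queue.popleft()
        status, links = site.get(url, (404, []))  # unknown URLs act like 404s
        fetched[url] = status
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


broken = [url for url, status in crawl(SITE).items() if status == 404]
print(broken)  # ['/old-page', '/missing']

# With a tiny budget, the spider never reaches the deeper pages.
print(list(crawl(SITE, budget=3)))  # ['/', '/about', '/blog']
```

With the budget set to 3, the spider quits before it ever sees /blog/post-2, which is the point: pages buried behind a long crawl may simply never get looked at.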
Combatting the not-dead-yet set: 301 redirects
Developers, UX folks, SEOs, users, everyone knows broken links are bad, so we build safeguards to protect from 404 broken link zombies. We cover the doors and walls with whatever plywood we have available. We mitigate the zombie impact.
When URLs are changed, the best way to communicate that change to a search engine is to implement a 301 redirect. The 301 HTTP status code is the digital equivalent of post office mail forwarding: we don’t live here anymore, so please deliver mail to our new home.
Implementing 301 redirects is the right fix when you’re changing a large number of URLs or even domains. The 301 redirect (“Moved Permanently”) indicates permanence: the old address is retired for good, and the content now lives at the new one.
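Under the hood, a 301 is just a status code plus a Location header naming the forwarding address. The sketch below models that with a hypothetical redirect map and a hypothetical set of live pages (all paths are made up). In practice you’d configure redirects in your web server or CMS, not in hand-rolled code like this.

```python
# Hypothetical redirect map: old path -> new path.
REDIRECTS = {
    "/about-us": "/about",
    "/2019/zombie-links": "/blog/zombie-links",
}
# Hypothetical set of pages that currently exist.
LIVE_PAGES = {"/", "/about", "/blog/zombie-links"}


def respond(path, redirects=REDIRECTS, live_pages=LIVE_PAGES):
    """Return the (status, headers) a server would send for a request to `path`."""
    if path in redirects:
        # 301 Moved Permanently: the Location header is the forwarding address.
        return 301, {"Location": redirects[path]}
    if path in live_pages:
        return 200, {}
    # No page and no forwarding address: a zombie.
    return 404, {}


print(respond("/about-us"))  # (301, {'Location': '/about'})
print(respond("/about"))     # (200, {})
print(respond("/contact"))   # (404, {})
```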
Redirect maps are central in any website redesign process, and good content managers have redirect processes in place to eliminate 404 errors whenever a URL is changed on their site. Take a piece of content down and throw up a new 301 redirect to guide both users and spiders from the old, busted URL to the new hotness. Make sure the 301 is in place in case any readers have bookmarked or linked to the content from external sites.
It works… for a bit.
Just as your best friend and sidekick will likely become zombified before the end of the film, 301 redirects can come back to bite you. One 301 redirect is fine. However, two or more 301 redirects in a row indicate that you may have a zombie problem.
The mail forwarding analogy works best, even though it has nothing to do with zombies: At least in the U.S., the post office will implement forwarding from one old address to a new address once, for a few months. The USPS will not follow you around from home to home, trying to find you at your new address. The system can’t handle that kind of change.
Search crawlers behave similarly. Search spiders will crawl through one 301 redirect, maybe two, and say, “ok, great, the new content is at a new address and I will let everyone know about it.”
When there are multiple 301 redirects in a row — especially within the same domain — that’s a signal to a spider that hey! There may be zombies. There may be 404 errors, more 301 errors, who knows. The spider just knows that the content is harder to reach and may turn away in favor of clearer, zombie-free roads.
Technically, the links aren’t broken, and if a user follows the link, they’ll arrive at the new content eventually. But they definitely appear shadier than working links; we’ve all been redirected to an unfamiliar website before. Where are you taking me? we ask the redirect. Will I find the content I seek? Or will there be malware?***
For search engines, redirects take up time, energy and crawl budget. As with all automation, computers are doing some heavy lifting on your behalf. It’s in everyone’s best interest to make the path to content as unobstructed and zombie-free as possible.
***Every time you use a URL shortener or a dynamic QR code, you’re usually relying on a redirect, often a 301. Make sure that’s the only redirect before you get to the content. I’ve learned from experience (and maybe this is only my generation, who downloaded boatloads of viruses onto our college computers while we were pirating music and movies?) that unless I absolutely trust you as a source, I’m not clicking on your random shortened URL if you don’t tell me exactly where I’m going.
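To spot redirect chains in your own redirect map, you can follow each redirect until it lands and count the hops. A minimal sketch, assuming your redirects can be exported as a simple old-URL-to-new-URL mapping; the example paths are hypothetical. It flags anything that takes two or more hops and refuses to loop forever.

```python
# Hypothetical redirect map exported from a CMS: old path -> new path.
REDIRECTS = {
    "/spring-sale": "/sale",  # first hop
    "/sale": "/deals",        # second hop: together these form a chain
    "/about-us": "/about",
}


def follow_redirects(url, redirects, max_hops=10):
    """Return every URL visited from `url` to its final destination."""
    path = [url]
    while path[-1] in redirects:
        nxt = redirects[path[-1]]
        if nxt in path:
            raise ValueError(f"Redirect loop at {nxt}")
        if len(path) > max_hops:
            raise ValueError(f"More than {max_hops} redirects from {url}")
        path.append(nxt)
    return path


def find_chains(redirects):
    """Map each source URL that needs two or more hops to its hop count."""
    chains = {}
    for start in redirects:
        hops = len(follow_redirects(start, redirects)) - 1
        if hops >= 2:
            chains[start] = hops
    return chains


print(find_chains(REDIRECTS))  # {'/spring-sale': 2}
print(follow_redirects("/spring-sale", REDIRECTS))  # ['/spring-sale', '/sale', '/deals']
```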
301 redirects and website redesigns
If you’re redesigning a website and changing URLs, you are already going to have a slew of 301 redirects in place. That’s why it’s so crucial to ensure that there are no other internal redirects upon launch: redirect chains are bad for users and for search crawlers, and they add up.
How does this happen with new websites? Usually a developer building a page enters a URL that isn’t the final one. Either they didn’t have the final list of URLs or, even more often, someone made a typo and all the automated links in the site’s most important directory say “contnet” instead of “content.”
The error is discovered during the QA/proofing process and yeah: it’s a huge pain to change all the internal links to every page whose primary directory is /contnet-management/ instead of /content-management/.
What often happens: to ensure the launch timeline stays on track, developers implement a quick redirect so every link on the site that pointed to the typo now redirects to the fancy new page. But it’s about as sustainable as punching a zombie in the arm.
Implementing internal 301s before a website is even launched is the equivalent of a zombie movie whose first scene is set in a graveyard. The undead are on their way and they’re going to show up sooner rather than later.
Yes, it’s a massive pain to change all of your internal links before launch. I totally get it. Hopefully you’re working with a CMS that lets you write a script to rewrite internal links en masse. With many newer CMSes, especially the no-code ones that are gaining popularity, you may have to rewrite them all individually. The URL typo that never ends…
But leaving them in means you’re starting with a mess, and search spiders don’t like messes.
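If your CMS lets you export or script page content, the “contnet” fix can be automated. Here’s a minimal sketch that rewrites the typo only inside href attributes, so visible text is left alone. It assumes you can get each page’s HTML as a string; the function name and the regex approach are illustrative, and a proper HTML parser is safer for messy markup.

```python
import re

TYPO = "/contnet-management/"
FIX = "/content-management/"

# Match href="..." or href='...' and capture the URL between the quotes.
HREF = re.compile(r'''(href\s*=\s*["'])([^"']*)(["'])''', re.IGNORECASE)


def fix_links(html, typo=TYPO, fix=FIX):
    """Return (new_html, count) with `typo` replaced inside href values only."""
    count = 0

    def repl(match):
        nonlocal count
        url = match.group(2)
        if typo in url:
            count += 1
            url = url.replace(typo, fix)
        return match.group(1) + url + match.group(3)

    return HREF.sub(repl, html), count


page = '<a href="/contnet-management/seo">SEO</a> <p>See /contnet-management/ in text</p>'
new_page, changed = fix_links(page)
print(changed)   # 1
print(new_page)  # only the href changed; the paragraph text is untouched
```

Running this over every page before launch, then re-crawling, is a lot less painful than opening each page by hand.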
Redesigns already mean that you’re changing URLs on most content, so spiders following old links have already encountered one redirect before they even reach your new pages. Once they find all of these others that say “contnet” but redirect to “content,” they’ve established that your site has two redirects in a row. A redirect chain, the name for two or more 301s in a row, is a sign you may be kinda unhealthy. Maybe not trustworthy. Maybe decaying before your time. Maybe full of zombies.
If you want to be found, you have to ensure that spiders don’t think your site is filled with zombies.
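Fixing the internal links is the main job, but it also helps to check the redesign’s redirect map itself, so that pre-redesign URLs point straight at their final home instead of passing through a typo’d URL first. A minimal sketch, assuming the map is a plain dictionary of old path to new path (the example paths are hypothetical):

```python
# Hypothetical redesign redirect map: old path -> new path.
# The second entry is the quick pre-launch fix for the "contnet" typo.
REDIRECTS = {
    "/2018/content-strategy": "/contnet-management/strategy",
    "/contnet-management/strategy": "/content-management/strategy",
}


def flatten(redirects, max_hops=10):
    """Return a copy of the map where every source points at its final URL."""
    flat = {}
    for start, target in redirects.items():
        hops = 1
        # Keep following while the target is itself redirected elsewhere.
        while target in redirects:
            if target == start or hops >= max_hops:
                raise ValueError(f"Redirect loop or runaway chain from {start}")
            target = redirects[target]
            hops += 1
        flat[start] = target
    return flat


for old, new in flatten(REDIRECTS).items():
    print(old, "->", new)
# /2018/content-strategy -> /content-management/strategy
# /contnet-management/strategy -> /content-management/strategy
```

This only shortens the map. The links on the site itself still need to point at the real URLs.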
How to find and fix your on-site 404 and 301 errors in one fell swoop
Obviously you can’t fix any errors outside of your own website. But I assure you, even the best of us gather broken links (404s) and redirects (301s) over the course of a year. One at a time they’re negligible, but they add up and become a zombie problem.
To fix it, you need the right tools: a web crawler like Screaming Frog (freemium, reviewed below) or any other SEO/content audit tool should do the trick.
You can also access the Coverage report in Google Search Console, although it’s not quite as comprehensive and easy to use as a crawler. It’s like coming prepared to a zombie fight with a lighter, rather than a torch.
You can also crawl websites that are not live yet — and I suggest you crawl several times before you launch to ensure the healthiest website possible.
Here’s how to fix the errors on your own site.
- Use a website crawler to scan your website just like a search engine. I use Screaming Frog, a freemium crawler reviewed here today. However, pretty much any website audit tool embedded into SEO or content auditing software can find your broken links.
- Most web crawl tools present their crawl results in a spreadsheet. If you can, choose only HTML pages. Sort by status code.
- Dive into the 404 errors first, as they are the most deadly: which pages on your site are linking to them? Good crawl tools will highlight all the internal pages that link to the 404 error. In Screaming Frog, they’re called Inlinks, but every tool is slightly different.
- From there, hop into your CMS and open the healthy page that contains the unhealthy zombie links. Isolate the link you want to change, fix its URL, save and republish.
- Next, tackle the 301 errors. Make sure you change the link to the page where the content is actually living. Copy and paste the URL from the link destination itself — do not type it in manually or you are asking for trouble.
- Finally, if your crawler highlights external (offsite) links, and most do, fix any links to offsite pages that end in 404s and 301s. Find better content worth linking to, or eliminate the link entirely.
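If your crawler exports its link data as a CSV, the sorting in the steps above can be scripted too. This is a sketch under assumptions: the column names (Source, Destination, Status Code, Content Type) and the sample rows are hypothetical, so rename them to match whatever your tool actually exports. It keeps only HTML pages, then lists each 404 and 301 destination along with the pages that link to it, 404s first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical crawl export: one row per internal link found.
SAMPLE = """Source,Destination,Status Code,Content Type
/,/about,200,text/html
/,/old-page,404,text/html
/blog,/old-page,404,text/html
/blog,/spring-sale,301,text/html
/blog,/logo.png,200,image/png
"""


def triage(csv_text):
    """Group HTML link targets by status: {status: {destination: [source pages]}}."""
    report = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["Content Type"].startswith("text/html"):
            continue  # HTML pages only
        report[int(row["Status Code"])][row["Destination"]].append(row["Source"])
    return report


report = triage(SAMPLE)
for status in sorted(report, reverse=True):  # 404s before 301s before 200s
    if status in (301, 404):
        for dest, sources in report[status].items():
            print(status, dest, "linked from", sources)
# 404 /old-page linked from ['/', '/blog']
# 301 /spring-sale linked from ['/blog']
```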
Here’s a little video I made, demonstrating the process with Screaming Frog. It’s quite easy, but it’s tedious.
Once you’re done cleaning your zombies, your users and search traffic will thank you.
I wish there were better tools for cleaning up social media zombie accounts as well, but tbh they’re all a little shady. While they’re finding dead followers in Instagram or Twitter, I’m always wondering what other data they’re gathering and sending. If you have social zombie-killing tools you’d recommend, reach out!