Taxonomic tracking: how to find articles by using site construction

I’ve been a member of the technology media in Australia for over five years, and if there’s one thing I’ve seen from the journalism side, it’s that often members of the PR world don’t quite know how to quickly track what we’re doing in our realm.

I can see how it must be complicated: we’re all writing tons of articles. I generally run the site I work for (GadgetGuy.com.au) by myself, and even I can’t remember half the stuff I’ve written. Tech journalists are always writing, generally at a frantic pace, and in some ways you could say we’re all pushing out so much work that it’s hard to keep up with it all.

I’m sure I’m not the only one to get emails asking for links to stories, and for a PR trying to find the story for their reports, I can see why emailing a journalist could be fruitful. It doesn’t help that website searches can be very unhelpful, and much of this is due to the nature of website search engines just generally not being very good (apologies, we can’t all be Google).

I’ve never worked in PR — a quick look at my wardrobe will tell you that — and I’ve heard that people who work in this field regularly have to make daily or weekly reports showing the client which media has written about the topic. Tracking all of this can’t be easy, especially when there are a good 50 of us working on random stories, forcing you to visit our sites and trawl through listings of articles until you come across the magic one.

I understand that, and as a journalist who also codes websites, I can see how this can be a problem.

However, there is a solution, and for a lot of websites across Australia — and no doubt the world — it’s staring at you right in the face, and most PRs (and I suspect journalists) don’t even know it.

Cutting off the tags

It’s probably a safe assumption that almost everyone in the world knows what a webpage is, and given that you’ve taken the time to browse to this one, you probably know the basics: this, like other pages, is a piece of code that sits on a computer somewhere in the world.

In the good ol’ days, webpages like this one had to be built from scratch. This meant using code inside chevrons, and a page would often start with <html> and end with </html>. Text would sit inside the <body></body> tags and you could do all manner of things with stylisation code, such as making things go bold <b>, sticking text in the centre <center>, italicising <i>, and so on and so on.

You can still do all of this today, but most websites rely on something called a Content Management System or “CMS.”

In essence, this is a web program that allows you to create pages of a website with an interface that is less code reliant. You can use the code if you like, but the theory behind a CMS is that the piece of code that runs here takes care of all the messy stuff for you. There’s no (or very little) hard coding of information, and you’re free to write everything as if you’re working on a document in Word, adding pictures, sounds, or anything else you need, and then hitting publish when you’re ready to go.

There are loads of content management systems out there, and there is no right one, but most of them share some pretty basic things, such as a place to upload files, a way of changing the titles of articles on the fly, a space to enter and work on the information, and so on and so on.

A CMS should also have a category system, which acts as a broad way of looking at topics. I might want a category for “computers” and another for “mobile phones,” and obviously articles about each of these will go in there. That makes sense.

Then there are tags.

Tags are a type of taxonomy that links articles, posts, and webpages together in a way that categories don’t. Rather than fit a broad topic like “mobile phones,” tags are words that describe things that may have been mentioned in the article.

For instance, an article about Samsung’s new Galaxy smartphone can include a whole bunch of tags that link articles differently. In that article, I can have a tag for “Samsung,” a tag for “Android,” a tag for “Galaxy,” and if the launch had marshmallows at it (which none have), I can include a tag for “marshmallow.”

When working on posts, articles, or webpages, tags that are frequently used work as a form of linking system for the publisher to find out what they’ve been writing about. In the above example, the use of the tag “Samsung” in multiple articles means it becomes a form of tracking to see every other article I’ve used that tag in. If I wanted to see all other uses of the Galaxy model series and I’ve used the “Galaxy” tag frequently on these articles, I can track that topic using this specific tag. And if I have an obsession with marshmallows, well I can use the “marshmallow” tag to see what else I’ve written about those yummy puffy pieces of gelatinous sugar.

With tags in play across a CMS, journalists and publishers can see the articles all linking to a topic. Many of these are topics, and many are descriptions. Many are also brands, and provided you know how to look for them, a PR can use a tag or taxonomy system to track articles written about a client.
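To make the idea concrete, here is a minimal sketch of what a tag system is doing behind the scenes: it keeps a lookup from each tag to every article that carries it. The article titles and tags below are made up for illustration, and a real CMS will store this in a database rather than a Python dictionary.

```python
from collections import defaultdict

# Hypothetical articles; the titles and tags are invented for illustration.
articles = [
    {"title": "Galaxy phone launch", "tags": ["Samsung", "Android", "Galaxy"]},
    {"title": "New Android tablet", "tags": ["Samsung", "Android"]},
    {"title": "Marshmallow taste test", "tags": ["marshmallow"]},
]


def build_tag_index(articles):
    """Map each lower-cased tag to the titles of the articles that carry it."""
    index = defaultdict(list)
    for article in articles:
        for tag in article["tags"]:
            index[tag.lower()].append(article["title"])
    return dict(index)


tag_index = build_tag_index(articles)

# Every article tagged "Samsung", in the order the articles were listed.
print(tag_index["samsung"])
```

Asking for the "samsung" entry is the same thing a tag page on a website does: it hands back everything filed under that word.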

Let’s go hunting

This article has gone a touch longer than I expected it would, which is strange, especially since I should be working on a novel that I hope will one day be published, but I digress.

In this section, I’m going to lay out a way of tracking brands and topics across the various technology news sites in Australia. This is a guide, and the work and research in here can change at any time because, quite frankly, I don’t work at these companies, and thus have no influence over how their content management systems will evolve.

If you’d like to see more about how I’ve seen each publisher work and the research I’ve done on the matter, there’s more information about that below this section in the bit titled “a closer look.”

For now, here’s a simple understanding of how this all works. We’ve explained tags in the sections above, and now you have to see how the webpages use them.

You won’t break any webpages doing this either; we’re merely exploiting the tag and taxonomy systems already employed by various companies. Many of these look different, but in essence, they’re all doing the same thing: calling on a term that acts as the designated taxonomy, and adding the name or word to it.

WordPress is one of the more common and popular (as well as easy to use) content management systems out there. Because of this, quite a few tech news sites run it. You can more or less bet that well over half of the blogs in our industry run it, so if you’re trying to keep an eye on one blog in particular, you’ll probably find it easier from here on in.

Here are four WordPress-based news sites in Australia, shown with a tag linking to their posts on Sony:

The first part of the URL is the domain name you already know, the http://www.whatever.com. You know that. The next part is the taxonomy. For Gizmodo and Lifehacker, that’s “tags,” while both Delimiter and Fat Duck Tech use “tag,” which is the default term for WordPress sites, and all four of these run WordPress.

Then there’s the term. In our example, it’s Sony, but it really could be anything you want. It could be Samsung.

It doesn’t even have to be a company. It can be a term.
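The three parts of the URL (domain, taxonomy word, term) can be glued together in a few lines. This is only a sketch: the example.com domain is a placeholder, and the lower-casing, hyphenating, and trailing slash are assumptions about how a typical WordPress-style site writes its tag addresses, not something every site is guaranteed to do.

```python
from urllib.parse import quote


def slugify(term):
    """Lower-case a term and join its words with hyphens (an assumed convention)."""
    return "-".join(term.lower().split())


def tag_url(base, taxonomy, term):
    """Join the domain, the taxonomy word, and the term into one URL."""
    return f"{base.rstrip('/')}/{taxonomy}/{quote(slugify(term))}/"


# Placeholder domain; swap in the site and the taxonomy word it actually uses.
print(tag_url("http://www.example.com", "tag", "Sony"))
print(tag_url("http://www.example.com/", "tags", "Samsung"))
```

Swapping "tag" for "tags" is the only difference between the two calls, which is the same difference described above between the sites that use one word and the sites that use the other.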

Technology journalists obviously aren’t all the same people. If we were, we’d all be insanely rich, writing the same thing, and one giant freak of a science experiment. But because we’re all writing in the same space, many of these tags are bound to be the same. I use the tag “NBN” in my articles about the National Broadband Network, and I also use companies as tags when I’m writing articles about any new product.

With that in mind, you can expect a bit of overflow and like-mindedness, and if a page doesn’t work, it won’t break the website.

This page will not work, unless the guys at Giz start tagging things with “fried butter.”

Not all sites use the same taxonomy structure, though, so you can’t just add “tag” or “tags” to the base URL and hope for the best. Different structures may incorporate the same base word, or they may use a completely different one altogether.

These all use slightly different structures, but as of the time this article was published, all worked. Cool, huh?
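If you don’t know which taxonomy word a site uses, you can try a handful of likely ones and keep the first that works. The sketch below assumes a simple "/word/term/" shape and a placeholder domain, and it takes the "does this page exist?" check as a function you supply, so nothing here touches the network. In real use that check would be an HTTP request; here a small set of known pages stands in for it.

```python
def candidate_urls(base, term, taxonomies=("tag", "tags", "topic", "brand")):
    """Build one candidate URL per taxonomy word (the word list is a guess)."""
    base = base.rstrip("/")
    return [f"{base}/{word}/{term}/" for word in taxonomies]


def first_working(urls, page_exists):
    """Return the first URL for which page_exists(url) is true, else None."""
    for url in urls:
        if page_exists(url):
            return url
    return None


# Stand-in for a real HTTP check: a hypothetical set of pages that exist.
known_pages = {"http://www.example.com/tags/sony/"}

urls = candidate_urls("http://www.example.com", "sony")
print(first_working(urls, known_pages.__contains__))
```

A term nobody has tagged, such as "fried-butter", simply comes back as None, which matches the point made earlier: a tag page that doesn’t exist won’t break anything.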

When you find one that works, the article listing will very likely be in date order, with the topmost articles being the most recent. It’s a blog thing.

To make things even more interesting, a site can change the taxonomy word at any time and it shouldn’t affect its performance or site ranking terribly. If the CMS and the template on that CMS have been coded properly, the site will adapt without any problems, and the URL you’ve been tracking will be useless until you work out the new structure. That said, most websites are unlikely to do a drastic change since it could affect their search engine ranking, if only for a short time.

A closer look

Above, you have an easy to work out list that gives you a quick overview of the tag and taxonomies I’ve tracked for various Australian tech sites, but for a closer look at how they seem to work, check the below.

Obviously, I don’t work at these companies, and this is all just worked out from understanding something about web development, though the content management systems being employed by the various organisations do share similarities.

Another note is that companies can — and have — changed content management systems before, and if that happens, it can change how the tags are structured across the board, so if this research stops working completely for some sites, apologies, but as I said, I don’t work at the companies.

Allure (Gizmodo, Lifehacker, Kotaku, Business Insider)

As I’ve seen it, the majority of sites run out of Allure work on WordPress. This makes the structure relatively easy to track, and in fairness to Allure, the team there leaves the tags in the articles (at the bottom where it says “tags”) to help you find your own way around.

This means that these all work:

As we mentioned in the sections above, replace the last word with the name or topic you’re looking for, and if the publication has written about it, you’ll find it in a list.

CBS Interactive (CNet, ZDNet)

CNet Australia works off brands, rather than visible tags. Some of these appear to be case sensitive, while others aren’t. You can type in pretty much any company after “/brand/” and add “.htm” to the end to see this in action.

These all work:

This works too:

But this does not:

ZDNet is a company that sits under the same CBS umbrella as CNet, and yet runs a different tag system.

Not surprising, since not all sites connected to a company run the same webpage system. It’s not the first time I’ve seen two or three publications coming out of one company run different systems, and it sure won’t be the last.

In any case, ZD seems to work from the taxonomy of “topic,” adding a hyphen and then your chosen brand. As such, these work:

However, because ZD is a more business oriented website, you may find some brands do not work, such as

On ZD, topic doesn’t just mean a brand, though. It can also mean exactly what the taxonomy suggests: a topic, which is exactly like a tag, and makes the following work:

Not a hard system to work out, provided you stay in the lines.
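The two CBS patterns described above can be written down as templates with a gap for the term. This is a sketch built from the descriptions in this section, not from the sites’ confirmed addresses: the example.com domains are placeholders, and anything in the path beyond "/brand/", ".htm", and "topic-" is an assumption. The term is passed through untouched because some of these addresses appear to be case sensitive.

```python
# Path shapes paraphrased from the descriptions above. The exact paths and the
# example.com domains are assumptions, not the sites' confirmed URLs.
PATTERNS = {
    "brand-style": "{base}/brand/{term}.htm",  # "/brand/" + company + ".htm"
    "topic-style": "{base}/topic-{term}",      # "topic" + hyphen + brand or topic
}


def build_url(pattern_name, base, term):
    """Fill a named pattern with a base address and a term, leaving case alone."""
    return PATTERNS[pattern_name].format(base=base.rstrip("/"), term=term)


print(build_url("brand-style", "http://cnet.example.com", "sony"))
print(build_url("topic-style", "http://zdnet.example.com", "sony"))
```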

Future (T3, Tech Radar, Techlife, APC)

When I first started preparing this article, Future only had two technology publications, and both were trackable. One of them makes it very easy, which is very nice of them. Now with the former ACP titles owned by Future, we have a couple more we can track.

So let’s start with T3, which is based on WordPress and is thoroughly easy to work out.

Not difficult at all.

On the other hand, Tech Radar isn’t WordPress.

To its credit, Tech Radar puts the tags in the theme, making it easy for you, which is completely different to its sibling T3. Scroll to the bottom of an article and voila, you’ll see them there, with the tags even included in the article listing. Nice.

Replace that last word/name and you’re set.

The former ACP titles are a little different when it comes to tracking. Techlife is fairly easy to look up, but APC might make you struggle, and even forces you to look at its code, though I’m not sure its developers intended for you to find the results.

Techlife doesn’t have a CMS I can directly work out, but it does have a tag system that I can see:

Replace the name in that one and the URL will do the rest.

APC is a different beast altogether, and while it has a tag system, it doesn’t seem to have one based on writing the name in the URL bar, but rather one that links to an ID system. This happens, and there’s nothing wrong with a site choosing to work this way. It just means that from the site’s point of view, sitemaps generated by taxonomies are less likely to occur, which can be a detriment to search engine standing.

As such, you can’t just type in a URL on APCMag.com, add a tag, and find your way to a company-based site listing.

And yet, if you know where to look, you also can, as APC’s developers seem to have originally had its entire tag identification listing running in the code once before. Essentially, it’s been commented out of the running code, which should mean it’s invisible to you, but if you right click to view the source code of an article page on APCMag, you can clearly see the tags and their associated ID numbers waiting for you in a commented out section.

Check the source of an APC article and search for the name of a company. If you see the words “tags.htm?tid=” and a number to the left of the term, you have your ID.

The above image should link a browser to APCMag source code, but it also may not, so if you need to find the ID and this link doesn’t work, try right clicking the page and viewing the source code.

Continuing with the Sony example, a quick search through that code block tells us that the ID for Sony is 632, which therefore means this will work:

When you find the company in the APC code block (if it’s still there, because they can remove it at any time), change the number on the back of the link.
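If you’d rather not hunt through the source by eye, the same search can be done with a short pattern match. This is a sketch only: the snippet of source below is invented to imitate a commented-out tag list, so the markup and the Samsung number are made up. The only figure taken from the research above is the Sony ID of 632, and the real page may be laid out differently or may have dropped the block entirely.

```python
import re

# Hypothetical snippet imitating a commented-out tag list. The markup and the
# Samsung ID are invented; only the Sony ID (632) comes from the article.
sample_source = """
<!--
<a href="tags.htm?tid=632">Sony</a>
<a href="tags.htm?tid=101">Samsung</a>
-->
"""


def find_tag_id(source, name):
    """Return the number after 'tags.htm?tid=' that sits just left of the name."""
    pattern = r"tags\.htm\?tid=(\d+)[^<]*>\s*" + re.escape(name) + r"\s*<"
    match = re.search(pattern, source, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


sony_id = find_tag_id(sample_source, "Sony")
print(sony_id)

# The ID goes on the back of the link, as described above.
print(f"tags.htm?tid={sony_id}")
```

A company that isn’t in the block comes back as None, which is your cue to check the source by hand.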

Interestingly, APC also has an older “brand” system commented out, too, which looks to work in a similar capacity. After a quick test, this brand system appears to only show older articles, suggesting APC’s editorial team gave up on it some time ago.

Haymarket (Bit, CRN, IT News, PC & Tech Authority, SC Magazine)

I’ve only worked at a Haymarket publication for a week as a journalist, and I never had access to its CMS. Also, I’m not an ASP or ASPX programmer. I can see basics in the code, but generally this isn’t an area I know.

That said, it wasn’t particularly hard to find a common connection between all these sites, even if I don’t know the CMS powering them.

As such, these all work:

With what you’ve seen thus far, I’m sure it won’t be difficult to see what needs to be changed in order to track articles.

IDG (Computerworld, Good Gear Guide, PC World, Tech World)

IDG’s sites have a lot of overlap in them. While I don’t quite know the CMS employed, I suspect it might be something similar to Drupal.

Regardless, IDG’s system writes the tags at the bottom of the page, but you can find the tags through URLs, such as:

Each site employs a slightly different structure, but if you’ve been paying attention, you’ll see which tag or “company” needs to be replaced.

Intermedia (Current, TechTrader)

Like CBS, Intermedia publications don’t seem to rely on the same CMS. I’ve only known two tech publications there, but each worked on something different.

TechTrader was based on WordPress, and thus has a fairly easy and straightforward tag system. The publication isn’t around anymore, but since this article was originally written with TTMag in existence, the information should still work, provided the site is online.

Guess what needs to be changed there.

Current seems to be based on something different altogether. While I’m not entirely sure what it is, access to tags is available in the article, while the URL to track said tags is:

As I said, I haven’t the faintest what the CMS is off the top of my head, but in this example, the only thing that needs to be replaced is that last word or name. If anything else is removed, the URL doesn’t appear to function properly.

The independents

There are quite a few of us who fit in this category, and I’ll try to assist where I can.

The first job I had was with CyberShack, but from what I understand, the site has been through quite a few changes since then, and as such, my knowledge of this tag system was found through regular old looking.

Take a guess what needs to be changed in that one.

Renai LeMay’s Delimiter is another WordPress site, and is therefore relatively easy to work out the tags for.

Once again, take a guess at what needs to be changed there.

FatDuckTech is another independent blog, handled by the excellent Alex Kidman. Like Delimiter, it works on WordPress, and is therefore easy to map, too.

Oh, and then there’s the site I work for, GadgetGuy.com.au

I’d tell you the taxonomy system for tracking, but I can’t give you everything.

It’s not hard to work out, and if you’ve paid attention — or even looked at our menu at the top — you’ll be able to do a good job and work it out.

I have more of these articles planned, as I have more neat tips about tracking to share, but until then, have fun!