Three Ways Google Reads Content – And What That Means For Your Website

The success of any form of writing is 99% about understanding who you’re writing for – your audience.

Content that matches readers’ styles and interests, or that informs, educates, intrigues and entertains – even provokes – relies on a writer’s ability to consider who is on the other end of the experience.

For those wanting to design and run websites that are simple to find, easy to navigate and interesting or useful enough to attract loyal visitors, your audience isn’t only flesh-and-blood customers and clients. It’s also Googlebot and other search engine spiders, which crawl the pages of your site to determine how useful your website might be to those searching for relevant information.

In short, impress Googlebot while it indexes your site and you’re a long way along the road to getting your website in front of readers looking for what you have to offer.

The skill with which search engines have learnt to differentiate “good” and “bad” copy has increased by leaps and bounds – to the extent that stuffing pages full of meaningless strings of keywords and phrases is now utterly pointless.

Instead, websites are rewarded for their natural, intuitive, intelligently thought out and carefully constructed content with high rankings in SERPs (which have recently become much more competitive), in exactly the same way as similar writing is rewarded by readers by shares, links, repeat visits and conversions.

So how does Google go about “reading” your web pages? Although this is a fairly well guarded secret, one way or another, features of the indexing algorithm have been exposed within the digital marketing industry. And just for our readers, here is the inside scoop on three methods Google uses to distinguish “good” content from “bad”:

  1. Phrase-based indexing and synonyms: Google has become particularly adept at understanding language and spotting commonly used phrases and their alternatives. This means that basic content which relies on keyword density and term frequency – and can therefore appear clunky and unnatural to the reader – is becoming confined to history. Instead, Google is able to differentiate between useful and useless phrases (for example, the difference between “The Treaty of Waitangi” and “The Treaty of”), tell how a word or phrase is used in context, and also spot alternative terms (for example, “The Queen of England”, “Monarch”, “Ruler” and “Elizabeth II”). Because our use of language is so fluid, with words and terms entering and leaving use faster than anyone could keep an up-to-date list, search engines instead analyse the pages they crawl to build groups of words which frequently appear together.

  2. Co-occurrence: Because Googlebot crawls through so much information, Google is also able to recognise the frequency with which similar terms and phrases occur in the same document, page or even sentence. In this way, it can see how certain phrases relate to each other and determine how relevant the overall content is to a specific search term. For example, it’s all very well Google recognising the similarity between “dress” and “ballgown”, but by analysing the relationship between co-occurring phrases, it should also recognise more sophisticated relationships such as “Karen Walker” and “Winter Season Fashion”.

  3. Term Frequency-Inverse Document Frequency: OK, this is where it gets a little more mathematical! It might not seem such a stretch to credit a load of algorithm-powered Googlebots with learning the similarity between words and the frequency with which they’re used together – but TF-IDF is a way for search engines to weigh the number of times a phrase or term occurs in a document against the number of documents, across the whole collection, in which that phrase occurs, to determine how important that phrase is to that document *inhale*. The more documents a phrase turns up in, the less it says about any one of them, so everyday terms count for little while distinctive ones count for more. It’s highly likely that search engines use TF-IDF or something similar to weed out those “content farmers” who just spin out pointless, jargon-filled text revolving around repetitive keywords.
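To make the TF-IDF idea a little more concrete, here is a minimal sketch of the textbook calculation in Python. It is only an illustration of the general technique, not Google's actual scoring: the function name `tf_idf`, the whitespace word splitting and the tiny three-document corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Score how important `term` is to `doc`, relative to `corpus`.

    Hypothetical helper for illustration: words are split on whitespace
    and lowercased, which is far cruder than a real search engine.
    """
    words = doc.lower().split()
    # Term frequency: share of the document's words that are this term.
    tf = Counter(words)[term] / len(words)
    # Document frequency: how many documents in the corpus contain the term.
    docs_with_term = sum(1 for d in corpus if term in d.lower().split())
    if docs_with_term == 0:
        return 0.0
    # Inverse document frequency: terms found in fewer documents score higher.
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf


# A made-up corpus, purely for the example.
corpus = [
    "the treaty of waitangi is a founding document",
    "the winter season fashion show opened today",
    "the ballgown is a formal dress",
]

common_score = tf_idf("the", corpus[0], corpus)
distinctive_score = tf_idf("treaty", corpus[0], corpus)
print(common_score, distinctive_score)
```

In this sketch “the” appears in every document, so its score is zero, while “treaty” appears in only one and scores above zero. That is the sense in which a term that shows up everywhere carries little weight.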

So now you’ve got a handle on how Google “reads” your content, what does that mean for your website?

Well, the basics are still at play here: avoiding duplicate content, writing meaningful ALT attributes and title tags, and giving each page its own distinct structure.

But content now needs to be natural, too.

First of all, you should recognise that the more content you have, the more Googlebot has to index and the better your site should perform in searches – and, depending on your audience, longer, more descriptive content is often better for the reader too. And because search engines are keyed in to synonyms and alternative phrases, you can add variety to your writing without worrying about repeating exact keywords.

Search engines which recognise familiar levels of co-occurrence – the sort of natural text which readers would be comfortable with – will also recognise when a document has suffered from “keyword stuffing” and penalise the content – even recognise it as spam.
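As a rough illustration of what counting co-occurrence might look like, here is a minimal Python sketch that tallies how often pairs of words appear in the same sentence. It is a simplification under stated assumptions: the function name `cooccurrence_counts` and the sample sentences are invented for the example, and real search engines work with phrases and far larger collections of text.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how many sentences each pair of words appears in together.

    Hypothetical helper for illustration: words are split on whitespace,
    and each pair is stored in alphabetical order.
    """
    counts = Counter()
    for sentence in sentences:
        # Use a sorted set so each pair is counted once per sentence.
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Made-up sentences, purely for the example.
sentences = [
    "karen walker winter fashion",
    "winter fashion dress",
    "karen walker dress",
]

counts = cooccurrence_counts(sentences)
print(counts[("karen", "walker")])  # appear together in two sentences
print(counts[("dress", "winter")])  # appear together in one sentence
```

Pairs that keep turning up together, such as “karen” and “walker” here, get higher counts, which is the kind of signal a search engine could use to judge whether text reads like natural writing on a topic.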

Equally, a search engine using anything like TF-IDF will look at webpages whose content repeats the same phrases and terms over and over and find it difficult to determine which of them matters most.

Google’s fascination with Artificial Intelligence tends to be seen as a futuristic plan, but its roots are found deep within its history as a search engine and its long list of patents and algorithms designed to “read” content on the web more like a human being than a series of linked servers.

And that means that, year by year, it is developing the sort of skills which recognise the quality of the content it trawls. To respond to these developments, websites need to look hard at the quality of the content they create and why they are including it – in short, it has to be attractive, informative and easy to understand for the reader. It just so happens that one of those readers is Google.