Eli Schwartz


Splunk for Advanced Technical SEO by Analyzing Log Files

The root of technical SEO is a deep understanding of a website’s architecture and how Google relates to the pages on the site. In my experience, the best way to gain this level of knowledge on any individual site is through deep log file analysis of Googlebot and user access logs.

Google Search Console reveals some crawl issues, but only at a high, aggregated level. On a high-traffic site, that makes it nearly impossible to find the specific pages with issues.

On a site with many Googlebot entries, the number of rows will easily overwhelm your computer's RAM if you try to do this in Excel, which also caps a worksheet at 1,048,576 rows. Just opening the file will slow the system down, and that is before you run a single query.

Therefore, my favorite tool for this task is Splunk. For the unfamiliar, Splunk is a fantastic big data tool that lets you parse large amounts of data quickly and easily to make important decisions. Splunk even has a free version, which allows you to index up to 500MB per day. For many websites, this free version should be more than enough to upload and analyze your access logs.

Here are the top 3 ways I use Splunk to help me with technical SEO efforts.

Find 404 pages generated by a Googlebot visit

404 pages (not found errors) are a wasted visit for every bot or human visitor. Every time a user hits a 404 page instead of the page they meant to see, you miss an opportunity to show them the correct content, and they have a subpar experience with your site. You can always proactively find 404s with a crawling tool like Screaming Frog, DeepCrawl, or OnCrawl, but if you have a lot of broken URLs, fixing all of them might not be a realistic goal.

Additionally, crawling your own site won't find a 404 that results from an incorrect link on someone else's site. When Googlebot discovers these links, they send it to a non-existent page.

This is where log parsing becomes very helpful: you can discover the 404 URLs that users and bots request most often and choose to either fix them or redirect the traffic to a working page.

Once you have your data imported into Splunk, here’s how you set up the query to find the 404 pages:

  1. First choose your time period. For this type of query, I usually use 30 days, but you can choose whatever you want.
  2. Type the following into the query box.

index={the name of your index} status=404 | top limit=50 uri

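Your limit can be whatever you want, but I like to work with 50 URLs for 404 pages so I don't miss any important ones. (The "status" and "uri" field names are what Splunk typically extracts for the standard access_combined source type; adjust them if your logs are parsed differently.) Once this query completes, click on the Statistics tab, and you will see a table of the most frequently requested 404 URLs, ranked by number of hits.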
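If you want to sanity-check the Splunk output, or don't have Splunk set up yet, the same "top 404 URLs" count can be sketched in plain Python. This is a minimal sketch under stated assumptions: it assumes Apache/Nginx "combined" format logs, and the regex and the top_404s function are illustrative names, not part of Splunk. Like the query above, it counts 404s from all visitors, not only Googlebot, and it returns raw counts rather than Splunk's count-plus-percent table.

```python
import re
from collections import Counter

# Regex for the Apache/Nginx "combined" log format (an assumption:
# adjust it if your server logs a different format).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def top_404s(lines, limit=50):
    """Rough equivalent of: status=404 | top limit=50 uri (counts only)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip lines that don't parse and any response that isn't a 404.
        if match and match.group("status") == "404":
            counts[match.group("uri")] += 1
    return counts.most_common(limit)
```

Because a file object yields one line at a time, you could call it as top_404s(open("access.log")) on a local copy of your log.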

Google expects 404 errors on every website, so their existence isn't necessarily an urgent issue. However, some 404 URLs could be the result of an unintended error or a valuable link (internal or external) pointing to the wrong page. Running this analysis will allow you to make an educated decision.

Calculate the number of pages crawled by Google every day

If you use Google Search Console, you are probably familiar with the Crawl Stats report, where Google shows how many URLs it crawls per day. This data may or may not be accurate, but you won't know until you look in your logs to see how many URLs Google actually crawls per day. Finding the daily crawl amount is very easy in Splunk once your data is uploaded.

  1. Choose a time period of 30 days (or 7 if you have a lot of data)
  2. Type the following query:

index={name of your index} googlebot | timechart span=1d count

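Once the query completes, click on the Statistics tab, and you will see how many requests Googlebot made each day. Note that this counts every hit, including repeat visits to the same URL; to count unique URLs per day instead, replace "count" with "dc(uri)". For added fun, check out the Visualization tab to see how this changes over the searched time period.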
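For a quick cross-check outside Splunk, the daily count can be sketched in Python. This assumes combined-format logs and treats any user agent containing "googlebot" as Googlebot; because that string can be spoofed, a stricter version would also verify the requesting IP with a reverse DNS lookup. The function name googlebot_crawl_per_day is hypothetical, and days follow each timestamp's own UTC offset.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Apache/Nginx "combined" log format (an assumption; adjust to your logs).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_crawl_per_day(lines):
    """Return {date: (requests, unique_urls)} for Googlebot hits.

    "requests" mirrors: googlebot | timechart span=1d count
    "unique_urls" mirrors: googlebot | timechart span=1d dc(uri)
    """
    requests = Counter()
    unique_urls = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        # User-agent match only: spoofed "Googlebot" traffic is counted too.
        if not match or "googlebot" not in match.group("agent").lower():
            continue
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        day = stamp.date()
        requests[day] += 1
        unique_urls[day].add(match.group("uri"))
    return {day: (requests[day], len(unique_urls[day])) for day in sorted(requests)}
```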

This is more of an FYI than an urgent fix, but it is helpful to know whether Google is picking up new categories on a site or slowing its crawl. If either is true, it could be time to dig into the data.

Find rogue URLs wasting crawl budget

As most marketers (should) know, Google allots a crawl budget to each site based on its PageRank: not the public toolbar score, but the real one inside the Google black box.

If Googlebot wastes some of your valuable budget on URLs you don't care about, it has less bandwidth left for more important URLs. Without knowing where Googlebot is spending its time, you can't know whether your budget is being used effectively.

Splunk can help you quickly discover all the URLs Googlebot is crawling, which gives you the data to decide what should be added to your robots.txt file.

Here's how you find the URLs that Googlebot is crawling:

  1. Choose your time period. This can be any amount of time, and you should keep trying different time periods to find problematic URLs.
  2. Type in the following query:

index={name of your index} googlebot | top limit=20 uri

You can set the limit to whatever you want, but 20 is an easily manageable number. Once the query completes, click on the Statistics tab, and you will have a table showing the top URLs Google is crawling. Now you can decide which pages should be removed, blocked in your robots.txt file, or given a noindex tag in the head of the page.
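The same report can be sketched in Python, assuming combined-format logs. Beyond the top-URL count, this sketch also groups URLs by path to surface paths that Googlebot crawls under several query-string variants (faceted navigation and tracking parameters are common culprits). That grouping is an extra heuristic, not part of the Splunk query, and the function name googlebot_url_report is hypothetical.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Apache/Nginx "combined" log format (an assumption; adjust to your logs).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_url_report(lines, limit=20):
    """Top Googlebot URLs, plus paths crawled under many query-string variants."""
    by_url = Counter()
    variants = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "googlebot" not in match.group("agent").lower():
            continue
        uri = match.group("uri")
        by_url[uri] += 1
        # Group by path without the query string, e.g. /shoes?color=red -> /shoes.
        variants[urlsplit(uri).path].add(uri)
    # Paths with more than one distinct URL are candidates for crawl waste.
    parameter_heavy = sorted(
        ((path, len(urls)) for path, urls in variants.items() if len(urls) > 1),
        key=lambda item: item[1],
        reverse=True,
    )
    return by_url.most_common(limit), parameter_heavy
```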

I use Splunk in over a dozen different ways to help me accomplish various SEO tasks, and these are just three of my most common uses.


Reveal Hidden Keyword Opportunities with Search Console

Search Console is one of my favorite SEO tools, as it provides some of the best data directly from Google's search database. The drawback to Search Console data is that it is sanitized and normalized; however, even without precise numbers, it is a great source of directional insights. One challenge most people face with Google Search Console is that there is too much data, but here are a few ways you can easily filter it for actionable results.

  1. If you have built a solid brand, many of your bigger keywords will be branded. If your brand is somewhat complicated to spell, you will invariably have misspellings. Add a "Queries not containing" filter with the primary words in your brand name to filter most branded results out of the query report.

  2. In the current iteration of Google Search Console, it can be difficult to find the precise queries for a given page. Here are the steps to dig up page-level queries:
    1. From the Pages report, click into a specific popular page.
    2. Once that page is in the Pages filter, click Queries and you will see its popular queries.
  3. Find pages that rank for the same queries:
    1. Search for the keyword in the Queries filter.
    2. Then click into Pages, and you will see every page that has shown up for queries containing that keyword.
  4. Find pages with queries that have a seemingly low CTR:
    1. Click into a popular page.
    2. Check off CTR in the metrics portion of the page.
    3. Look at the query list for that page to find queries with CTRs that seem outside the normal range.
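If you export the Queries report, the brand filter and the low-CTR check can also be sketched in Python. This assumes each exported row has been loaded as a dict with query, clicks, impressions, and position keys (hypothetical names; map them to the columns in your own export). The "below half the median CTR at the same rounded position" threshold is an arbitrary illustration, not a Google benchmark.

```python
import re
from statistics import median


def non_brand(rows, brand_terms):
    """Drop rows whose query contains any brand term (case-insensitive).

    A short fragment such as "acme" also catches misspellings that
    contain it, such as "acmee".
    """
    pattern = re.compile("|".join(re.escape(term) for term in brand_terms), re.IGNORECASE)
    return [row for row in rows if not pattern.search(row["query"])]


def low_ctr(rows, ratio=0.5, min_impressions=100):
    """Flag queries whose CTR is below ratio x the median CTR at a similar position."""
    buckets = {}
    for row in rows:
        # Ignore low-volume queries, whose CTR is too noisy to judge.
        if row["impressions"] >= min_impressions:
            buckets.setdefault(round(row["position"]), []).append(row)
    flagged = []
    for group in buckets.values():
        typical = median(row["clicks"] / row["impressions"] for row in group)
        flagged.extend(
            row["query"] for row in group
            if row["clicks"] / row["impressions"] < ratio * typical
        )
    return flagged
```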

 

I will eventually expand on this post with other ways to use GSC so check back again!


Google Express is Now Free

Google Express, originally called Google Shopping Express, quietly ended its subscription pricing on August 23rd. Until today, Google charged a membership fee just like Amazon Prime. The Google Express news of the day was dominated by the announcement that Google and Walmart would partner on voice search to combat Amazon, but little attention was paid to the change in the pricing model.

Here’s what the Google Express pricing page says now.

What happened to memberships?
The Google Express membership program ($10 for monthly members and $95 for annual members) has gone away in August 2017. If you have a membership, you’ll be refunded for the remaining time.

Why we’ve made the change:
We’ve recently welcomed new stores to broaden our product assortment. Many of these arrange for their own deliveries and returns, or have listed the same shipping fees and per-store minimums for everyone, regardless of membership status. Delivery speeds also vary, ranging from one day in some areas to one week or more with some specialty stores. Learn what stores are available on Google Express.

You’ll see free delivery when you meet the per-store minimum ($25 or $35 with most stores) – no membership required.

This is a really big deal, since Google has significantly lowered a barrier to using its home delivery product, and the addition of Walmart allows it to beat Amazon on price and speed. I assume Google will eventually have to make a bigger splash about its no-membership-required delivery product, since this is just too big to bury in a quiet change.


The Top 9 Advanced Google Search Modifiers You Must Know

In 2016, Google told Search Engine Land that trillions of searches are conducted on its search engine per year, and most people conduct dozens of searches per day. Google produced a great microsite that explains how Google search works, but in a nutshell, Google indexes the web into a database, and every search is a query against that database.

Every time a Google search is conducted, Google does its best to determine the intent behind the query and then shows the most relevant matching results. Since Google's search experience is just an algorithm, it may not always guess the right intent, so you may want to give Google more clues about what you are looking for. Other times you might be looking for results within narrow parameters or on a specific type of site. To conduct these more specific searches on Google, you use search modifiers. Here are the top nine search modifiers that can help you level up your Google queries:

  1. site: this operator is very helpful when you are looking for results only on a specific website or type of website. Example: blog site:wikipedia.org means you only want to see results about blogs from Wikipedia.
  2. Another way of modifying the site operator is to add a specific kind of TLD (top-level domain), like site:.gov. Example: blog site:.gov means you only want to see results about blogs from websites that end in .gov.
  3. Quotation marks around a query ("example") mean that the results must contain that exact word or phrase and not a related version of it.
  4. A dash before a word means you want to exclude results that contain that word. Example: website -blog means you don't want to see results that mention blog.
  5. A plus sign before a word meant that you wanted Google to take into account a word it might otherwise discard as a stop word. Example: +The long road home tells Google to treat "the" as an equal part of the query. Note that Google retired the + operator in 2011; putting the word in quotation marks ("The" long road home) now does the same job.
  6. Adding the word OR between two queries means that you want results related to either the first word or the second word. Example: blog OR website means you want to see results that are relevant to either query.
  7. Similarly, the word AND between queries requires that results be related to both queries. Example: blog AND website means that results have to be related to both blog and website.
  8. A tilde before a query means you want Google to also bring back results for synonyms of your query. Example: ~blog brings back results for words that are synonyms of blog. (Google has since retired the tilde operator, as it now matches synonyms automatically.)
  9. filetype: lets you specify the type of file you want Google to return. Example: filetype:pdf means you only want to see results that are PDF files.
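To see how several modifiers combine into one query, here is a small Python sketch that assembles a query string and the matching google.com search URL. The helper names build_query and search_url are hypothetical. The sketch covers only site:, quotation marks, the dash, and filetype:, and it only builds the URL; it does not fetch any results.

```python
from urllib.parse import urlencode


def build_query(terms, exact=None, exclude=(), site=None, filetype=None):
    """Combine plain terms with a few of the search modifiers above."""
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')                # quotation marks: exact match
    parts.extend(f"-{word}" for word in exclude)  # dash: exclude a word
    if site:
        parts.append(f"site:{site}")              # a domain, or a TLD like .gov
    if filetype:
        parts.append(f"filetype:{filetype}")      # e.g. pdf
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for a google.com search (spaces become +)."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, build_query(["blog"], exclude=["wordpress"], site=".gov", filetype="pdf") produces blog -wordpress site:.gov filetype:pdf, which search_url then turns into a link you can paste into a browser.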

Don't worry if you can't remember all of this, since you can always run what Google calls an Advanced Search from the link under Settings.

Data Still Samples In Google Analytics Premium – But It’s Better Hidden. PSA

Anyone who has ever tried to analyze large amounts of data in Google Analytics has likely faced the scourge of data sampling. In Google's help section, they try to explain the rationale for data sampling by comparing it to trees in a forest.

In data analysis, sampling is the practice of analysing a subset of all data in order to uncover the meaningful information in the larger data set. For example, if you wanted to estimate the number of trees in a 100-acre area where the distribution of trees was fairly uniform, you could count the number of trees in 1 acre and multiply by 100, or count the trees in a half acre and multiply by 200 to get an accurate representation of the entire 100 acre

In reality, visitors to a website are not like trees in a forest, since each of them has different behavior and navigational paths. Even worse, GA users will often face sampling rates of less than 1%, which means you aren't even looking at 1 in 100 data points.
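To see why a tiny sample distorts small numbers, here is a simplified Python simulation. It assumes GA keeps each session independently at a fixed sampling rate (GA's actual sampling method may differ), and it only tracks converting sessions, since those are the only ones that affect the estimate. With 50 real conversions and a 1% sample, every extrapolated estimate is 100 times the number of converting sessions that happened to land in the sample, so the report can show 0, 100, or 200, but never the true 50. The function names are hypothetical.

```python
import random


def extrapolate(sampled_count, sample_rate):
    """Scale a count from a sample back up to the full data set."""
    return sampled_count / sample_rate


def simulate_sampled_estimates(conversions, sample_rate, trials, seed=42):
    """Estimate a known conversion total from repeated random samples.

    Simplification: each converting session is kept independently with
    probability sample_rate; GA's real sampling may work differently.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        kept = sum(1 for _ in range(conversions) if rng.random() < sample_rate)
        estimates.append(extrapolate(kept, sample_rate))
    return estimates


if __name__ == "__main__":
    # Hypothetical: 50 true conversions seen through a 1% sample.
    print(sorted(simulate_sampled_estimates(50, 0.01, trials=10)))
```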

For some users who frequently deal with large datasets, this sampling could be motivation enough to get Google Analytics 360 (formerly Google Analytics Premium), but as I recently discovered, data will still be sampled even in default reports. (You can get around this by downloading the raw data.) Google just does a far better job of hiding the sampling, so it's not as obvious when you are working with sampled data.

See below for how you can recognize when you are working with sampled data.

[Screenshot: Unsampled report in standard (free) GA]

[Screenshot: Sampled report in standard (free) GA]

[Screenshot: Unsampled report in GA 360; notice the green check]

[Screenshot: Sampled report in GA 360; notice the orange check]


Are Soft 404s Bad?

You may have seen errors in Google Search Console about soft 404s and wondered what they are. According to Google, a soft 404 occurs when the crawler discovers a page that Google believes is an error page, but the server returns a 200 OK status code for it.

This is a very important point, since not returning an actual 404 status code on an error page negatively impacts your crawl budget: Google has to spend time crawling pages that you do not want crawled.

Google illustrates this with an example that makes clear that, regardless of what a page says, the response status code is what matters.

In addition to returning a 404 code in response to a request for a page that doesn’t exist, the server will also display a 404 page. This may be a standard “File Not Found” message, or it could be a custom page designed to provide the user with additional information. The content of the page is entirely unrelated to the HTTP response returned by the server. Just because a page displays a 404 File Not Found message doesn’t mean that it’s a 404 page. It’s like a giraffe wearing a name tag that says “dog.” Just because it says it’s a dog, doesn’t mean it’s actually a dog. Similarly, just because a page says 404, doesn’t mean it’s returning a 404.

Going back to the initial question: are soft 404s bad? No, they won't cause your site to be penalized, but they make Google crawl your site inefficiently. The best practice is to ensure that all error pages return a 404 or 410 status code.
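One way to spot-check URLs for soft 404s is to fetch each page and compare its status code with its content. Here is a minimal Python sketch of that idea. The phrase list is an illustrative assumption, since real error pages vary widely, so a "possible soft 404" result is a prompt for manual review, not a verdict. The function names are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Phrases that often appear on error pages (illustrative and incomplete).
ERROR_PHRASES = ("page not found", "file not found", "no longer available")


def classify_response(status, body):
    """Label a response by comparing its status code with its content."""
    if status in (404, 410):
        return "hard 404/410"
    if status != 200:
        return f"other status ({status})"
    if any(phrase in body.lower() for phrase in ERROR_PHRASES):
        return "possible soft 404"  # says "not found" but returns 200 OK
    return "ok"


def check_url(url):
    """Fetch a URL and classify it; urlopen raises HTTPError for 4xx/5xx codes."""
    try:
        with urlopen(url, timeout=10) as response:
            body = response.read().decode("utf-8", errors="replace")
            return classify_response(response.status, body)
    except HTTPError as error:
        body = error.read().decode("utf-8", errors="replace")
        return classify_response(error.code, body)
```

Note that urlopen follows redirects, so a missing page that redirects to a 200 "not found" page would also be reported as a possible soft 404.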


ISO 639-1 language codes

Language family Language name Native name 639-1
Northwest Caucasian Abkhaz аҧсуа бызшәа, аҧсшәа ab
Afro-Asiatic Afar Afaraf aa
Indo-European Afrikaans Afrikaans af
Niger–Congo Akan Akan ak
Indo-European Albanian Shqip sq
Afro-Asiatic Amharic አማርኛ am
Afro-Asiatic Arabic العربية ar
Indo-European Aragonese aragonés an
Indo-European Armenian Հայերեն hy
Indo-European Assamese অসমীয়া as
Northeast Caucasian Avaric авар мацӀ, магӀарул мацӀ av
Indo-European Avestan avesta ae
Aymaran Aymara aymar aru ay
Turkic Azerbaijani azərbaycan dili az
Niger–Congo Bambara bamanankan bm
Turkic Bashkir башҡорт теле ba
Language isolate Basque euskara, euskera eu
Indo-European Belarusian беларуская мова be
Indo-European Bengali, Bangla বাংলা bn
Indo-European Bihari भोजपुरी bh
Creole Bislama Bislama bi
Indo-European Bosnian bosanski jezik bs
Indo-European Breton brezhoneg br
Indo-European Bulgarian български език bg
Sino-Tibetan Burmese ဗမာစာ my
Indo-European Catalan català ca
Austronesian Chamorro Chamoru ch
Northeast Caucasian Chechen нохчийн мотт ce
Niger–Congo Chichewa, Chewa, Nyanja chiCheŵa, chinyanja ny
Sino-Tibetan Chinese 中文 (Zhōngwén), 汉语, 漢語 zh
Turkic Chuvash чӑваш чӗлхи cv
Indo-European Cornish Kernewek kw
Indo-European Corsican corsu, lingua corsa co
Algonquian Cree ᓀᐦᐃᔭᐍᐏᐣ cr
Indo-European Croatian hrvatski jezik hr
Indo-European Czech čeština, český jazyk cs
Indo-European Danish dansk da
Indo-European Divehi, Dhivehi, Maldivian ދިވެހި dv
Indo-European Dutch Nederlands, Vlaams nl
Sino-Tibetan Dzongkha རྫོང་ཁ dz
Indo-European English English en
Constructed Esperanto Esperanto eo
Uralic Estonian eesti, eesti keel et
Niger–Congo Ewe Eʋegbe ee
Indo-European Faroese føroyskt fo
Austronesian Fijian vosa Vakaviti fj
Uralic Finnish suomi, suomen kieli fi
Indo-European French français, langue française fr
Niger–Congo Fula, Fulah, Pulaar, Pular Fulfulde, Pulaar, Pular ff
Indo-European Galician galego gl
South Caucasian Georgian ქართული ka
Indo-European German Deutsch de
Indo-European Greek (modern) ελληνικά el
Tupian Guaraní Avañe’ẽ gn
Indo-European Gujarati ગુજરાતી gu
Creole Haitian, Haitian Creole Kreyòl ayisyen ht
Afro-Asiatic Hausa (Hausa) هَوُسَ ha
Afro-Asiatic Hebrew (modern) עברית he
Niger–Congo Herero Otjiherero hz
Indo-European Hindi हिन्दी, हिंदी hi
Austronesian Hiri Motu Hiri Motu ho
Uralic Hungarian magyar hu
Constructed Interlingua Interlingua ia
Austronesian Indonesian Bahasa Indonesia id
Constructed Interlingue Originally called Occidental; then Interlingue after WWII ie
Indo-European Irish Gaeilge ga
Niger–Congo Igbo Asụsụ Igbo ig
Eskimo–Aleut Inupiaq Iñupiaq, Iñupiatun ik
Constructed Ido Ido io
Indo-European Icelandic Íslenska is
Indo-European Italian italiano it
Eskimo–Aleut Inuktitut ᐃᓄᒃᑎᑐᑦ iu
Japonic Japanese 日本語 (にほんご) ja
Austronesian Javanese basa Jawa jv
Eskimo–Aleut Kalaallisut, Greenlandic kalaallisut, kalaallit oqaasii kl
Dravidian Kannada ಕನ್ನಡ kn
Nilo-Saharan Kanuri Kanuri kr
Indo-European Kashmiri कश्मीरी, كشميري‎ ks
Turkic Kazakh қазақ тілі kk
Austroasiatic Khmer ខ្មែរ, ខេមរភាសា, ភាសាខ្មែរ km
Niger–Congo Kikuyu, Gikuyu Gĩkũyũ ki
Niger–Congo Kinyarwanda Ikinyarwanda rw
Turkic Kyrgyz Кыргызча, Кыргыз тили ky
Uralic Komi коми кыв kv
Niger–Congo Kongo Kikongo kg
Koreanic Korean 한국어, 조선어 ko
Indo-European Kurdish Kurdî, كوردی‎ ku
Niger–Congo Kwanyama, Kuanyama Kuanyama kj
Indo-European Latin latine, lingua latina la
Indo-European Luxembourgish, Letzeburgesch Lëtzebuergesch lb
Niger–Congo Ganda Luganda lg
Indo-European Limburgish, Limburgan, Limburger Limburgs li
Niger–Congo Lingala Lingála ln
Tai–Kadai Lao ພາສາລາວ lo
Indo-European Lithuanian lietuvių kalba lt
Niger–Congo Luba-Katanga Tshiluba lu
Indo-European Latvian latviešu valoda lv
Indo-European Manx Gaelg, Gailck gv
Indo-European Macedonian македонски јазик mk
Austronesian Malagasy fiteny malagasy mg
Austronesian Malay bahasa Melayu, بهاس ملايو‎ ms
Dravidian Malayalam മലയാളം ml
Afro-Asiatic Maltese Malti mt
Austronesian Māori te reo Māori mi
Indo-European Marathi (Marāṭhī) मराठी mr
Austronesian Marshallese Kajin M̧ajeļ mh
Mongolic Mongolian Монгол хэл mn
Austronesian Nauru Ekakairũ Naoero na
Dené–Yeniseian Navajo, Navaho Diné bizaad nv
Niger–Congo Northern Ndebele isiNdebele nd
Indo-European Nepali नेपाली ne
Niger–Congo Ndonga Owambo ng
Indo-European Norwegian Bokmål Norsk bokmål nb
Indo-European Norwegian Nynorsk Norsk nynorsk nn
Indo-European Norwegian Norsk no
Sino-Tibetan Nuosu ꆈꌠ꒿ Nuosuhxop ii
Niger–Congo Southern Ndebele isiNdebele nr
Indo-European Occitan occitan, lenga d’òc oc
Algonquian Ojibwe, Ojibwa ᐊᓂᔑᓈᐯᒧᐎᓐ oj
Indo-European Old Church Slavonic, Church Slavonic, Old Bulgarian ѩзыкъ словѣньскъ cu
Afro-Asiatic Oromo Afaan Oromoo om
Indo-European Oriya ଓଡ଼ିଆ or
Indo-European Ossetian, Ossetic ирон æвзаг os
Indo-European Panjabi, Punjabi ਪੰਜਾਬੀ, پنجابی‎ pa
Indo-European Pāli पाऴि pi
Indo-European Persian (Farsi) فارسی fa
Indo-European Polish język polski, polszczyzna pl
Indo-European Pashto, Pushto پښتو ps
Indo-European Portuguese português pt
Quechuan Quechua Runa Simi, Kichwa qu
Indo-European Romansh rumantsch grischun rm
Niger–Congo Kirundi Ikirundi rn
Indo-European Romanian limba română ro
Indo-European Russian Русский ru
Indo-European Sanskrit (Saṁskṛta) संस्कृतम् sa
Indo-European Sardinian sardu sc
Indo-European Sindhi सिन्धी, سنڌي، سندھی‎ sd
Uralic Northern Sami Davvisámegiella se
Austronesian Samoan gagana fa’a Samoa sm
Creole Sango yângâ tî sängö sg
Indo-European Serbian српски језик sr
Indo-European Scottish Gaelic, Gaelic Gàidhlig gd
Niger–Congo Shona chiShona sn
Indo-European Sinhala, Sinhalese සිංහල si
Indo-European Slovak slovenčina, slovenský jazyk sk
Indo-European Slovene slovenski jezik, slovenščina sl
Afro-Asiatic Somali Soomaaliga, af Soomaali so
Niger–Congo Southern Sotho Sesotho st
Indo-European Spanish español es
Austronesian Sundanese Basa Sunda su
Niger–Congo Swahili Kiswahili sw
Niger–Congo Swati SiSwati ss
Indo-European Swedish svenska sv
Dravidian Tamil தமிழ் ta
Dravidian Telugu తెలుగు te
Indo-European Tajik тоҷикӣ, toçikī, تاجیکی‎ tg
Tai–Kadai Thai ไทย th
Afro-Asiatic Tigrinya ትግርኛ ti
Sino-Tibetan Tibetan Standard, Tibetan, Central བོད་ཡིག bo
Turkic Turkmen Türkmen, Түркмен tk
Austronesian Tagalog Wikang Tagalog, ᜏᜒᜃᜅ᜔ ᜆᜄᜎᜓᜄ᜔ tl
Niger–Congo Tswana Setswana tn
Austronesian Tonga (Tonga Islands) faka Tonga to
Turkic Turkish Türkçe tr
Niger–Congo Tsonga Xitsonga ts
Turkic Tatar татар теле, tatar tele tt
Niger–Congo Twi Twi tw
Austronesian Tahitian Reo Tahiti ty
Turkic Uyghur ئۇيغۇرچە‎, Uyghurche ug
Indo-European Ukrainian українська мова uk
Indo-European Urdu اردو ur
Turkic Uzbek Oʻzbek, Ўзбек, أۇزبېك‎ uz
Niger–Congo Venda Tshivenḓa ve
Austroasiatic Vietnamese Tiếng Việt vi
Constructed Volapük Volapük vo
Indo-European Walloon walon wa
Indo-European Welsh Cymraeg cy
Niger–Congo Wolof Wollof wo
Indo-European Western Frisian Frysk fy
Niger–Congo Xhosa isiXhosa xh
Indo-European Yiddish ייִדיש yi
Niger–Congo Yoruba Yorùbá yo
Tai–Kadai Zhuang, Chuang Saɯ cueŋƅ, Saw cuengh za
Niger–Congo Zulu isiZulu zu

Don’t Sell Soccer Balls When Customers Play Football

There are enough marketing slogans and clichés to fill the hundreds of business books produced every year. Most of them you can disregard; just focus on what works for your particular business and industry. However, if you conduct any commerce across borders, here's a strategy you can't afford to ignore: "Don't sell soccer balls when your customers play football."

Put simply, name and describe your products exactly the way your customers would refer to them. If you make soccer balls, international customers will never find you, even if you make the world's best soccer ball, unless you refer to your soccer products with the word "football," as most of the world calls the sport.

It makes no difference whether you are marketing online or offline; with organic search or paid media. If you don’t use the language of your customers you can’t possibly access them or sell to them.

For example, Urinal is a urinary tract health product sold in the Czech Republic. While this product might sell well in its home market, it would be very challenging to market it in most English-speaking markets without renaming it to something a bit more benign.

Handy, the German word for a mobile phone, is another example where proper international naming is crucial. In English-speaking countries, handy just means useful. Without some naming research, an English-speaking product manager or content marketer might not realize that Handy is a word they should include in any German-targeted mobile phone product descriptions.

So how can you make sure that you are using the right words when targeting an international audience? Here are five ways, all easy and very inexpensive:

1. Mechanical Turk

Post an image of your product as a HIT on Mechanical Turk, targeted to your focus country and language, and ask people to describe the image in a few words. Experiment with per-HIT pricing, but you can get quality results for less than $10.

2. Freelancers

Hire two native-speaking contractors on a freelance site like oDesk, Freelancer, or Craigslist and ask each of them to describe your product. The descriptions should be fairly similar, so merging them will give you the best words to describe your items.

3. AdWords Test Campaign

Run an AdWords campaign in your target country for your product, using all possible keywords as exact match types. For ad copy, use a different product name as the headline of each ad, but leave the rest of the ad the same across all variations. Set AdWords to rotate the ads evenly. Once you have enough impressions for the difference between ads to be statistically significant, the headline with the highest click-through rate will likely be your best product name.
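To put a number on "significance" in the headline test, a standard two-proportion z-test on click-through rate is one option. This Python sketch compares two headlines at a time; the function name and the example figures (120 clicks on 2,000 impressions versus 80 clicks on 2,000) are hypothetical.

```python
from math import erf, sqrt


def ctr_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Two-sided two-proportion z-test comparing the CTRs of two ads."""
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    if std_err == 0:
        return 0.0, 1.0  # no clicks at all (or all clicks): nothing to compare
    z = (ctr_a - ctr_b) / std_err
    # Standard normal CDF via erf, doubled for a two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


if __name__ == "__main__":
    z, p = ctr_z_test(120, 2000, 80, 2000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

With those figures, z is roughly 2.9 and the two-sided p-value is below 0.01, so the first headline's higher CTR is unlikely to be due to chance alone.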

4. AdWords Display Planner

Plug your first keyword into the AdWords Display Planner, choose only your target country, and then click "Get placement ideas." On the next screen, click "Placements" under the individual targeting tabs. Check out the website ideas and see if they are related to your products. Repeat this search for all of your word possibilities until you find the competitive set of sites most related to your products.

5. Alibaba

Search your keyword possibilities on Alibaba.com, China's (and possibly the world's) leading e-commerce site. You can see all products listed on Alibaba from the same search box; for example, an English search will also bring up German-language listings. You can audit the listings to get a sense of the most common ways of describing your products. You can also try this on eBay, but you will need to use the specific eBay domain for your target country (e.g., ebay.de for Germany).

If you have any more ideas or tools for finding the best product names and descriptions to target a global audience, please do share.

Four Ways To Check International Rankings for Free

There are countless posts on SEO blogs declaring that rank checking is dead and that ranking reports should no longer be shared as a KPI. While I wholeheartedly agree that rank tracking should not be the primary metric for determining SEO success or failure, ranking reports still play an important part in an SEO's role.

In my own role, I no longer see the benefit of frequent rank checking on a mass scale, but I do conduct many manual queries to better understand who is ranking on some of my favorite keywords. In this post, I will share four ways I get accurate international rankings for free.

Benefits of Rank Checking

Obviously, the goal in creating any piece of web content, provided of course that it is exposed to search engines, is to generate organic search traffic. Without checking rankings, there is no way to be certain that a piece has been correctly targeted for the desired terms. Certainly, you can look at organic traffic in your analytics software, but with Google not sharing keyword data, you wouldn't know whether the traffic is coming from the intended terms. Webmaster Tools will tell you some of the story, but its keyword report often lags and is subject to account personalization.

Automated and manual rank checking in the US is very simple, as you can type your queries into a search box or use a variety of software solutions. The real rank checking challenge is understanding how you are doing on non-US Google searches. Discovering how you rank in the UK is not nearly as simple as going to google.co.uk from an incognito window. Even though incognito mode keeps your personal search history out of Google's interpretation of the query, you are still physically located in the US (or wherever you are), and this will bias the results away from what an actual in-country searcher would see.

Rank Checking for International SEO

Additionally, international SEO presents some very complex challenges when you don't know the language you are targeting, so knowing precise search engine rankings becomes even more important. You won't be as familiar with your list of target keywords as you are with keywords in your own language, and you won't have as strong a grasp of the keyword modifiers and synonyms you should also be targeting.

As with domestic SEO, looking in your analytics software to see how much traffic you are receiving is not going to be that helpful. You can, for example, see that traffic is increasing in a target country, but you won't have much insight into whether branded or non-branded queries are driving it. Also, if you are in the early stages of an international campaign and need to prove the value of a new piece of non-English content, you will not have the data you need to demonstrate the desired ROI.

Four ways to check international rankings for nearly free

Luckily, there are a few ways to check Google rankings for free or almost free that will show you search results just like any in-country user.

  •  AdWords Preview Tool. Although this tool is designed to show whether your ad is currently appearing for a specific query, as with most of Google's paid search tools, there is an SEO use. The tool allows you to choose the specific Google TLD, country, and city you are targeting, so you can see how rankings differ on google.ca for a specific query in Toronto, ON, or Montreal, QC. For added fun, you can see what might be ranking on google.co.uk for the same Canadian locations and notice how the rankings change slightly. You can also choose between desktop and mobile search. These results are completely generic, with no personalization, and should be very similar to what a user in your target country would see.

    [Screenshot: AdWords Preview Tool, 4/29/2014]

  • Append parameters to your search query string. Search using your target Google TLD (e.g., google.at for Austria) and then append parameters to your Google query URL. These parameters tell Google which interface language you are using and where the searcher is located. Here's an example query string: https://www.google.pt/search?q=wufoo&gl=GB&hl=es

    [Screenshot: Google parameters in search, 4/29/2014]

    The first part, https://www.google.pt, shows that you are conducting a search on google.pt, Google's Portuguese TLD. The next section, "/search?q=", is followed by your actual query. After that is where you append "gl," which is your Google location. Google uses the two-letter ISO 3166-1 country code for this parameter. (Find the full list here.) In my query, I am searching from the UK, which uses the ISO code "GB."

    Lastly, "hl=" is where you append the interface language of your search. This parameter uses the two-letter ISO 639-1 language codes. (Find the full list here.) In my query, just to mix things up, I am using Spanish, which has the language code "es." The interface parameter should match a language of the country you are targeting, since the results do change by interface. If you do not add an interface code, the default will be the interface of the Google TLD where you are conducting the query. To make sure you see the results your target user sees, it helps to change the interface language in countries with multiple languages, such as Canada and Switzerland.
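If you check many keywords this way, building the URLs programmatically avoids typos. Here is a small Python sketch, assuming gl takes a two-letter ISO 3166-1 country code and hl a two-letter ISO 639-1 language code, as described above. The function name is hypothetical, and the check only validates the shape of each code, not that it exists; regional interface variants such as zh-CN would need a looser check.

```python
from urllib.parse import urlencode


def localized_search_url(query, tld, gl, hl):
    """Build a Google search URL for a TLD, location (gl), and interface language (hl)."""
    for name, code in (("gl", gl), ("hl", hl)):
        # Shape check only: two letters, not membership in the ISO lists.
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"{name} should be a two-letter code, got {code!r}")
    params = urlencode({"q": query, "gl": gl.upper(), "hl": hl.lower()})
    return f"https://www.google.{tld}/search?{params}"
```

For example, localized_search_url("wufoo", "pt", "GB", "es") reproduces the example query string from the list above.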

  • Browser-based proxy search. Use a proxy plugin like FoxyProxy on either Firefox or Chrome, configured with public in-country proxies. There are many free public proxies you can use, but many of them will be slow and unreliable. For a few dollars per month, you can subscribe to a proxy service in your target countries and gain access to proxies that are less likely to be on Google's blacklist. Once you have set up your proxy, use an incognito window to check your IP address location and confirm that you are indeed accessing the Internet via your proxy. Then conduct manual Google searches on the correct Google TLD for your target country. (Ideally, Google should redirect you to the local Google TLD for your proxy country, but it doesn't hurt to go there directly.)
  • Access the web via a proxy. Subscribe to an enterprise proxy service that gives you multiple IP addresses for your target country, and run the proxy via your network settings on your computer. This will put all Internet traffic on your computer behind the proxy IP, and you can then run automated ranking tools like Rank Tracker, Authority Labs, and Advanced Web Rankings.
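For the URL-parameter method, the search URL can also be assembled programmatically so that the same query is easy to repeat across countries and languages. The Python sketch below is a minimal illustration, assuming only the “q”, “gl”, and “hl” parameters described above; the function name google_search_url and the query “football” are placeholders, and Google may change how it treats these parameters at any time.

```python
from urllib.parse import urlencode


def google_search_url(query: str, tld: str, gl: str, hl: str) -> str:
    """Build a Google search URL (hypothetical helper).

    query: the search terms ("q")
    tld:   the Google TLD to search on, e.g. "pt" or "co.uk"
    gl:    two-letter ISO country code for the search location
    hl:    two-letter ISO language code for the interface language
    """
    params = {"q": query, "gl": gl, "hl": hl}
    # urlencode escapes the query (spaces become "+") and joins the pairs with "&".
    return f"https://www.google.{tld}/search?{urlencode(params)}"


# Search Google.pt as a UK user with a Spanish interface, as in the example above.
print(google_search_url("football", "pt", "GB", "es"))
# https://www.google.pt/search?q=football&gl=GB&hl=es
```

Keeping the parameters in a dictionary lets urlencode handle escaping, so multi-word queries such as “football pitch” come out correctly without manual string building.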

If any of the above methods are too time-consuming for you, you can always subscribe to one of the many paid web-based ranking tools and receive monthly, weekly, or daily reports. The cost will increase with the number of countries and search engines you track and with the report frequency you need. Even with web-based ranking tools, you may still have to go the manual route just to confirm or screenshot any of the rankings you are seeing in your reports.

I hope that these rank checking methods help with your international SEO efforts, and I look forward to hearing additional ideas in the comments.

Multilingual SEO: It’s Actually a Pretty Big Challenge for Google to Determine the Language of a Query

Originally published at Search Engine Journal

There are many words that are spelled the same but have different meanings based on language and location. A very simple example is the word “football.” In the US and Canada, it refers to a game played with a ball that is thrown in the air and carried toward a goal, while in the UK and Australia it refers to a game played by kicking a ball into a goal (also known as ‘soccer’ to Americans). So, how does Google determine which meaning of a specific word a user is after?

Query Challenge

Every time someone conducts one of these ambiguous searches on Google, Google’s algorithm immediately needs to figure out the user’s preferred language just to understand the category of results that should be returned, before even determining the rankings of those results.

While the word football is spelled the same by all English speakers, a listener would not know which game is being referenced in a conversation unless they knew where the speaker came from. Both games share features like a great deal of running, passing, and even goal kicking.

Screenshot of Google.com vs Google.com.mx. 5-30-14

Spoken Advantages

Within a very short spoken conversation or statement, there would probably not be any semantic clues that could help the listener figure out which kind of football was being referenced. If someone just asked, “What time is the football game?” or “Do you play football?”, the answer would depend on the specific kind of football. (When listening to ambiguous phrases, the listener may be helped by the speaker’s accent, but this advantage does not exist for phrases typed into a search box.) However, as the conversation continues, the listener will eventually be able to figure out whether the primary topic is American football or soccer.

Similar to spoken conversation, in longer queries Google will also use the words adjoining the ambiguous term to help refine the query. A query like “football pitch” indicates that a user is looking for soccer, and “football field goal” indicates an American (or Canadian) football query. Furthermore, Google combines additional query words with timing to understand the query: “What time is the football game?” searched on an NFL game-day Sunday would be a strong indicator of the user’s intent.
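The idea of using neighboring words as cues can be illustrated with a toy Python sketch. This is not Google’s method; the cue lists and the guess_football_sense function are hypothetical, chosen only to mirror the “football pitch” and “football field goal” examples above. Note that a one-word query yields no decision at all, which is exactly the harder case discussed next.

```python
from typing import Dict, Optional, Set

# Hypothetical cue words for each sense of "football"; a real system would
# learn associations like these from data rather than hard-code them.
CUES: Dict[str, Set[str]] = {
    "american_football": {"field goal", "touchdown", "nfl", "quarterback"},
    "soccer": {"pitch", "world cup", "premier league", "penalty kick"},
}


def guess_football_sense(query: str) -> Optional[str]:
    """Return the sense whose cue phrases appear most often, or None if undecided."""
    q = query.lower()
    # Count how many cue phrases for each sense occur in the query (substring match).
    scores = {sense: sum(cue in q for cue in cues) for sense, cues in CUES.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    # A tie, including 0-0 for a one-word query, means the words alone can't decide.
    if top == runner_up:
        return None
    return best


print(guess_football_sense("football pitch"))       # soccer
print(guess_football_sense("football field goal"))  # american_football
print(guess_football_sense("football"))             # None
```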

One Word Query

When the query is just one word, this becomes far more challenging. Figuring out which sport a user is seeking is certainly a challenge, but at least both variations refer to a game. Google could just return results for both definitions of football, but that would not be a very good user experience: an American seeking the NFL would not understand why there are soccer results on the search page.

Google is able to get away with returning different categories of results for ambiguous queries like “breadcrumbs” because a user understands that breadcrumbs could have multiple meanings. In the screenshot below, Google is returning results for recipes, the breadcrumb design element, a product, and a book. All of these make sense, and nothing suggests that Google failed to interpret the query. Adding a result from another culture or language is a lot more jarring.

Breadcrumb Query on Google

Google search for Breadcrumbs Screenshot 5-30

This is an even greater challenge for the dozens of words that mean one thing in one language but something entirely different in another. In English, a “gift” is something nice you give to people, while in German, “Gift” is poison. In French, “pain” is bread, while in English, it is something we try very hard to avoid. (For some off-color examples, have a look at this Reddit thread.)

Language Prioritization for User Experience

If Google were to return results across multiple languages, the user would probably think there was something wrong with Google and use another search engine. It is even more important in these cases that Google correctly determines the user’s preferred language and returns only relevant results.

If there are other words that accompany the multi-use word, Google can use these to match the user’s language and return the best result. As before, the real challenge is when there is only a one-word query.

To try to parse the user’s language, Google is going to rely heavily on the user’s past search history, and most of the time this will be all it needs. A user who usually searches in English will most likely want an English result. A query for “football” that comes close in time to a query for “Steelers” would be a strong indication that the user is not interested in soccer results. Going even deeper into the full user history, a user who clicked on World Cup results in the past would probably be interested in soccer results. For those who are fans of conspiracy theories, Google could potentially use data like a history of watching sports videos on YouTube or time spent on sports sites with DoubleClick retargeting pixels to get a more complete picture of the user. (See Google’s ad preferences [Canadian link] for what they know about your individual activities.)

Five Levers to Determine Language Preferences

Nonetheless, even with all the data Google has gathered on users, there will be many instances where past history will not help. For these instances, Google looks at five different areas to help determine how to interpret the query. (An AdWords support page claims that, at least for AdWords, Google uses only the user’s settings, but ads in other languages will more than likely accompany results in whatever language Google determines the query to be.)

User Account Preferences

If the user has an account with Google, when they set up the account they were either required to choose a language and location during the sign-up process or were assigned a default. If a user’s settings declare their preferences to be English and US, Google will first assume that the likely language of any query is American English. These preferences also populate the default search preferences, which can be found under search settings on a Google search page.

If a Google account user decided they wanted to start seeing results in another language or locale they would need to manually change their language preferences. These can be changed just for search under the search settings options or for all Google products under the account settings. Changing language and location preferences will impact anywhere a user conducts logged in searches including other computers and mobile devices.

Browser Settings

Since not all Internet users have Google accounts, and account holders are not always logged in, Google’s first backup for account-level language settings is a similar setting at the browser level. All modern browsers have a default setting that declares a user’s language preferences. Google will use the browser’s language and location preferences as the primary clue for a user’s language intent.

In most cases, the language setting is defaulted to how the user installed the browser. If the browser was downloaded in English from a US mirror, it will probably be set to English and US.

For Chrome and Firefox, these settings can be adjusted at the browser level; for IE and Safari, however, they need to be changed at the system level, which is a pretty big change just to do some Google testing.

Chrome language preferences

Chrome language preferences Screenshot on 5-30-14
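Google has not documented exactly how it reads browser settings, but one place a browser’s language preferences are visible to any website is the Accept-Language request header, which lists language tags with optional “q” weights. As a rough Python sketch, assuming that header is the signal in question, the preferences could be ranked like this. The parse_accept_language function is hypothetical, and it treats the “*” wildcard as an ordinary tag.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags in an Accept-Language header, most preferred first."""
    entries: List[Tuple[float, int, str]] = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # A tag without a q-value has the default weight of 1.
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # Treat a malformed weight as "not acceptable".
        if quality > 0:  # q=0 explicitly marks a language as not acceptable.
            # Sort key: highest quality first, then original order for ties.
            entries.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(entries)]


# A typical header from a browser set to US English with French as a fallback.
print(parse_accept_language("en-US,en;q=0.9,fr;q=0.8"))  # ['en-US', 'en', 'fr']
```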

Geolocation

Often, relying on either Google account or browser settings alone doesn’t give Google’s algorithm complete confidence in the desired language of a query. To add a higher degree of certainty, Google will look at where the user is physically located.

Generally, Google relies a great deal on a user’s physical location to better target search results. A user on the East Coast of the United States who searches for “Giants” will see more New York Giants results on the first page of Google results, even during the NFL off-season, while a West Coast user will see more San Francisco Giants results, even during the MLB off-season.

For many queries, there won’t be much difference between the results of searches conducted on Google.com from various locations, but some queries will see major shifts. For example, a query for the word “football” will return nearly identical results in the US, Canada, and the UK, while a query for the word “holiday” will return very different results in the UK than in the US.

TLD of Google Domain

While physical location is an important clue for a user’s language intent, it will very rarely override any account- or browser-level language settings. However, the Google TLD (e.g., Google.com vs. Google.co.uk) where the query was conducted can override these settings.

Google.com.br

Google.com.br screenshot 5-30

Typically, a logged-in user will default to Google.com even if they are traveling outside the US. A non-logged-in user will get redirected to whatever the local Google TLD is even if their browser settings indicate that they prefer English and US.

TLD is a very important factor in determining which language to return results in, and if there were a hierarchy in Google’s language determination process, it could either come first or simply go hand-in-hand with location targeting. The TLD can be one of the best clues Google has for language intent if the user intentionally chose that specific TLD.

For example, a user in the US who conducts a search on Google.com.br very likely would like to see Portuguese results. On the other hand, the TLD can be a poor clue if the user was simply directed to it by their location, as a traveler might be. In the traveler example, a US resident traveling in Germany who conducted a Google search while logged out of their account would see Google.de by default simply because of their location. If Google relied on the TLD to determine this user’s language intent, it might end up giving them poor results.

If this user searched the word “handy” they would see results related to mobile phones because this is what Germans use to refer to a cell phone. The user might very well have been interested in the types of results that Google would have shown in the US, but did not get to see them because of an incorrect language choice.

When Google uses the TLD for language assumptions, it always defaults to the primary language of the country. In Canada, where both English and French are official languages, a query for the word “baguette” would return English results even though it is technically a French word. The same default occurs in Switzerland: even though German, French, and Italian are widely spoken, Google always assumes that a query is in German whenever there is any doubt.
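The TLD-based default described above amounts to a lookup from Google domain to a single language. The Python sketch below is a hypothetical illustration: the mapping covers only a few TLDs mentioned in this article, and the one-language-per-TLD table is built from the observations above rather than from any documented Google data.

```python
from typing import Dict, Optional

# Hypothetical, deliberately small mapping from Google TLDs to one default
# language each. Multilingual countries (Canada, Switzerland) still get only one.
DEFAULT_LANGUAGE_BY_TLD: Dict[str, str] = {
    "com": "en",
    "co.uk": "en",
    "ca": "en",
    "ch": "de",
    "de": "de",
    "com.br": "pt",
    "com.mx": "es",
}


def default_language(host: str) -> Optional[str]:
    """Return the assumed default language for a Google host, or None if unknown."""
    host = host.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if not host.startswith("google."):
        return None  # Not a Google search host.
    return DEFAULT_LANGUAGE_BY_TLD.get(host[len("google."):])


print(default_language("www.google.ca"))  # en, even for a French word like "baguette"
print(default_language("www.google.ch"))  # de, German is assumed when in doubt
```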

Query Parsing and Matching

Lastly, Google tries to break down the word itself, looking for any clues as to its language. The algorithm matches the word against the vocabularies of the most common languages. Once a language is matched via a keyword, all results will most likely be in that specific language. This is fairly simple when the word is spelled correctly and only matches a single popular language. It is a bit more complicated when there is no perfect match.

In these cases, Google will look for things like statistical matches toward a misspelling in one language versus another. The word “football” can be spelled “futbal”, “futbol”, and “futball”, so Google will try to guess, using all the rest of the clues, whether the user made a spelling mistake or actually sought results in another language. For any technically minded readers, more details about this process can be gleaned from Google’s patent on the topic.
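Google’s actual statistical approach is what its patent describes, not what is shown here. As a plain stand-in, the Python sketch below uses string similarity from the standard library’s difflib to show the general idea of matching a possibly misspelled word to the closest entry across several language vocabularies. The lexicons, the closest_language function, and the 0.6 cutoff are all hypothetical.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical mini-lexicons; a real system would use large vocabularies
# weighted by how often each word (and each misspelling) is actually searched.
LEXICONS: Dict[str, List[str]] = {
    "en": ["football", "soccer", "goal"],
    "es": ["fútbol", "futbol", "gol"],
    "de": ["fußball", "tor"],
}


def closest_language(word: str, cutoff: float = 0.6) -> Optional[str]:
    """Guess a word's language from its most similar lexicon entry."""
    best_language, best_ratio = None, 0.0
    for language, words in LEXICONS.items():
        for candidate in words:
            # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
            ratio = difflib.SequenceMatcher(None, word.lower(), candidate).ratio()
            if ratio > best_ratio:
                best_language, best_ratio = language, ratio
    # Below the cutoff, nothing is close enough to call.
    return best_language if best_ratio >= cutoff else None


print(closest_language("football"))  # en (exact match)
print(closest_language("futbal"))    # es (closest to "futbol")
```

A spelling like “futball” lands close to both “football” and the German “fußball”, which is the kind of near-tie where the other clues described above would have to decide.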

TLDR

SEOs typically focus on the aspects of Google’s algorithm that decide in what position a webpage should rank. In reality, Google’s algorithm is far more complex than an ordering of content based on scores. Google actually needs to conduct a real-time analysis of every query to determine the user’s language before it can even start retrieving sites from the index and determining the ranking of each of these pages.

I hope this brief look into how Google determines a query’s language gave you some interesting food for thought on how hard Google works to satisfy users and provide high-quality results. I have not found any Google source that shares how they determine a query’s language, and the findings above came from my own research. If you have discovered or just know something different, I would love to hear more about it.

Featured image via Flickr