Eli Schwartz

Data Still Samples In Google Analytics Premium – But It’s Better Hidden. PSA

Anyone who has ever tried to analyze large amounts of data in Google Analytics has likely run into the scourge of data sampling. In its help documentation, Google tries to explain the rationale for sampling by comparing it to counting trees in a forest:

“In data analysis, sampling is the practice of analysing a subset of all data in order to uncover the meaningful information in the larger data set. For example, if you wanted to estimate the number of trees in a 100-acre area where the distribution of trees was fairly uniform, you could count the number of trees in 1 acre and multiply by 100, or count the trees in a half acre and multiply by 200 to get an accurate representation of the entire 100 acres.”
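The tree example comes down to a single scaling step. You multiply the sampled count by the ratio of the whole area to the sampled area. Below is a minimal Python sketch of that arithmetic. The function name `extrapolate` and the tree counts are hypothetical, and the sketch assumes the uniform distribution that Google’s example depends on.

```python
def extrapolate(sampled_count, sampled_acres, total_acres=100):
    """Scale a count from a sampled area up to the whole area.

    Only accurate when trees are spread uniformly, as Google's example assumes.
    """
    return sampled_count * total_acres / sampled_acres


# Counting 1 acre and multiplying by 100, or half an acre and multiplying by 200,
# is the same calculation with different sample sizes (tree counts are made up).
print(extrapolate(37, 1))    # 37 trees in 1 acre    -> 3700.0
print(extrapolate(18, 0.5))  # 18 trees in half acre -> 3600.0
```

The sampled count is treated as representative of every unsampled acre. That assumption is exactly what breaks down for website visitors.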

In reality, website visitors are not like trees in a forest. Each one behaves differently and takes a different navigational path. Worse, GA users are often faced with sampling rates below 1%, which means they aren’t even looking at 1 in every 100 data points.
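The toy Python simulation below shows why uneven behavior and sub-1% sampling are a bad combination. It is a hypothetical model that keeps each session at random with a fixed seed. It is not Google’s actual sampling algorithm. The traffic numbers and the `sampled_estimate` helper are made up for illustration.

```python
import random


def sampled_estimate(sessions, keep_one_in, seed=0):
    """Toy model of session sampling (not GA's actual algorithm).

    Each session is kept with probability 1/keep_one_in, and each kept session
    that matches is scaled back up to stand in for keep_one_in sessions.
    """
    rng = random.Random(seed)
    # The random draw happens for every session, matched or not.
    kept_matches = sum(
        1 for matched in sessions if rng.randrange(keep_one_in) == 0 and matched
    )
    return kept_matches * keep_one_in


TOTAL_SESSIONS = 200_000
KEEP_ONE_IN = 200  # a 0.5% sample, i.e. fewer than 1 in 100 sessions

# Hypothetical traffic: a popular page seen in 10% of sessions,
# and a rare conversion path taken in only 25 sessions.
popular_page = [i < 20_000 for i in range(TOTAL_SESSIONS)]
rare_path = [i < 25 for i in range(TOTAL_SESSIONS)]

print("popular page:", sampled_estimate(popular_page, KEEP_ONE_IN), "vs 20000 actual")
print("rare path:   ", sampled_estimate(rare_path, KEEP_ONE_IN), "vs 25 actual")
```

Every sampled session stands in for 200 sessions. As a result, the rare path can only be reported as 0, 200, 400 and so on, and never as its true value of 25. The popular page usually lands reasonably close to 20,000.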

For users who frequently work with large datasets, sampling alone could be reason enough to upgrade to Google Analytics 360 (formerly Google Analytics Premium). However, I recently discovered that even GA 360 still samples data in its default reports. (You can get around this by downloading the raw data.) The difference is that GA 360 does a far better job of hiding the fact that data is sampled, so it’s less obvious when you are working with sampled data.

The screenshots below show how to recognize when you are working with sampled data.

Unsampled report in standard free GA

Sampled report in free GA

Unsampled report in GA 360 – notice the green check!

Sampled report in GA 360 – notice the orange check!
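The screenshots above cover only the web interface. If you pull report data programmatically, you can check for sampling in code instead of looking for an icon. The sketch below assumes the response shape of the Google Analytics Reporting API v4. In that shape, a report’s `data` object carries `samplesReadCounts` and `samplingSpaceSizes` (int64 values encoded as strings, one per date range) only when the report was sampled. Please verify this against Google’s current documentation. The `sampling_summary` helper and the example numbers are hypothetical, and the code makes no network calls.

```python
def sampling_summary(report):
    """Describe sampling for one report taken from a Reporting API v4 response.

    Assumes the v4 shape: report["data"]["samplesReadCounts"] and
    report["data"]["samplingSpaceSizes"] exist only on sampled reports.
    Only the first date range is checked here.
    """
    data = report.get("data", {})
    reads = data.get("samplesReadCounts")
    spaces = data.get("samplingSpaceSizes")
    if not reads or not spaces:
        return "unsampled"
    percent = 100 * int(reads[0]) / int(spaces[0])
    return f"SAMPLED: read {reads[0]} of {spaces[0]} samples ({percent:.2f}%)"


# Hypothetical sampled report: 5,000 of 1,000,000 samples read.
example = {"data": {"samplesReadCounts": ["5000"], "samplingSpaceSizes": ["1000000"]}}
print(sampling_summary(example))  # SAMPLED: read 5000 of 1000000 samples (0.50%)
```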