Anyone who has ever tried to analyze large amounts of data in Google Analytics has likely faced the scourge of data sampling. In its help documentation, Google tries to explain the rationale for data sampling by comparing it to counting trees in a forest:
In data analysis, sampling is the practice of analysing a subset of all data in order to uncover the meaningful information in the larger data set. For example, if you wanted to estimate the number of trees in a 100-acre area where the distribution of trees was fairly uniform, you could count the number of trees in 1 acre and multiply by 100, or count the trees in a half acre and multiply by 200 to get an accurate representation of the entire 100 acres.
In reality, visitors to a website are not like trees in a forest, since each of them has different behavior and navigational paths. Even worse, GA users will often be faced with sampling rates of less than 1%, which means that you aren't even looking at 1 in 100 data points.
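To make the difference concrete, here is a minimal Python sketch of the extrapolation arithmetic. All of the figures (40 trees per acre, 100,000 sessions, 1,000 purchases of 250 each) are made up for illustration, and the random 1% sample is only a stand-in for whatever sampling GA actually performs, which this sketch does not attempt to reproduce.

```python
import random


def extrapolate(sample_total, scale):
    """Scale a total observed in a sample up to the full population."""
    return sample_total * scale


# Forest case: trees are spread evenly, so any acre looks like any other.
trees_per_acre = 40  # hypothetical figure
one_acre_estimate = extrapolate(trees_per_acre, 100)
half_acre_estimate = extrapolate(trees_per_acre // 2, 200)

# Website case: most sessions bring in nothing, a few bring in a lot.
rng = random.Random(0)  # fixed seed so the run is repeatable
sessions = [0] * 99_000 + [250] * 1_000  # hypothetical revenue per session
true_revenue = sum(sessions)

# Draw several independent 1% samples and extrapolate each one.
estimates = []
for _ in range(5):
    sample = rng.sample(sessions, k=len(sessions) // 100)
    estimates.append(extrapolate(sum(sample), 100))

print("forest estimates:", one_acre_estimate, half_acre_estimate)
print("true revenue:", true_revenue)
print("1% sample estimates:", estimates)
```

In the forest case both estimates agree with each other because every acre is identical. In the website case each 1% sample contains only a handful of the purchasing sessions, so one extra or one missing purchase moves the extrapolated total by 25,000, and estimates from different samples can land well away from the true figure.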
For some users who frequently deal with large datasets, this sampling could be motivation enough to get Google Analytics 360, but as I recently discovered, data can still be sampled there, even in default reports. (You can get around this by downloading the raw data.) Google Analytics 360 also does a far better job of hiding that data is sampled, so it's not as obvious when you are working with sampled data.
See below for how to recognize when you are working with sampled data.
Unsampled report in standard free GA
Sampled report in free GA
Unsampled Report in GA 360 – notice the green check!
Sampled Report in GA 360 – notice the orange check!