Statistics can often be misleading. That’s not to say statistics can’t be trustworthy, valid or even true; it’s just that many factors can affect the veracity of a statistic. These factors include human error, which questions are being answered, how information is gathered, known and unknown biases, and the filter used to frame the story.
So, why should we care about the validity of a statistic? Because one simple number, one simple statistic, can distort the story or change the context of the tale.
“There are three kinds of lies; lies, damned lies, and statistics.”
— Mark Twain
Take, for instance, the ubiquitous ‘average.’ This statistical measure is used to compare and benchmark a set of data. In North America, it’s used to describe income earned, to benchmark the price of items and even to describe cost trends for items such as gasoline or property. But using the statistical average can lead to major distortions in the way we view data, especially when it comes to the real estate market.
Believe it or not, the term ‘average’ actually covers a set of statistics known as measures of central tendency: measures of the middle, or centre point, of a data set. (More on this in a bit.)
In common use, however, the term ‘average’ typically refers to only one of these central tendency measures: the arithmetic mean. This measure is calculated by taking a set of data, adding all of it together and then dividing that total by the number of elements in the data set. This includes the measurement for average housing prices.
Why use averages to tell complex stories?
Averages are designed to be a single measurement taken across a diverse group of samples. The point of using averages is to get a central value of a dataset.
Averages are used because they are easy to grasp. These calculations find the middle of a data set. The goal of the average is to find a representative value: the one value that could replace and represent the data.
When working with large amounts of data, this middle or ‘balanced’ representation can be very helpful. But the value of that representative calculation really depends on how the data set items interact with one another. You may be surprised to learn that there isn’t one “average” but rather five central tendency measures (aka: averages) that statisticians use:
- Arithmetic mean (aka: mean): This is the measure most people refer to when they use the term “average.” Mean is calculated by taking a set of data, adding all of it together and then dividing that sum by the number of elements in the data set.
- For example: (3+4+5+3+6+7+2+2)/8 = 32/8 = 4
- As the representative value, it could replace every value in the data set and still give the same result: (4+4+4+4+4+4+4+4)/8 = 32/8 = 4
- A great visual and description of mean can be found here.
- Median: This is the measure of the middle. It takes the entire, sorted data set and finds the exact middle value. If there are two middle values, because the data set has an even number of elements, then you take the arithmetic mean, or average, of these two values to find the median.
- A great visual and description of median can be found here.
- Mode: This is the element that occurs most often in a list. Though rarely used, mode is great for finding the most popular number or element in a data set.
- A great visual and description of mode can be found here.
- Geometric mean: This measure also uses the whole set of data, but rather than adding the numbers together, it multiplies them and then takes the nth root of the product, where n is the number of values: the square root for two numbers, the cube root for three numbers, and so on. It’s also known as the average factor.
- Harmonic mean: A type of average that is calculated by dividing the number of values in the data series by the sum of reciprocals (1/x_i) of each value in the data series. It’s best for showing a rate or ratio, which is why it’s also known as the average rate.
Confused? Understandably. Over time the term “average” has become synonymous with only one central tendency value: the arithmetic mean. The reason is that the arithmetic mean is simple to grasp and offers a quick snapshot (the statistical centre) of a large number of data points. (Calling the arithmetic mean the “average” is mathematically incorrect, but let’s move on.)
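To make the five measures concrete, here is a minimal Python sketch that computes each one for the sample numbers used in the arithmetic mean example above. It assumes Python 3.8 or later and uses only the standard library `statistics` module; the variable names are illustrative.

```python
import statistics

# The sample data from the arithmetic mean example above.
data = [3, 4, 5, 3, 6, 7, 2, 2]

# Arithmetic mean: add everything up, divide by the number of elements.
arithmetic_mean = statistics.mean(data)

# Median: the middle of the sorted data; with an even count,
# the mean of the two middle values.
median = statistics.median(data)

# Mode: the most frequent value. multimode() returns every value
# tied for most frequent, which matters here because two values tie.
modes = statistics.multimode(data)

# Geometric mean: the nth root of the product of the n values.
geometric_mean = statistics.geometric_mean(data)

# Harmonic mean: the number of values divided by the sum of reciprocals.
harmonic_mean = statistics.harmonic_mean(data)

print(f"Arithmetic mean: {arithmetic_mean}")
print(f"Median:          {median}")
print(f"Mode(s):         {sorted(modes)}")
print(f"Geometric mean:  {geometric_mean:.3f}")
print(f"Harmonic mean:   {harmonic_mean:.3f}")
```

Even on this small data set, the five measures give different answers, which is why it matters which “average” is being quoted.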
Why are averages often wrong or misleading?
Turns out, using the arithmetic mean can be misleading, especially when used for larger data sets and particularly when a data set includes outliers. But that doesn’t mean there is a simple, straightforward answer as to how an average distorts the context of a story. The mathematical formula isn’t incorrect; rather, each average measure has its drawbacks. To better appreciate them, let’s examine three common reasons why using averages can be misleading.
1. Data set outliers can distort the average
It’s not uncommon for any type of data set to have a few outliers.
This is most easily seen on a graph, where the majority of the data points are clustered around one area, but a few end up floating either much higher or significantly lower than the rest. These data points are known as data set outliers.
The issue with outliers is that they can skew or “pull” the average housing price in their direction.
Take, for example, the sale prices of homes that sold last month in a given neighbourhood. Based on sales data, we know that five homes sold and that the sale prices were as follows:
Based on the sale prices of these five properties, the average housing price for this neighbourhood is just a smidge over $240,000.
Already, you can see the problem. The majority (80%) of the homes sold last month sold for well under the mathematically accurate average sale price. Yet, because the last home sold at a much higher price point than the others, the average sale price was significantly higher.
Why does this matter? If you were selling a home in this area and learned that the average sale price was $240,000, you would probably end up very disappointed if your home got an offer that was less than this — and, yet, if your home was consistent with the other homes that sold in this area, you probably would end up with an offer that was less than the mathematical average.
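The pull of a single outlier can be sketched in a few lines of Python. The five prices below are hypothetical, chosen only so that the mean lands just over $240,000 with four of the five sales below it; they are not the actual figures from the neighbourhood example.

```python
import statistics

# Hypothetical sale prices (assumed for illustration): four similar
# homes and one much more expensive outlier.
sale_prices = [150_000, 160_000, 170_000, 180_000, 545_000]

mean_price = statistics.mean(sale_prices)
median_price = statistics.median(sale_prices)

# Share of homes that sold for less than the mean.
below_mean = sum(price < mean_price for price in sale_prices)
share_below_mean = below_mean / len(sale_prices)

# The same mean with the outlier removed, for comparison.
mean_without_outlier = statistics.mean(sale_prices[:-1])

print(f"Mean sale price:           ${mean_price:,.0f}")
print(f"Median sale price:         ${median_price:,.0f}")
print(f"Share sold below the mean: {share_below_mean:.0%}")
print(f"Mean without the outlier:  ${mean_without_outlier:,.0f}")
```

With these assumed numbers, one high sale lifts the mean well above what most sellers actually received, while the median stays with the cluster of typical sales.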
2. We assume the average is typical
Another way that averages tend to be misleading is when they are thought of as ‘typical.’
Typical, by definition, is the expectation that something or someone has the qualities or characteristics expected of that person, place or thing. It’s a short-cut method that allows us to benchmark and make quick assumptions. But there’s a danger to making this assumption.
For instance, if you were told that 66% of Canadian citizens owned their own home in 2018, you’d assume that roughly two-thirds of the population owns a home and the other third doesn’t.
But when you start digging a bit deeper, you find that this statistic varies wildly depending on a number of other factors such as age, race, profession, yearly income, etc. Even worse, every group within this larger data set can be categorized or divided into a multitude of different ways. The result is varying rates of home-ownership across the country, depending entirely on which categories are used to describe the average.
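A short sketch shows how one overall rate can hide very different rates inside it. The age groups and household counts below are invented purely for illustration and are not real Canadian data; they are chosen only so that the overall rate works out to 66%.

```python
# Invented counts for illustration only (not real data):
# group name -> (number of households, number that own their home)
groups = {
    "under 35": (2_000, 800),
    "35 to 54": (3_000, 2_100),
    "55 and over": (5_000, 3_700),
}

total_households = sum(households for households, _ in groups.values())
total_owners = sum(owners for _, owners in groups.values())

# The headline figure: one rate for the whole population.
overall_rate = total_owners / total_households

# The rate within each group, which the headline figure hides.
group_rates = {
    name: owners / households
    for name, (households, owners) in groups.items()
}

print(f"Overall home-ownership rate: {overall_rate:.0%}")
for name, rate in group_rates.items():
    print(f"  {name}: {rate:.0%}")
```

Splitting the same households by income or profession instead of age would produce yet another set of rates, all sitting behind the same overall number.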
3. Single group data mistaken for individual representation
Finally, averages can be misleading when they are taken from a larger population grouping and applied directly to an individual scenario.
As Eric Luellen, an award-winning innovator and data-science strategist, writes, “Even assuming data is normally distributed (a “bell curve”), the probability that any one data point will be the same as the average is 50% — the same as a random guess.”
That means whenever you apply the average — average income used to buy the average-priced home — your chances of being accurate in your assessment are 50/50, at best. When you break it down to an individual level, you’ll likely find that this average does not accurately describe the more common costs and earnings in our lives, such as the monthly mortgage you pay, your monthly income or even your household expenses.
When to use each measurement of average
Given that there are five measures of central tendency, why do we use ‘average’ (aka: central tendency)? To understand, we need to examine what each average statistic does and what it is best used for. To help, here is a cheat sheet chart:
In real estate, you’ll typically see three measurements:
- Median price: Take all of the sales within a given time period, say one month, put these sales in a list ranked from the lowest price to the highest one. The middle value of this list is the median price — the exact midpoint between all real estate sales in that time period.
- Average price: Take all the sales (in a certain period of time), add them up and then divide the total sum by the number of sales. This is the average, or representative, sale price during this time frame. Keep in mind, this number can become skewed if unusually low-priced or high-priced properties have sold within this time period.
- Benchmark price: The predicted, or typical, sale price for a specific area and property type based on historical sales data in the same area and for the same property type. The benchmark price does not take the high or low-priced outliers into account.
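The steps described for the median and average price can be written out directly. This is a minimal sketch: the function names and the month of sales are hypothetical, and the benchmark price is left out because it depends on the model each real estate board uses.

```python
def median_price(sales):
    """Rank the sales from lowest to highest and return the middle value."""
    if not sales:
        raise ValueError("need at least one sale")
    ranked = sorted(sales)
    middle = len(ranked) // 2
    if len(ranked) % 2 == 1:
        return ranked[middle]
    # Even number of sales: take the mean of the two middle values.
    return (ranked[middle - 1] + ranked[middle]) / 2


def average_price(sales):
    """Add up all the sales and divide by the number of sales."""
    if not sales:
        raise ValueError("need at least one sale")
    return sum(sales) / len(sales)


# Hypothetical sales for one month in one neighbourhood.
monthly_sales = [310_000, 295_000, 1_250_000, 330_000, 305_000, 320_000]

print(f"Median price:  ${median_price(monthly_sales):,.0f}")
print(f"Average price: ${average_price(monthly_sales):,.0f}")
```

A large gap between the two numbers, as in this made-up month, is a hint that a few unusually priced properties are skewing the average.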
What other options do we have to tell a story?
Given the difficulties with the mean and the limits of the other average measurements, other methods of calculating a representative price have been developed over time.
For instance, Statistics Canada produces the New Housing Price Index every month. The Canadian Real Estate Association also calculates a monthly resale house price index, based on reported sale prices submitted by real estate agents, and averaged by region. Finally, National Bank, a commercial bank, partnered with technology firm Teranet in December 2008 to create a new monthly house price index based on resale prices of individual single-family houses in selected metropolitan areas. The Teranet housing price index uses a methodology similar to the U.S.-based Case-Shiller index and is based on actual sale prices taken from government land registry databases.
In all cases, these price indexes provide a typical or representative number that helps illustrate what to expect in that market. The big drawback? HPI and benchmark prices are usually only calculated for large real estate markets, such as cities, provinces and countries. While this is changing — more and more real estate boards are adopting and publishing benchmark prices for neighbourhoods — it still means relying on data that lags the current market conditions, since HPI is typically only published at month-end.
Why use mean (aka: average) when discussing real estate?
Both the average sale price and median sale price can be valuable when trying to assess the sales activity of a given real estate market. Both statistics provide a snapshot of what is typical in a given neighbourhood, for a given property type, in a specific timeframe. Yet, while these statistical measures have their benefits, they also have their drawbacks. For that reason, it’s best if buyers and sellers use both measures. You heard me: Use both the mean (aka: average) and median sale price when buying or selling a home. By examining both statistical measures, and comparing them with all the other information at your disposal, you will be better placed to make a sound decision when buying or selling an individual property.