Feb 3, 2022 3 min read phân tích dữ liệu

Why my numbers don’t match!? A step-by-step guide to uncover

A step-by-step guide to address stakeholders’ concerns when they see a discrepancy in dashboards — why do they get different numbers even though they think the dashboards measure the same thing?

As a Data Scientist supporting decision making, I have been working on many exciting tasks: meeting with stakeholders to help them refine their questions, defining success metrics for a new product launch, data pipelining, creating new dashboards to track metrics, or working on statistical analysis to check on hypotheses.

However, there is one task that I generally find less exciting to do because it can be very tedious, though very important to gain and maintain trust from the business to our data. It is to answer stakeholders why numbers from different dashboards don’t match.

I have done this task a fair amount across companies I’ve worked for. And I figure out a “formula” that works very well for me to tackle this task. Regardless of what tool you are using for your dashboard, this formula can be applied.

Step 1: Clarify

When stakeholders come to you raising their concerns, don’t jump into solutioning right away!

Ask these questions. They will help you understand the scope of the problem, and prioritize your work.

Which dashboard? (ask them for the link)
If needed, understand the context of their ask. How big is the difference? What decision the dashboard can help them to make?
Given the difference (hopefully small), can they go ahead and make a decision using either of them?
If they can’t, and you need to investigate, when do they need to get the result?

Step 2: Investigation

Now comes the tedious part. Check on these 3 things to find out why results are different.

1. Data source

Do your dashboards use similar or different data sources?

Do they have the same period of time?
If these sources are called differently, do they refer to the same upstream source?
Do they have the same grain? (eg. one customer per row, or one sale per row and each customer can have multiple sales)
Does the freshness of data meet your expectation? (you can check the last time you got the data)

2. Conditions/Filters

If the sources are similar, are there any conditions or filters that are different among the dashboards?

Do they have different date range selections?
Any particular population that is filtered out from the dataset?

3. Calculation

If you are sure that the data source and filters are similar, it may be that the calculations are different.

Do they use the same fields for calculation? (eg. one uses period start the other uses period end etc.)
Do they use the same formula for calculation? (eg. one uses average the other uses median, one has normalization the other doesn’t, etc.)

Step 3: Recommendation

After your investigation, you now know why the dashboards produce different numbers.

There are 2 possible routes here:

1/ If one has logic that is outdated, you can delete it and recommend stakeholders to check out the other dashboard (you can also update the logic to match the most up-to-date one if you absolutely need to keep them all. I’m just all about cleaning up and removing dashboards wherever it makes sense to reduce maintenance and stakeholder’s confusion.)

2/ If it makes sense to keep separate dashboards with different logic because they have different use cases (and I hope you already think of why you cannot merge them into a single dashboard that serves most of the needs and empower more self-serve),

Based on the context stakeholders share with you from the beginning, suggest which dashboard they should checkout
Update your dashboard title to make it more descriptive on the questions it can help answer

Proactive solution

Isn’t it less exciting, and sometimes annoying to investigate why numbers don’t match? Yes, at least for me.

So I try as much as possible to proactively avoid it from the beginning. Here are some considerations I recommend you to think about. I won’t go deep into how you should implement them because the solution can differ from company to company.

1/ Have a single source of truth for metrics definition and use definitions consistently across your dashboards.

2/ Whenever you create a new dashboard, think of whether you need it, how can you empower more self-service? (to me, more self-service = dashboard users have the ability to answer as many questions as possible, without having to reach out to the Data team for support)

Here’s a recent inspiring product design principle I learned: “Make 80% visible and 20% possible” — think about designing your dashboard in a way that can answer as many questions as possible, and for the rest that you can't answer, allow users to download excel or CSV files from your dashboard to do their own analysis.

3/ And if you absolutely need it,

Make sure to check whether the numbers there are as expected (eg. check other frequently used dashboards that may have similar metrics showing up).
Have the dashboard (and code, if applicable) reviewed by at least a person from your team to make sure you don’t leave any unintended bugs.

Photo credits: Xavi Cabrera on Unsplash

Step 1: Clarify

Step 2: Investigation

1. Data source

2. Conditions/Filters

3. Calculation

Step 3: Recommendation

Proactive solution

You might also like...

Data Validation - Phần quan trọng nhất khi viết data model

4 bước làm quen công việc mới

Tối ưu hoá danh sách 360 Reviewer bằng dữ liệu (phần 2)

Tối ưu hoá danh sách 360 Reviewer bằng dữ liệu (phần 1)

Data & việc giảm tiền điện

Data & việc giảm tiền điện