How to build data accessibility for everyone?

🖼 Be data-informed

In a data-informed culture, we let data act as a check on our intuition. For example, if we know whether a revenue increase came from discounts or from profit, we will take different actions.

Only a few people have access to data

📍 Self-serve analysis is a must

We need to get data so we can check our intuition. There are 2 approaches:

  1. Get data by asking the data team
  2. Get data without asking the data team

😵‍💫 Where to get data? — Data accessibility

“Get data without asking the data team.” Then where can we get the data?

Data flows
A glance at bigquery-public-data
Learn about our data in Metabase
  • Data users learn about the tables in the BI tool. The data is transformed, so it is cleaner and easier to understand.
  • The UI is friendly for non-technical people.
  • They don’t need to learn SQL. If they do know SQL, they can use the advanced features.
Everyone can access transformed data in the BI tool

🐾 The stages and the challenges

Let’s go back to the beginning. In a data-informed culture, we want to empower data users to do self-serve analysis. But it doesn’t start with a self-serve tool. Most of the time, data users are already serving themselves by requesting data and analyzing it in spreadsheets.

1️⃣ The starting points

There are spreadsheets everywhere. There are spreadsheets for importing data, spreadsheets for calculating or adding some fact data manually, and spreadsheets for presenting the final data and charts. The entire ELT is in different spreadsheets. 😮

  • Data transformation is unclear and misaligned. Too many spreadsheets are built by different people, and they do not transform data in the same way.
  • It’s hard to check. You need to click into the spreadsheet cells to see the formulas, and then ask the author why each formula was written that way. There is no documentation.
  • We don’t know about all the spreadsheets. Many sit in people’s private cloud storage. If the owners don’t share them with us, we won’t know they exist.

🥊 How to start

ETL and the tools I use
  • Visualizing the relationships between source data tables. I cannot do that in dbt.
  • How to write documentation that is simple but clear enough. For example, there are 5 types of orders. Should I describe all 5 types in the order_type description?
  • The business definition is not clear enough. Data users who ask the data team for revenue reports don’t know that details such as timezone and exchange rate matter. Neither does the data team. Both sides need to dive into the details.
  • The business logic may be different from the technical logic. 1) The terms may be different. When sales asks for a revenue report, they want the sum of all orders, which includes tax. When business owners ask for a revenue report, they are looking for revenue that can be realized, which excludes tax. 2) The business logic may be missing from the technical logic. Say there are coupons sponsored by different partners, but there is no “sponsor” column in the data. When a BD wants to see how many coupons have been redeemed for each partner, how can we provide that?
  • A sense of the numbers. Take the sign-in rate of each sign-in option, for example. After deciding what to log and how to calculate it, we get 20% Sign in with Apple on Android phones and 15% Sign in with Apple on iOS phones. That doesn’t make sense to the data users, but the data team may not notice it.
  • No one can validate accuracy without cross-checking. Of course we do check. For example, when we generate revenue reports from raw order data, we only use them after code review and logic review. But when we compare those revenue reports with the financial reports from the accounting system, we may find that the numbers differ because of timezone, exchange rate, and tax. If we don’t compare, we will never know. And that raises another question: which one is correct?
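
To make the revenue example concrete, here is a minimal Python sketch of how the same raw orders can produce two different "monthly revenue" numbers. Everything in it is an assumption for illustration: the order rows, the UTC+8 reporting timezone, and the `monthly_revenue` helper are hypothetical, and exchange rates are left out to keep it short.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Hypothetical orders: (created_at in UTC, amount including tax, tax).
ORDERS = [
    (datetime(2023, 1, 31, 15, 0, tzinfo=timezone.utc), Decimal("105.00"), Decimal("5.00")),
    (datetime(2023, 1, 31, 17, 30, tzinfo=timezone.utc), Decimal("210.00"), Decimal("10.00")),
    (datetime(2023, 2, 1, 3, 0, tzinfo=timezone.utc), Decimal("52.50"), Decimal("2.50")),
]

# Assumed reporting timezone for the business side (UTC+8).
LOCAL_TZ = timezone(timedelta(hours=8))


def monthly_revenue(orders, tz, include_tax):
    """Sum revenue per (year, month), bucketing each order by its date in tz."""
    totals = {}
    for created_at, gross, tax in orders:
        local = created_at.astimezone(tz)
        key = (local.year, local.month)
        amount = gross if include_tax else gross - tax
        totals[key] = totals.get(key, Decimal("0")) + amount
    return totals


# "Revenue" as sales might read it: all orders, with tax, bucketed in UTC.
sales_view = monthly_revenue(ORDERS, timezone.utc, include_tax=True)
# "Revenue" as a business owner might read it: without tax, in local time.
owner_view = monthly_revenue(ORDERS, LOCAL_TZ, include_tax=False)

for month in sorted(sales_view):
    print(month, sales_view[month], owner_view[month])
```

With these made-up rows, the second order falls in January under UTC but in February under UTC+8, so the two views disagree on both months even though neither calculation has a bug. Which one is "correct" is a business decision that has to be written down.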

2️⃣ Shifting spreadsheets to the BI tool

We’ll hear this a lot: “Why is the number on Metabase different from the one in my spreadsheet?”
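
One way to start answering that question is to line up both sets of numbers and list only the rows that disagree. The sketch below is a hypothetical illustration, not a Metabase feature: the `find_mismatches` helper, the daily figures, and the 0.01 tolerance are all assumptions.

```python
from decimal import Decimal


def find_mismatches(spreadsheet, bi_tool, tolerance=Decimal("0.01")):
    """Return rows where two sources disagree, sorted by key.

    Each row is (key, spreadsheet_value, bi_value). A value is None when
    the key is missing from that source.
    """
    mismatches = []
    for key in sorted(set(spreadsheet) | set(bi_tool)):
        left = spreadsheet.get(key)
        right = bi_tool.get(key)
        if left is None or right is None or abs(left - right) > tolerance:
            mismatches.append((key, left, right))
    return mismatches


# Hypothetical daily revenue from a data user's spreadsheet and from the BI tool.
spreadsheet_revenue = {
    "2023-02-01": Decimal("1200.00"),
    "2023-02-02": Decimal("980.00"),
    "2023-02-03": Decimal("1010.50"),
}
bi_revenue = {
    "2023-02-01": Decimal("1200.00"),
    "2023-02-02": Decimal("933.33"),
    "2023-02-04": Decimal("400.00"),
}

for key, left, right in find_mismatches(spreadsheet_revenue, bi_revenue):
    print(key, left, right)
```

A short list of mismatched days gives the data user and the data team something specific to walk through together, which is usually faster than debating the totals.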

⚔️ PK time — Earn the trust

Data users own the business logic. They have their own ways of analyzing numbers. The data team converts the business logic into code. Many details come up during that conversion.

  • What level of transformed data is close enough to the data users? We hope data users can pick the data and get answers easily. We also don’t want to maintain more than 100 production datasets that are out of our control.
Pick data on Metabase
  • Sometimes we transform data several times to reach the right level. At what level should we materialize it? Which data models should be staging and which should be marts? How do we manage them?
  • Is the business logic simple and focused enough? If 10 business owners each view 10 different metrics, can these 10*10 requests be covered by fewer than 3 production datasets?
  • How do we extend the DAG to dashboards, so that we can see the last mile to the data users?

3️⃣ Onboard more data users

Once 1 team relies on the transformed data + BI tool, we want to onboard all teams so we have an SSOT, a single source of truth, across the entire company.

  • No “Your revenue is not my revenue.” When we say “revenue”, we are talking about the same thing.
  • No spreadsheet with 25 tabs, nor 25 spreadsheets for 1 report.
  • Short feedback loop. Explore the data and get the answers right away.
  • No repeated weekly or monthly data work. Routine reports are sent out automatically.

🌱 Seed the data champions

Identify 1 person in each team who is interested in data. This person is the data champion: curious about data and excited to do self-serve analysis. They will be the first to move their spreadsheets to the transformed data + BI tool.

  • How to manage many dashboards created by many data users?
  • How to encourage finding insights? We’d like to see data users exploring the data but also want to ensure they don’t misuse it.
  • How can we build on top of what we know?

🙌 Raise the data literacy

When people access data to do self-serve analysis and build reports, their thoughts flow fast. The data team creates true value, empowering people to find insights. Both sides are happy.

virtuous cycle

Karen Hsieh

A product manager who builds company-wide data literacy and empowers the product team to create value for people and grow the company to profit. Twitter: @ijac_wei