Finding Your Data Fit: Expert Advice — Tip #3

Karen Hsieh
4 min read · Aug 5, 2023

⛹️‍♀️ Start With Your Question

When starting your learning journey, the first question that often arises is how to select the appropriate data stack.

Should I learn SQL or Python?

I didn’t have this question because I didn’t know either one at the beginning.

I started with a spreadsheet and then turned to SQL when the spreadsheet would no longer open. (See #2 Charting Your Learning Path by Your Problem.) It was natural for me to learn SQL.

Understanding Your Needs and Ability

My needs and abilities back then were:

  • Check the ad performance.
  • No coding experience. No time to take a 5-week course.

What I could do was:

  • Keep using the spreadsheet. I still needed to check the ad report daily.
  • Call out the importance of ad performance and my difficulties.
  • Express that SQL seemed easy to learn for me.

👼 I was lucky that my CTO supported me. He found out I was learning SQL, so he taught me how to import a spreadsheet into BigQuery.

If the problem was not important enough, it could have been resolved by hiring an intern, increasing my workload, or decreasing the report update frequency from daily to monthly. If the problem was important but I could not learn SQL, then the task could have been assigned to another person. My data journey could have been dismissed! 🧟‍♀️

Extract, Transform, Load

I was happy that I could get rid of the spreadsheet and do the analysis work faster. I didn’t realize how fortunate I was. When I faced the same issue in another company, there was no such support.

Let me introduce ETL first. It is the pipeline behind data analysis: you get some data, analyze it, and show your results to others.

  • Extract: extract data from where the data is produced
  • Transform: clean the data and add business logic or other meaningful data
  • Load: load the data into the data warehouse or somewhere you can analyze
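The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the setup I actually used: the ad report, its column names, and the `ad_report` table are made up for the example, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical ad report, standing in for a spreadsheet export.
RAW_CSV = """date,campaign,spend,clicks
2023-08-01,brand,100.0,50
2023-08-01,search,200.0,80
2023-08-02,brand,,40
"""


def extract(text):
    """Extract: read rows from where the data is produced (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean the rows and add business logic (a cost-per-click column)."""
    cleaned = []
    for row in rows:
        if not row["spend"] or not row["clicks"]:
            continue  # drop rows with missing values
        spend = float(row["spend"])
        clicks = int(row["clicks"])
        if clicks == 0:
            continue  # avoid dividing by zero
        cleaned.append((row["date"], row["campaign"], spend, clicks, spend / clicks))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows somewhere they can be analyzed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ad_report "
        "(date TEXT, campaign TEXT, spend REAL, clicks INTEGER, cpc REAL)"
    )
    conn.executemany("INSERT INTO ad_report VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
    load(transform(extract(RAW_CSV)), warehouse)
    for line in warehouse.execute("SELECT campaign, cpc FROM ad_report"):
        print(line)
```

Keeping each step in its own function makes it easy to swap one part later, for example loading into BigQuery instead of SQLite, without touching the other two.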

In my v1,

v1

I extracted data from the spreadsheet to BigQuery, did the transformation in BigQuery, and loaded the data back into my spreadsheet. It was a good match since I only knew spreadsheets and a little bit of SQL.

When I went to another company and asked for BigQuery, there was no such thing! 😦 I had no idea what to do. That’s my v2.1,

v2.1

The engineers taught me to use Sequel Pro connected to the production server. I was bad at using it, and I wanted to see some charts, so they connected Metabase for me.

Some of you may recognize the following 🤣:

Why is the production website so slow?

Karen, stop querying data!

Note: that is the difference between OLTP and OLAP. A production server runs an OLTP database, which is optimized for many small transactions, not for the large scans that data analysis needs.

Balancing Technical and Business

For v2.2, I had to consider:

  • Do I need the data updated instantly? If not, I could extract the data in batches.
  • Are there many data customers waiting for my reports? If not, the chart functions in a spreadsheet are good enough.

Then came v2.2, which met my needs without hurting production. Everyone was happy.

v2.2
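One possible shape for a batch extract is sketched below. It is an assumption-laden illustration rather than what my team built: the `orders` table, its columns, and the function names are hypothetical, and two in-memory SQLite databases stand in for the production server and the analysis copy. The point is that production is hit with one small query per batch, and all the heavy analysis runs on the copy.

```python
import sqlite3


def make_demo_production():
    """Build a tiny stand-in for a production database (hypothetical data)."""
    prod = sqlite3.connect(":memory:")
    prod.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)"
    )
    prod.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2023-08-01", 10.0), (2, "2023-08-02", 20.0), (3, "2023-08-03", 30.0)],
    )
    prod.commit()
    return prod


def batch_extract(prod, analytics, since):
    """Copy orders created on or after `since` into the analysis database."""
    # One narrow query against production instead of many ad hoc ones.
    rows = prod.execute(
        "SELECT id, created_at, amount FROM orders WHERE created_at >= ?",
        (since,),
    ).fetchall()
    analytics.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the copy consistent if a batch is re-run.
    analytics.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    analytics.commit()
    return len(rows)


if __name__ == "__main__":
    prod = make_demo_production()
    analytics = sqlite3.connect(":memory:")
    copied = batch_extract(prod, analytics, "2023-08-02")
    print(f"copied {copied} rows; analysis now runs on the copy")
```

Running something like this on a schedule, say once a night, is usually enough when nobody needs the numbers to update instantly.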

Select a data stack based on your specific problem and your surrounding environment. Evaluate whether handling the technical part yourself is necessary. Is it a worthy investment?

🤜 Here is the tip:

If you want to do some data analysis but cannot get access to data, focus on an important business problem and get some related data, e.g. CSV files, instead. Analyze these CSV files and share your findings on the problem, so others can feel the potential of your work and get inspired. Show the value first, then you can ask for more.
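To make the tip concrete, here is a small sketch of turning a CSV export into a finding you can share. The campaigns, columns, and numbers are invented for illustration; replace them with whatever file you manage to get.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export; in practice this would be a file someone sent you.
SAMPLE_CSV = """campaign,spend,signups
brand,100,10
search,300,15
brand,50,5
social,200,4
"""


def cost_per_signup(csv_text):
    """Total spend divided by total signups, per campaign."""
    spend = defaultdict(float)
    signups = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        spend[row["campaign"]] += float(row["spend"])
        signups[row["campaign"]] += int(row["signups"])
    # Skip campaigns without signups to avoid dividing by zero.
    return {c: spend[c] / signups[c] for c in spend if signups[c] > 0}


def ranked(csv_text):
    """Campaigns sorted from cheapest to most expensive signup."""
    return sorted(cost_per_signup(csv_text).items(), key=lambda item: item[1])


if __name__ == "__main__":
    for campaign, cost in ranked(SAMPLE_CSV):
        print(f"{campaign}: {cost:.2f} per signup")
```

A ranked list like this is often all it takes to start a conversation about where the budget should go.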

Finding Data Stack Aligned with ETL

There are many technologies for building a data pipeline; I introduced only ETL here. Select your data stack based on your problem, your ability, and your environment, e.g., the technology your company uses and the business impact.

Don’t spend too much time choosing a data stack. Just try and see what can resolve your problem. Learn by doing.

🤩 I’m happy to hear how you build products and what data you check. Feel free to reach out to me on LinkedIn (Karen Hsieh) or Twitter (@ijac_wei).

🙋🙋‍♀️ Welcome to Ask Me Anything.
