Kickstarting with dbt: Pro Strategies — Tip #4

5 min read · Aug 6, 2023

Why Did I Choose dbt?

In #3 Finding Your Data Fit — 30 Tips to be Data Practitioners, I introduced my initial data stack selection, the v1. I only used spreadsheets and BigQuery because that was all I knew. The initial data stack got the job done, enabling me to venture into areas like product analysis.

The product I worked on had multiple revenue sources, including ads and subscriptions, which we offered to users for added functionality or advanced features. I wanted to compare the behaviors of free users and paid users. To do this, I saved three SQL snippets in my note app, one for each segment: all users, free users, and paid users. I pasted them in whenever I needed to query user behavior. Although it felt primitive, it was the only method I could think of. The challenge was that I had to switch between my note app and BigQuery while conducting data analysis.

I struggled to find a solution online, mostly because I couldn’t narrow my problem down to a few search keywords 😩. Thankfully, my CTO introduced me to dbt. I realized what I had been trying to do was modularize my code, and dbt’s model seemed like the perfect fit for me.
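Here is a minimal sketch of what "modularizing" means in practice. It is not how dbt works internally: Python's built-in sqlite3 stands in for BigQuery, views stand in for dbt models, and the table and column names (users, events, plan) are hypothetical. Each segment is defined once and given a name, so later queries reuse it instead of pasting SQL from a note app.

```python
import sqlite3

# In-memory database as a stand-in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, plan TEXT);
    CREATE TABLE events (user_id INTEGER, event_name TEXT);

    INSERT INTO users VALUES (1, 'free'), (2, 'paid'), (3, 'free');
    INSERT INTO events VALUES
        (1, 'search'), (2, 'search'), (2, 'export'), (3, 'search');

    -- Each segment is written once and referenced by name afterwards.
    CREATE VIEW all_users AS SELECT user_id, plan FROM users;
    CREATE VIEW free_users AS SELECT user_id FROM all_users WHERE plan = 'free';
    CREATE VIEW paid_users AS SELECT user_id FROM all_users WHERE plan = 'paid';
""")

SEGMENTS = {"all_users", "free_users", "paid_users"}


def events_per_user(segment: str) -> float:
    """Average number of events per user in the named segment view."""
    if segment not in SEGMENTS:
        raise ValueError(f"unknown segment: {segment}")
    row = conn.execute(
        f"""
        SELECT CAST(COUNT(e.user_id) AS REAL) / COUNT(DISTINCT s.user_id)
        FROM {segment} AS s
        LEFT JOIN events AS e ON e.user_id = s.user_id
        """
    ).fetchone()
    return row[0]


free_avg = events_per_user("free_users")
paid_avg = events_per_user("paid_users")
print(f"free: {free_avg}, paid: {paid_avg}")
```

The same behavior query runs against any segment by swapping one name. If the definition of "paid user" changes, only the view changes.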

What is dbt?

Visit https://www.getdbt.com/product/what-is-dbt/ for more information. They provide blogs, courses, and webinars to explain and promote the tool. There are also plenty of unofficial resources available — just search “dbt.”

To me, dbt was exactly what I needed. It solved my problems in ways I couldn’t have imagined.

dbt introduced me to the concept of data modeling. The random SQL code in my note app was transformed into structured models, which were separated into staging and marts layers. This provided mental clarity and allowed me to view the model hierarchy at a glance. I was also able to view the lineage between models as a DAG (directed acyclic graph).
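To make the lineage idea concrete, here is a small sketch using Python's standard graphlib module (Python 3.9+). The model names (raw_users, stg_users, mart_user_behavior, and so on) are hypothetical, and this only illustrates the concept of a DAG, not dbt's own implementation. Each model lists the models it depends on, and a topological sort gives an order in which everything can be built safely.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each key depends on the models in its set.
lineage = {
    "stg_users": {"raw_users"},
    "stg_events": {"raw_events"},
    "mart_user_behavior": {"stg_users", "stg_events"},
}

# static_order() yields every model after all of its dependencies.
build_order = list(TopologicalSorter(lineage).static_order())
print(build_order)
```

Because the graph is acyclic, such an order always exists. If two models depended on each other, `static_order()` would raise `graphlib.CycleError` instead.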

In the past, I had to verify the business logic and cross-check the results of my data. Still, there were chances of errors as nobody reviewed my SQL code. Only those familiar with the numbers could spot discrepancies. For example, forgetting to exclude test users could have an unnoticed impact on the month’s revenue.

dbt allowed me to add tests, helping me avoid some basic errors. Since dbt projects are usually version-controlled with git, my CTO set up a GitHub project for me to perform data analysis. An engineer was assigned to review my SQL code 🎉, which gave me more confidence in my reports.
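A data test can be thought of as a query that selects the rows that should not exist. If it returns nothing, the test passes. The sketch below illustrates that idea with Python's built-in sqlite3 as a stand-in for the warehouse. The payments table, the is_test_user flag, and the check names are all hypothetical, chosen to mirror the "forgot to exclude test users" mistake described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (
        payment_id INTEGER, user_id INTEGER, amount REAL, is_test_user INTEGER
    );
    INSERT INTO payments VALUES
        (1, 10, 9.99, 0), (2, 11, 9.99, 0), (3, 99, 500.0, 1);

    -- The "model": payments that count toward revenue, test users removed.
    CREATE VIEW revenue_payments AS
    SELECT payment_id, user_id, amount, is_test_user
    FROM payments
    WHERE is_test_user = 0;
""")


def failing_rows(sql: str) -> list:
    """Run a check query and return the rows that violate the rule."""
    return conn.execute(sql).fetchall()


# Each check selects offending rows; an empty result means the check passes.
checks = {
    "no_test_users": "SELECT * FROM revenue_payments WHERE is_test_user = 1",
    "payment_id_not_null": "SELECT * FROM revenue_payments WHERE payment_id IS NULL",
    "payment_id_unique": """
        SELECT payment_id FROM revenue_payments
        GROUP BY payment_id HAVING COUNT(*) > 1
    """,
}

results = {name: len(failing_rows(sql)) == 0 for name, sql in checks.items()}
print(results)
```

Running the first check against the raw payments table instead of the model would return the test user's row, which is exactly the kind of discrepancy that used to go unnoticed.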

As someone who wasn’t an engineer, I felt proud and honored 🤭 to finally penetrate the “black box” that is GitHub. To engineers, code is a form of documentation. Even though I checked the product spec, asking engineers to review the code often provided more accurate information.

ETL → ELT

In my previous article, #3 Finding Your Data Fit — 30 Tips to be Data Practitioners, I discussed how my initial data stack was ETL: Extract, Transform, Load. After using dbt, my data stack evolved to ELT: Extract, Load, Transform. A BI tool was also added to the mix.

The engineers extracted the data I needed and loaded them into BigQuery. With technological advancements, cloud data warehouses can efficiently manage petabytes of data. The amount of data I had wasn’t enormous, making it affordable, and even cost-effective, to dump all the data into the warehouse without the need for engineers to separate or preprocess it.

That’s the essence of ELT: load everything into a data warehouse and then transform it there. The transformed data is stored in the same data warehouse but in different tables. The engineers connected this data to a BI tool, in our case, Metabase.

How to Get Started with dbt?

The setup for BigQuery, dbt, and Metabase was all done by the engineers. These tools transformed my life as a data practitioner and introduced many new terms and concepts.

BigQuery and Metabase were easier to grasp since I already knew SQL. However, there were slight differences in the SQL used in these two platforms, which I learned through trial and error.

dbt was more challenging. It wasn’t just about SQL. The engineers already used GitHub and VSCode, so they set up my local environment. At first, every time I finished my SQL code, I’d ask the engineer sitting next to me how to get the code to GitHub. To make his own life easier, he was always willing to teach me so that one day I could stop interrupting his work 😆.

To get started, you need to understand the concept of git, learn the basic git commands, use GitHub or GitLab, and install an IDE like VSCode.

Check out the dbt quickstart and other learning resources. You don’t need to be a software engineer to do this. While you may need an engineer to help with the setup, you can handle the rest by learning from others.

🤜 Tip:

Choose the tool that your friends or colleagues use so they can help you with any questions. Each tool has its pros and cons, and you won’t know what works best until you try it.
Don’t spend too much time installing these tools. It’s okay to ask for help.

Sailing Towards the New World 🚢

When I started using dbt, I was introduced to new things and terms every day: data warehouses, IDEs, DAGs, and more. I also discovered the concept of the “modern data stack.” I wasn’t familiar with the legacy stack, but the modern one is active and exciting. I felt like I had joined a revolution 🤩. We’ll continue to talk about this in the next article.

🤩 I’d love to hear how you handle your product development process. What tools do you use? Feel free to reach out to me on LinkedIn (Karen Hsieh) or Twitter (@ijac_wei).

🙋🙋‍♀️ Welcome to Ask Me Anything.

Written by Karen Hsieh
