Embracing Engineering Best Practices: Data Edition — Tip #24
Since Tip #1, this series has shared the insights I’ve gathered in recent years as a product manager diving into data. dbt isn’t just a fantastic tool for people like me; it also brings a profound shift in mindset. dbt was born when Tristan founded Fishtown Analytics, with the goal of enabling data analysts to work more like software developers 🛠️. Check out his article, Goodbye RJMetrics, Hello Fishtown Analytics.
But if you’re a non-technical data practitioner, you might wonder, “What does working like a software developer actually involve?” Let’s dive in! 🎣
Why Work Like Software Developers? 🤔
Tristan’s article, Building a Mature Analytics Workflow, explains it clearly. I’d like to echo this part:
Most of the problems with the current analytics workflow aren’t so bad if you’re working alone. You know about all the data available to you, you understand its significance, and you’re aware of its creation process. But scaling becomes a challenge. As soon as your analytics needs grow beyond a single analyst, these problems start to manifest.
When I first used dbt, I was its only user. A data engineer did review my SQL, but he focused on code logic, not business logic: as long as my code was functionally correct, it passed. So I didn’t grasp the depth of the quote above. I mistakenly thought that using GitHub and VS Code was the same as working like a software developer 🤷‍♀️.
The second time I used dbt, my team included a data engineer and a data analyst who was transitioning from an operator role. Although the data engineer knew how to work like a software developer, we mostly built our data models independently. Our collaboration went no further than sharing the same GitHub repository.
As our models multiplied, we discovered that many of them were roughly 80% duplicates of one another. We also realized that code reviews had to go beyond checking whether the code worked. We needed to understand each other’s thought processes so we could cover for one another when necessary. Knowledge had to be shared.
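Spotting that kind of overlap doesn’t have to wait for a reviewer. As a minimal sketch, assuming each model’s SQL is available as a string keyed by model name, the snippet below compares every pair of models with Python’s standard-library difflib and flags pairs above a similarity threshold. This is not a dbt feature: the helper names, the sample models, and the 0.8 threshold are hypothetical, and token-level similarity is only a rough proxy for duplicated logic.

```python
from difflib import SequenceMatcher
from itertools import combinations


def tokens(sql: str) -> list[str]:
    """Lowercase and split on whitespace so indentation and line breaks don't matter."""
    return sql.lower().split()


def find_near_duplicates(models: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Hypothetical helper: return (model_a, model_b, similarity) for each pair at or above the threshold."""
    flagged = []
    for (name_a, sql_a), (name_b, sql_b) in combinations(sorted(models.items()), 2):
        # ratio() returns 2*M/T: M matching tokens, T total tokens across both models
        similarity = SequenceMatcher(None, tokens(sql_a), tokens(sql_b)).ratio()
        if similarity >= threshold:
            flagged.append((name_a, name_b, round(similarity, 2)))
    return flagged


# Hypothetical models: two revenue models that differ only in one filter value
models = {
    "fct_revenue_finance": "select order_id, amount, ordered_at from orders where status = 'paid'",
    "fct_revenue_marketing": "select order_id, amount, ordered_at from orders where status = 'completed'",
    "dim_customers": "select customer_id, first_name, last_name from customers",
}

for a, b, similarity in find_near_duplicates(models):
    print(f"{a} <-> {b}: {similarity:.0%} similar")
```

With these samples, only the two revenue models are flagged, at 90% similarity. A flagged pair is a candidate for a shared upstream model, not proof of duplication; someone still has to judge whether the business logic really overlaps.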
The solution to these workflow problems? Work like software developers! 👨‍💻
Indeed, the playbook for solving these issues already exists within our software engineering teams.
The techniques that software engineering teams use for collaborative, rapid creation of quality applications can also be applied to analytics.
New title ✨ Analytics Engineer ✨
dbt has also popularized a new role in data teams: the Analytics Engineer. Here’s another insightful article on how analytics engineering emerged: What is Analytics Engineering? 🌟
My role changed significantly. Finance and marketing teams could now generate their own reports, while my typical day shifted to preparing data for analysis: writing transformation and test code, along with thorough documentation. My tools moved from Excel and Looker to iTerm, GitHub, and Atom.
So, was I still a data analyst?
I discussed this new title in Tip #5. Smaller companies may not have the luxury of a dedicated person for every function, but we still recognize the three distinct hats 🎩 we must wear.
How To Work Like Software Developers? 🤔
To software developers, all of this may feel like second nature 😆. Here are the differences I noticed compared with my work as a product manager:
Discussing code 👨‍💻: Previously, we’d jot down questions or discuss them in person. Commenting directly on the code and discussing it in context is far more efficient, and for us it was a paradigm shift. Aim for clarity and rely on facts. Avoid vague feedback such as “I feel this is odd” or “I’m slightly confused about XX.” Instead, point to the specific code and anchor the discussion there.
To make this possible, we learned that developing locally and pushing to GitHub only when the work was finished didn’t work well. Opening a draft pull request early in the process proved far more useful. At first we didn’t even know the feature existed, since we stuck to the basics.
We had installed dbt-project-evaluator from the outset. However, we glossed over its output and missed the distinction between warnings and errors. Only when we started paying attention did we appreciate the rationale behind its rules. These insights are valuable whether or not you choose to follow every rule 🧠.
The article “How to review an analytics pull request?” offers excellent guidance. As analytics engineers, we translate business logic into code, so reviewers must assess not just the code but also the business logic behind it. The code therefore needs to be functionally correct and follow our coding standards, and its business logic needs to be transparent. We eventually created a PR template to make our pull requests clearer. The template keeps evolving as we look for ways to improve our collaboration 🌱.
Several tools also support this work. With “engineer” in our title, we love putting technology to use. For instance, we use SQLFluff to define and enforce our SQL coding style. More on this in Tip #7.
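For reference, a draft pull request can also be opened programmatically through GitHub’s REST API (`POST /repos/{owner}/{repo}/pulls` with `"draft": true` in the body). The sketch below only builds that request with Python’s standard library and does not send it; the helper function, repository, branch names, and token are hypothetical placeholders. In day-to-day work, choosing “Create draft pull request” in the GitHub web UI achieves the same thing.

```python
import json
import urllib.request


def build_draft_pr_request(owner: str, repo: str, head: str, base: str, title: str, token: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a GitHub REST API request that opens a draft PR."""
    payload = {
        "title": title,
        "head": head,  # branch holding the work in progress
        "base": base,  # branch the work will eventually merge into
        "body": "Early draft so reviewers can comment on the approach.",
        "draft": True,  # a draft PR cannot be merged until it is marked ready for review
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical repository, branches, and token placeholder
req = build_draft_pr_request("acme", "analytics", "feature/revenue-model", "main", "Add revenue model", "<token>")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```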
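The rules in dbt-project-evaluator run as dbt tests, and dbt records each result in `target/run_results.json` with a `unique_id` and a `status` such as `pass`, `warn`, `fail`, `error`, or `skipped`. As a minimal sketch, assuming that file layout, the snippet below sorts results into blocking and advisory buckets so neither kind gets glossed over. The bucket names and mapping are a hypothetical convention, and the inline sample uses shortened, illustrative `unique_id`s rather than output from a real project.

```python
import json

# Hypothetical mapping from dbt result status to how urgently a reviewer should act
STATUS_BUCKETS = {"fail": "blocking", "error": "blocking", "warn": "advisory", "pass": "ok"}


def triage(run_results: dict) -> dict[str, list[str]]:
    """Group each result's unique_id into a bucket based on its status."""
    buckets: dict[str, list[str]] = {}
    for result in run_results.get("results", []):
        # Statuses not in the mapping (e.g. "skipped") land in "other"
        bucket = STATUS_BUCKETS.get(result.get("status"), "other")
        buckets.setdefault(bucket, []).append(result["unique_id"])
    return buckets


# In a real project you would read the file dbt writes after `dbt build` or `dbt test`:
#     with open("target/run_results.json") as f:
#         run_results = json.load(f)
# Trimmed-down sample with illustrative unique_ids:
sample = """
{"results": [
  {"unique_id": "test.dbt_project_evaluator.fct_undocumented_models", "status": "warn"},
  {"unique_id": "test.dbt_project_evaluator.fct_direct_join_to_source", "status": "fail"},
  {"unique_id": "test.dbt_project_evaluator.fct_model_fanout", "status": "pass"}
]}
"""
run_results = json.loads(sample)

for bucket, ids in triage(run_results).items():
    print(f"{bucket}: {ids}")
```

Keeping warnings in their own bucket, instead of treating everything that isn’t a failure as fine, is what makes the rationale behind each rule visible during review.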
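A PR template only helps if it actually gets filled in. As a minimal sketch, assuming the pull request description is Markdown with `##`-style headings, the snippet below checks it for each required section and reports sections that are missing or left empty. The section names are hypothetical placeholders, not the actual template, and fetching the PR body (for example in a CI job) is left out.

```python
import re

# Hypothetical section headings for an analytics PR template
REQUIRED_SECTIONS = ["Description", "Business logic", "How was this tested?", "Checklist"]

# Matches a Markdown heading line such as "## Description" and captures its text
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(pr_body: str) -> list[str]:
    """Return required sections that are absent or have no text under their heading."""
    # With one capture group, re.split yields [preamble, heading1, content1, heading2, content2, ...]
    parts = HEADING.split(pr_body)
    filled = {heading.strip(): content.strip() for heading, content in zip(parts[1::2], parts[2::2])}
    return [section for section in REQUIRED_SECTIONS if not filled.get(section)]


body = """## Description
Adds fct_revenue, replacing two near-duplicate revenue models.

## Business logic
Revenue counts only orders with status 'paid'; refunds are excluded.

## How was this tested?

## Checklist
- [x] dbt-project-evaluator warnings reviewed
"""

print(missing_sections(body))
```

Here the empty “How was this tested?” section is the only one reported, which is exactly the gap a reviewer of business logic would otherwise have to chase down by hand.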
Let’s embrace the software engineering practices in our data-focused roles. 🚀
New Chapter 📖
Next in the series, we’ll explore the essence of the dbt community, which breathes life into the field of analytics engineering. I’d like to show you the value of a community and encourage you to join the dbt community and make an impact! 🤗