Wondering what the holy grail of ETL looks like?
By Lubo · Tags: ETL, Python, SQL
Recently, I stumbled upon an article listing the 20 best ETL tools. I knew some of them; others I had never heard of. The list was far from complete, as many tools were missing. But that's not the point.
No size fits all
I've had a chance to witness many tools and approaches to Extract-Transform-Load (ETL) during my career. They ranged from simple to truly byzantine. The choice of tools didn't matter that much, because it was the people who made the projects successful.
I still remember a DWH project where the main drivers for ELT jobs were a few shell scripts, SQL and an Excel spreadsheet defining the pipelines. It was a simple architecture, but it left enough room to focus on the complex tasks, i.e. the business rules and their translation into SQL. And it worked.
I have also participated in projects with state-of-the-art technology. And a pile of specification documents. Often, one document contradicted another. It was confusing for everyone, and that's something no technology can fix.
Finally, I recall a moment when a salesperson from a new market entrant appeared in our meeting and started pushing a revolutionary technology called ELT, which was, according to him, the only true way of pushing data around.
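To make that spreadsheet-driven setup a bit more tangible, here is a minimal sketch of how such a runner could look. It is a hypothetical illustration, not the project's actual scripts: the pipeline definition is a CSV export standing in for the Excel sheet, with assumed columns `step`, `name` and `sql`, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical pipeline definition; in the original setup this lived in an Excel sheet.
# Assumed columns: step (execution order), name (label), sql (statement to run).
PIPELINE_CSV = """step,name,sql
2,load_clean,"INSERT INTO clean_orders SELECT id, UPPER(TRIM(country)), amount FROM raw_orders WHERE amount > 0"
1,create_target,"CREATE TABLE clean_orders (id INTEGER, country TEXT, amount REAL)"
3,aggregate,"CREATE TABLE sales_by_country AS SELECT country, SUM(amount) AS total FROM clean_orders GROUP BY country"
"""


def load_pipeline(text):
    """Parse the pipeline definition and return its rows sorted by the 'step' column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return sorted(rows, key=lambda row: int(row["step"]))


def run_pipeline(conn, steps):
    """Execute each SQL step in order; an exception stops the run at the failing step."""
    for row in steps:
        with conn:  # commits if the step succeeds, rolls back if it raises
            conn.execute(row["sql"])
        print(f"step {row['step']} ({row['name']}) done")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical source data that the shell scripts would normally have staged.
    conn.execute("CREATE TABLE raw_orders (id INTEGER, country TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, " sk ", 10.0), (2, "SK", 5.0), (3, "cz", -1.0), (4, "cz", 7.5)],
    )
    conn.commit()
    run_pipeline(conn, load_pipeline(PIPELINE_CSV))
    print(conn.execute("SELECT country, total FROM sales_by_country ORDER BY country").fetchall())
```

The point of the design is that the order and content of the pipeline live in data that non-programmers can read and edit, while the runner itself stays trivial, which leaves the effort for the business rules in the SQL.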
But I digress.
The Holy Grail
When you have finished a number of projects and worked with different customers, you start to wonder what the holy grail of ETL would look like.
Recently, while working on projects that used Python+SQL as the center of extracting, cleansing, transforming and loading activities, a rebellious idea struck me: what if these technologies are the best suited for ETL?
Could this be the holy grail? Almost anyone can learn the basics of Python and SQL. Pandas basics are also easy to understand, although the learning curve gets steep once you dip into advanced topics.
Python has another advantage: it can be used for a wide array of tasks, not just ETL.
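For readers who haven't seen this combination in practice, here is a minimal sketch of one way the Python+SQL split can look: Python extracts and cleanses the raw data, and SQL expresses the business rule. The sample data, table names and accepted date formats are hypothetical, and the standard-library `sqlite3` module stands in for a real warehouse; with pandas, the cleansing step would typically operate on a DataFrame instead of plain dicts.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw extract; in practice it would come from a file, an API or another database.
RAW_CSV = """customer_id,signup_date,email
1,2023-01-15, Alice@Example.com
2,15/02/2023,bob@example.com
2,15/02/2023,bob@example.com
3,not a date,carol@example.com
"""


def extract(text):
    """Extract: read the raw CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_date(value):
    """Accept the two date formats assumed to appear in the source; return an ISO date or None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            pass
    return None


def cleanse(rows):
    """Cleanse in Python: normalise emails and dates, drop duplicate ids and unparseable rows."""
    seen, clean = set(), []
    for row in rows:
        key = int(row["customer_id"])
        signup = parse_date(row["signup_date"])
        if key in seen or signup is None:
            continue  # a real pipeline would log or quarantine rejected rows
        seen.add(key)
        clean.append((key, signup, row["email"].strip().lower()))
    return clean


def load_and_transform(conn, rows):
    """Load into a staging table, then express the business rule (signups per month) in SQL."""
    conn.execute(
        "CREATE TABLE stg_customers (customer_id INTEGER PRIMARY KEY, signup_date TEXT, email TEXT)"
    )
    conn.executemany("INSERT INTO stg_customers VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE signups_per_month AS
        SELECT substr(signup_date, 1, 7) AS month, COUNT(*) AS signups
        FROM stg_customers
        GROUP BY month
        """
    )
    conn.commit()
```

Usage: `conn = sqlite3.connect(":memory:")`, then `load_and_transform(conn, cleanse(extract(RAW_CSV)))`, and query `SELECT month, signups FROM signups_per_month ORDER BY month`. The messy, row-by-row work stays in Python, while the set-based rule stays in SQL, where analysts can read and change it.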
Also, the big advantage of code is that you own it. You can do whatever you want with it. You can rewrite it. You can reuse it. You can copy and paste it from one project to another. And no big corporation can charge you for using your code on your infrastructure.
Well, they may charge if you run your very own code on their very own cloud, but that's another story.
This wheel has been invented before
Of course, the Python+SQL idea is nothing new. Some companies (like Netflix) give analysts and engineers simple tools that are easy to work with, e.g. Jupyter notebooks, and build infrastructure around them to support the data pipelines.
Many people will naturally praise the free and open nature of Python. While that is a valid point, for me the bigger draws are the ubiquity of Python, the vast ecosystem of its libraries and, to some extent, the availability of people with solid Python skills.
Some other advantages
Python means freedom not just for developers but also for employers. It is simple to extend your team with Python developers. It is much harder, and more expensive, to find people with a good command of some proprietary tool.
Plus, a commercial tool costs extra $$$ to purchase. You also can't neglect training, which tends to be costly compared to free technologies.
Tools or people?
What about the simplicity of an ETL solution? Do Python and SQL imply that the project is easy to manage? That developers are more productive? That everything is rainbows and unicorns?
Certainly not. The success of your project depends on the team: on a good combination of communication, experience, technical and management skills. This is something no tool can ensure.
Back to my initial question: is Python+SQL the best ETL tool ever? It depends, but I believe that for many use cases it is an excellent starting point, and for many businesses, big or small, it is a great fit for many projects.
I am curious to hear your opinion on this topic, or about your data engineering challenges in general. You can reach me at firstname.lastname@example.org.
If you are currently searching for skilled ETL engineers, we may be able to help. Reach out to me to discuss your needs.