Why become a Data Engineer?

Analytics Dashboard on Macbook Pro
The past few years have seen the unrelenting growth of Big Data as companies place increasing importance on business intelligence and analytics. Demand for Data Scientists soared as companies looked for mathematically inclined employees who could give them insights into how to improve their business and ultimately increase revenue. Buried patterns discovered by talented Data Scientists can give a company an edge over its competitors.

Given that data is potential knowledge and knowledge is power, it is hardly surprising that the trend is to gather as much of it as possible. The falling prices of storage, the rise of distributed platforms that run on commodity hardware (such as Hadoop) and the adoption of cloud platforms make it economically viable to store all of a company’s information. However, as the number of data sources and the volume of data increase, so does the complexity involved in gathering it. To enable Data Scientists to do their job effectively on quality data, there is an increasing demand for someone to provide the pipelines that collect and deliver it.

This is what Data Engineering is about. We are the plumbers that connect the data sources to their destination. We make sure there are no leaks or contamination as data flows through the pipeline, being refined along the way, to end up in a lake or warehouse ready for consumption. We get our hands dirty so Data Scientists don’t have to. And make no mistake, the extraction and cleaning of data can be unglamorous work, but it’s a job that needs doing. It is not the most rewarding work: it is simple in theory, but in practice there are always hurdles and things that do not work as expected. Keep in mind that we are not hired to have fun or to work on rewarding projects; we are hired to solve problems. We are also rewarded handsomely for it.

Thomas Shelby shoveling hay
Everybody needs to do dirty work from time to time. Even gangster boss Thomas Shelby. Especially gangster boss Thomas Shelby.

While that is an important part of our work, it’s not all there is to it. When we think of plumbing, what usually comes to mind is the image of dirty water flowing from our toilet or a clogged pipe under our sink, but take a moment to admire the engineering that went into the picture below. It is a picture of the Pont du Gard, part of a beautifully architected and constructed aqueduct that delivered a steady flow of an estimated 40,000 cubic meters of fresh water a day to the city of Nîmes (population 50,000) as early as the first century AD. While this is not the kind of system that would be built by modern-day plumbers, in the relatively new field of Data Engineering we are also responsible for building the systems that take our raw data, clean it, homogenize it, tag it and deliver it where it’s meant to be consumed. This can be complex and rewarding work, made possible by technologies like Spark and Kafka.

Aqueduct
Beautiful Aqueduct of Pont du Gard
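
To make the "clean it, homogenize it, tag it and deliver it" idea a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `Record` schema, the field names (`email`, `country`, `country_code`), the `missing_country` tag and the in-memory list standing in for a warehouse are all hypothetical. A real pipeline would more likely be built on something like Spark or Kafka, but the shape of the stages is similar.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Record:
    """Hypothetical common schema that every source is mapped onto."""
    source: str
    email: str
    country: str
    tags: list[str] = field(default_factory=list)


def clean(rows: Iterable[dict]) -> Iterator[dict]:
    """Strip stray whitespace and drop rows without a usable email."""
    for row in rows:
        email = (row.get("email") or "").strip()
        if "@" in email:
            yield {**row, "email": email}


def homogenize(rows: Iterable[dict]) -> Iterator[Record]:
    """Map differently shaped source rows onto the common schema."""
    for row in rows:
        country = row.get("country") or row.get("country_code") or ""
        yield Record(
            source=row.get("source", "unknown"),
            email=row["email"].lower(),
            country=country.upper(),
        )


def tag(records: Iterable[Record]) -> Iterator[Record]:
    """Attach quality tags so consumers know what they are getting."""
    for record in records:
        if not record.country:
            record.tags.append("missing_country")
        yield record


def deliver(records: Iterable[Record], sink: list[Record]) -> int:
    """Write records to the destination and return how many arrived."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


def run_pipeline(raw_rows: Iterable[dict], sink: list[Record]) -> int:
    """Chain the stages: raw data in, consumable records out."""
    return deliver(tag(homogenize(clean(raw_rows))), sink)
```

Each stage is a generator, so rows flow through one at a time rather than being loaded all at once, which is loosely how data moves through a real pipeline. Calling `run_pipeline(raw_rows, warehouse)` with a list of dictionaries and an empty list fills the list with `Record` objects and returns the number delivered.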

On the rise

Data Engineering is on the rise. Don’t take my word for it.

There is not much more to add other than that I have seen talented but inexperienced junior Data Engineers offered more jobs, and at better pay, than the average senior Software Engineer, whose salary is already well above average. And that’s just the starting point. One thing to keep in mind is that you usually need to live in, or be willing to relocate to, a major city for these kinds of offers, while Software Engineering roles are much more widespread.

Closely related to Software Engineering

As you might be aware if you have read any of the articles linked above, Data Engineering is closer to Software Engineering than it is to Data Science. This is good news for Computer Science graduates because it is more closely related to their field, while Data Science is arguably more focused on Math than Computer Science. Learning good Software Engineering principles will enable you to be a better Data Engineer, capable of building complex systems. Some companies define a Data Engineer as someone who just writes queries to get data to and from their databases. A Data Engineer should be proficient with SQL, as it is an important tool for the job, but it is far from the only responsibility. Aspiring candidates should look out for those companies and steer clear of them if they aim to become well-rounded engineers who work on interesting problems. We previously talked about being OK with doing the dirty work, but if it’s the only kind of work you do then you’re working at the wrong place.

Parting thoughts

Green field with path in the middle
Green field or well-trodden path. No wrong answer.

I hope this article has given you a basic idea of what Data Engineering is and whether you might enjoy it. I have purposely not gone into depth about what the day-to-day work of a Data Engineer is like, since I intend to do that in upcoming posts. If you enjoy programming, have an interest in good Software Engineering, like to constantly teach yourself and experiment with new technologies, and prefer a green field to the well-trodden path, then Data Engineering could be for you. Stay tuned for upcoming posts on learning the skills and technologies used in our day-to-day work.

