I am sure that, by now, almost every business professional has heard about the new data-related professions: data analyst, business analyst, business intelligence specialist, data scientist, artificial intelligence developer, and many others. The truth is that no one has a very clear definition of what a data scientist is, beyond the intuitive notion that a data scientist is a person who works with a lot of data.
It is not my intention to enter the endless debate on what a data scientist is. Instead, I will keep to the premise that a data scientist ‘is a person who works with a lot of data’ and uses that data to solve business problems. Data comes in many forms, so a data scientist may have to work with structured, semi-structured, and unstructured data. What’s more, the volume may vary from a few thousand rows to billions of rows distributed across many clusters.
Regardless of the definition of a data scientist, I venture to tell you that it is very likely that you will end up being one of them.
Let’s start with a graph from a Statista report. It shows the volume of data created, captured, copied, and consumed globally since 2010, with forecasts until 2025. That volume grows at an average year-on-year rate of 45%. This is exponential growth: put another way, the amount of data generated roughly doubles every 2 years.
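To check that a 45% year-on-year rate really does correspond to doubling about every 2 years, here is a minimal sketch in Python. It assumes the growth rate stays constant from year to year, which is a simplification, and the function name `doubling_time` is my own choice for illustration.

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate.

    Solves (1 + r) ** t == 2 for t, assuming r does not change over time.
    """
    return math.log(2) / math.log(1 + annual_growth_rate)


years = doubling_time(0.45)  # 45% year-on-year, the rate quoted above
print(f"Doubling time at 45% per year: {years:.2f} years")
```

The result is a little under 2 years, which is consistent with the rule of thumb that data generation doubles every 2 years.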
If we stop to think about why, the first explanations that come to mind are that the population is growing and that the use of technology, such as mobile phones and smartphones, is growing too.
Next, let’s look at the population growth rate, as published by Our World in Data:
It is evident that the population is growing, but at an ever slower rate: currently around 1.2% per year.
Now, let’s analyze the growth rate of mobile telephony, given that the mobile phone is perhaps the most accessible technological device for the population, although it is not the only one.
Source: BI Intelligence Estimates
What this shows is that the mobile telephony market is growing at an average year-on-year rate of approximately 2%.
So, based on what we have seen so far, data generation is growing at close to 45% year-on-year, while the population is growing at approximately 1.2% (and slowing) and mobile devices at around 2%.
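To make the gap between these three rates concrete, the following sketch compounds each of them over the 15 years from 2010 to 2025. It is only an illustration: it assumes each rate holds constant over the whole period, which the graphs do not claim, and the names `growth_factor`, `RATES`, and `YEARS` are my own.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplication after compounding a constant annual rate."""
    return (1 + annual_rate) ** years


# Approximate year-on-year rates quoted in the text (assumed constant here).
RATES = {
    "data volume": 0.45,
    "population": 0.012,
    "mobile devices": 0.02,
}
YEARS = 15  # 2010 to 2025

factors = {name: growth_factor(rate, YEARS) for name, rate in RATES.items()}

for name, factor in factors.items():
    print(f"{name}: multiplied by {factor:,.2f} in {YEARS} years")

# How much more data there is per person at the end of the period.
data_per_person = factors["data volume"] / factors["population"]
print(f"data per person: multiplied by {data_per_person:,.0f}")
```

Under these assumptions, the volume of data multiplies by a few hundred, while the population and the number of mobile devices grow by well under a factor of two. Neither of the two can explain the growth in data on its own.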
At first glance, something is missing… or maybe not!
In fact, many things are missing. Let’s sum them up with something we haven’t considered yet: machines and systems also produce data that is captured, stored, copied, and consumed. On top of that, each human being generates more and more data.
In other words, to put it simply, more and more data is available to analyze.
The next question I would like to address is this: how are we, as professionals, going to analyze an ever larger amount of data per person? We cannot increase human resources at the same rate as the growth of available data.
Here comes another interesting graph, this time from Wikipedia:
This graph shows the number of transistors in microprocessors, which doubles roughly every one to two years. The curve is exponential: in 2012 there were microchips with around 1 billion transistors, and in 2020 there were microchips with around 50 billion. A 50-fold increase in transistor count in just 8 years is really impressive.
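To compare this with the 45% annual growth in data, we can convert a 50-fold increase over 8 years into an equivalent yearly rate. This is a rough sketch that assumes smooth, constant compound growth between the two dates. The function names are hypothetical and chosen only for this example.

```python
import math


def implied_annual_rate(total_factor: float, years: float) -> float:
    """Constant yearly growth rate that yields total_factor after `years`."""
    return total_factor ** (1 / years) - 1


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


rate = implied_annual_rate(50, 8)  # 50-fold increase between 2012 and 2020
print(f"Implied annual growth: {rate:.0%}")
print(f"Implied doubling time: {doubling_time(rate):.2f} years")
```

With these figures, the implied rate comes out at a little over 60% per year, which is above the 45% at which data is growing.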
In other words, computing capacity is growing even faster than the volume of data that humanity generates.
This means that, for a single person to analyze more and more data, that person needs up-to-date and adequate technological tools. Tools alone are not enough, though: they also need the knowledge, techniques, and processes that allow any professional to process and analyze ever larger volumes of data.
Nowadays, any professional works with data. The volume of data is increasing constantly, so you will end up being a data scientist, albeit highly specialized in your professional field.