5 trends in data science in the next 5 years
This area is so large that it is impossible to cover in detail everything that could happen in the next 5 years. Trends that I foresee but won't cover here include applications of data science in particular domains, the integration of low-code/no-code tools into the tech stack, and other narrowly focused insights.
The focus is on the general, broad themes of change that I believe will prevail over the next half decade. The list is not exhaustive, but it covers many of the issues that arise in practice today:
- Better naming conventions
- Sustainable applications outside the technology industry
- Data-centric modeling
- Decision science expertise
- Data science creator economy
1. Better naming conventions
The title "Data Scientist" has been a major problem for many in the industry, largely because of the lack of clarity around what the role entails and what the company needs. While I believe job descriptions have for the most part become clearer and more concise, job titles are only beginning to normalize.
This change matters because it reflects the maturity the craft is reaching. Ultimately, I see these job families sorting themselves out:
1. Data Analyst / Product Data Scientist / Analytics Engineer
I think the Data Analyst is a phenomenal entry-level role for the industry, and for that reason it is often written off as "basic" when in reality it demands a great deal of expertise. I see senior analysts bringing much deeper knowledge of experimentation and A/B testing, which lets them work very effectively with product owners and Scrum teams.
I hypothesize that this role will be one of the hottest jobs in tech in the coming years. When roles like Analytics Engineer start commanding the respect they deserve, this will be a role that empowers people to be creative, design-oriented, quick to learn and execute, and useful in any field. This will happen naturally as the data scientist role splits over time (see Machine Learning Engineer below).
2. Research Scientist
This role was probably the first to be defined and understood. It is typically filled by PhDs who advance the frontiers of AI in our society, primarily working on deep learning and reinforcement learning.
3. Machine Learning Engineer
Since I work in this area, I have a lot of thoughts on this. I believe that today's traditional data scientist and machine learning engineer will merge into a single role, the machine learning engineer. That title also describes the job's function more precisely: developing end-to-end machine learning systems at scale. Today, most MLEs seem to be responsible for post-processing the models, while data scientists do the pre-processing. This can be problematic over time as responsibility for long-lived model implementations is handed from one team to the other, so it is much more effective if one team has end-to-end oversight.
As new tools come to market and Python/Jupyter integration is introduced into Excel, the PoC work being done in Jupyter is likely to be done primarily by analysts (probably at a senior level if the PoC involves modeling). The reason is that this part of the craft is becoming easier and faster. That last point is important: I am not suggesting that this part of the craft is easy or that the work of an analyst is trivial. It is an area of great difficulty that should be respected. I argue that the analyst's excellence lies in speed. If analysts can carry out metric analysis, EDA, statistical testing, and more at a faster pace across the tools they use, then this work naturally becomes their wheelhouse as the tools evolve.
This could lead today's data scientists to choose a direction based on their skills and interests - are they more of an Analytics DS or a Machine Learning DS? Note that I didn't say a Software Engineering DS. All of the roles I discuss here require programming skills. Although some require higher programming skills, everyone should be able to write clean and reusable code. To me, that's like telling someone to do science without expecting them to follow the scientific method or the standard guidelines set by the scientific community.
4. Data Engineer
Again, this is a role that will always be around and is pretty well known. Data engineers curate and source data sets from the company's existing data sources (lakes, warehouses, etc.) and may have primary oversight of how the data flows into the modeling and deployment pipeline.
2. Sustainable applications outside the technology industry
So far, AI/ML/DS has primarily gained acceptance in the areas of consumer technology, advertising and marketing. There are a handful of companies working with deep learning and making significant advances in computer vision and NLP, but the reality is that not everyone is developing deep learning applications. Most scenarios are structured data problems for which deep learning is not the most effective solution.
Although sales, marketing, and advertising are huge industries, I believe the most exciting applications for machine learning are yet to come. We are likely to see widespread adoption of ML in healthcare, legal, manufacturing, agriculture, and many other areas. Industries that are traditionally highly regulated or that are not primarily software-driven will change dramatically in order to leverage machine learning at scale. Sustainability is also an important part of these applications: it will not be enough to simply visualize data that nobody previously thought to collect. We will likely see ML solutions working alongside subject matter experts in real-time or production environments in industries that have been slow to evolve technologically.
This is a win for many reasons. The most important one is that these industries will gain efficiency and innovative solutions that were not possible before. It will also make it easier for non-technical people to become technical people. Instead of having to become a machine learning professional, a doctor can hire an MLE onto their team to work with them. Any function that can be improved using data will have a “+” at the end, meaning that the person has ML capabilities in addition to what they already do. For example, Lawyer vs. Lawyer+: the latter is someone who has studied law thoroughly and now uses machine learning to enhance their skills.
3. Data-centric modeling
Andrew Ng illustrated this best: a model is the sum of code and data. So far we have tirelessly refined the model and its hyperparameters while holding the data constant. Although this has led to huge advances in academia, in industry we will see a major shift towards holding the model constant and iterating on the data, and perhaps the hyperparameters as well. This is the essence of data-centric modeling.
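To make the idea concrete, here is a minimal sketch of one data-centric iteration. Everything in it is an illustrative assumption rather than anything from Andrew Ng's material: the nearest-centroid classifier stands in for "the model", the sensor readings are made up, and the value 999.0 is a hypothetical placeholder that some upstream system wrote for a missing reading. The model code never changes; only the training data does.

```python
def fit_centroids(rows):
    """The fixed 'model': one mean value (centroid) per label."""
    sums, counts = {}, {}
    for value, label in rows:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, rows):
    """Share of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, value) == label for value, label in rows)
    return hits / len(rows)


# Version 1 of the training data (made up). The 999.0 row is the
# hypothetical "missing reading" placeholder that slipped through.
MISSING = 999.0
train_v1 = [
    (1.0, "low"), (2.0, "low"), (3.0, "low"), (MISSING, "low"),
    (8.0, "high"), (9.0, "high"), (10.0, "high"),
]

# Version 2: same model, same code, but the data has been cleaned.
train_v2 = [row for row in train_v1 if row[0] != MISSING]

# A held-out test set that stays the same across both iterations.
test_rows = [(1.5, "low"), (2.5, "low"), (8.5, "high"), (9.5, "high")]

accuracy_v1 = accuracy(fit_centroids(train_v1), test_rows)
accuracy_v2 = accuracy(fit_centroids(train_v2), test_rows)

print(f"accuracy with raw data:     {accuracy_v1:.2f}")
print(f"accuracy with cleaned data: {accuracy_v2:.2f}")
```

In this toy case the single bad row drags the "low" centroid far away from the real readings, so the unchanged model misclassifies every "low" test point until the data is fixed. Real projects iterate on labels, coverage, and feature definitions in the same spirit, with a fixed evaluation set to compare data versions.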
As mentioned earlier, most problems are structured data problems, i.e., they do not involve images, free text, or audio. They deal with data tables in specific systems such as a database or the cloud. At the same time, we have largely found the most powerful models. There will inevitably be variations over time, but the models deployed in production environments are those that have been tested and validated and that have a good reputation in the community. This means that you typically won't spend much time modeling in industry (this is already the case).
Most of your time will be spent figuring out the right data, creating a meaningful data set, and intelligently building the features to capture the necessary business impact, and that doesn't even account for everything required to get the model into production. Although modeling and math skills will always be valuable and needed in a team, you don't need a full team with these skills. You may have a staff member or two with strong modeling skills, but primarily you will (and do) need data experts. This is already largely the case today, but it will reach an extent that is currently difficult to predict. Tomorrow's data literacy will inevitably become highly specialized and difficult as society begins to record, collect, and store data in ways we do not today, e.g., by integrating real-time sensors into textiles, urban infrastructure, and our bodies.
4. Decision science expertise
It's incredible how many people focus on the depth of mathematical complexity instead of spending their time learning the inner workings of the company and understanding how people make decisions. Data science is the practice of making data useful, and it will soon require practitioners to focus on the actual decisions that need to be made, changed, or stopped, and to speak in those terms.
I believe the gap between those who understand the entire modeling pipeline and those who have a deep understanding of the business will remain the same or widen over time. There are too many tools, techniques, and skills changing for non-technical people to keep up with. Therefore, those who have technical knowledge must acquire strong sales skills to bridge the gap.
Overemphasizing math will not help you influence an important decision that needs to be made. The best-trained data scientists must understand motivations, influences, and different personalities, and how to persuade those in power. This is common knowledge in the professional community today, and I predict that all bootcamps and programs will come to teach it as a key skill for success.
5. Data science creator economy
As an artist, I find it incredible how many data scientists have the tenacity to create art and freelance. I've heard from many data scientists/MLEs who have pursued this path, whether through writing, unique passion projects, or consulting under their own brand name. I think the path to freelancing in this role varies widely today: some attempts will go well, most will be difficult, and some will simply be a fun way to spend extra hours in the day.
I believe that in time this will be a serious career for young people to pursue. We're currently experiencing a decline in the value of college that deserves a story of its own, and kids are really wondering why they should go into debt when they can learn the exact skills they want and need online in a few years (for entry-level roles). Becoming a freelance data scientist will likely be a realistic option (and a fun one) for many, and I predict that freelancers will be able to achieve the high salaries we see in the industry today.
However, a few things must be resolved first, the most important being data protection and legal issues. Once companies have a standard for working with freelancers instead of permanently employing an entire team, I think this can be an effective route for beginners and for practitioners with high brand value.