Generative AI: Transforming the Role of Data Scientists

2024-01-19

Let's be straightforward: the role of a data scientist is not going to disappear anytime soon. On the contrary, this position will continue to evolve, especially with the emergence of new generative artificial intelligence tools.

Siddhartha Sharan, a Senior Data and Applied Scientist at Microsoft, said in a recent podcast, "These tools (LLMs) are beneficial because they can improve efficiency and help solve problems when stuck. However, those who claim that these tools will replace data scientists or data engineering jobs have not fully considered the impact of such statements."

Artificial intelligence expert Vin Vashishta supports this view, saying, "The effectiveness of generative AI tools is enough to enhance people's abilities, but after a year of work, I haven't seen anything that can replace people. Most tools are still in the proof-of-concept stage, and there are some flaws that need to be addressed before we talk about AI replacing people's jobs."

Enhancing Data Scientists with Generative AI

Previously, data scientists spent hours on mundane tasks such as data cleaning and formatting. Generative AI can automate these trivial activities, freeing up time for data scientists to tackle more complex problems.

Vashishta says, "We spent a lot of time explaining the same things or answering the same questions. As the business expands, so does the workload, and these repetitive tasks significantly increase the workload. Small generative AI models can easily automate these use cases. Outsourcing simple tasks can free up people's time to handle more complex work."

With generative AI, data scientists can now use algorithms to generate synthetic data that simulates real-world scenarios. This speeds up the data preparation stage, allowing professionals to focus more on analyzing and interpreting results.
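As a rough illustration of the idea of synthetic data, the sketch below fits a simple Gaussian to each numeric column of a small "real" sample and draws new rows from it. This is a deliberately simple statistical stand-in, not a generative AI model, and the function names (`fit_gaussian`, `generate_synthetic`) and sample columns are hypothetical.

```python
import random
import statistics


def fit_gaussian(values):
    """Return (mean, standard deviation) for one numeric column.

    statistics.stdev needs at least two values.
    """
    return statistics.mean(values), statistics.stdev(values)


def generate_synthetic(real_rows, n, seed=0):
    """Draw n synthetic rows, sampling each column independently.

    Assumption: columns are numeric and treated as independent, so
    correlations between columns in the real data are NOT preserved.
    """
    rng = random.Random(seed)
    columns = list(real_rows[0].keys())
    params = {c: fit_gaussian([row[c] for row in real_rows]) for c in columns}
    return [
        {c: rng.gauss(*params[c]) for c in columns}
        for _ in range(n)
    ]


# Hypothetical sample data, for illustration only.
real_rows = [
    {"age": 31.0, "income": 52000.0},
    {"age": 45.0, "income": 78000.0},
    {"age": 27.0, "income": 41000.0},
    {"age": 38.0, "income": 66000.0},
]

for row in generate_synthetic(real_rows, n=3, seed=42):
    print(row)
```

A real synthetic-data workflow would normally use a model that captures relationships between columns; the point here is only the shape of the step: learn from a sample, then generate new records.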

In addition, generative AI enables data scientists to explore data in innovative ways. Ruban Phukan, Co-founder and CEO of GoodGist, states, "Data scientists are evolving into 'solution scientists,' designing creative solutions using the GenAI toolkit, or business automation architects, leveraging AI to build automated solutions for business functions."

However, even with these advancements, generative AI cannot replace the unique skills and problem-solving approaches of data scientists. It falls short in understanding specific business challenges, considering the human aspect, or independently acquiring necessary domain knowledge.

For example, when discussing sentiment analysis, Sharan says, "It's hard to say if there will be no human involvement at all because our approach is AI-driven for the first three passes, and then there is human involvement to validate the results."
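One way to picture a workflow in which automated passes come first and people validate afterwards is sketched below. It is not Sharan's actual pipeline: the three keyword-based passes are toy stand-ins for model calls, and flagging an item for review when the passes disagree is an assumption made for illustration. All names (`keyword_pass`, `label_with_review`) are hypothetical.

```python
from collections import Counter


def keyword_pass(positive_words, negative_words):
    """Build a toy classifier; a stand-in for a real model call."""
    def classify(text):
        words = set(text.lower().split())
        score = len(words & positive_words) - len(words & negative_words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"
    return classify


# Three automated passes with slightly different vocabularies (hypothetical).
PASSES = [
    keyword_pass({"great", "love"}, {"bad", "slow"}),
    keyword_pass({"great", "good", "fast"}, {"bad", "broken"}),
    keyword_pass({"love", "excellent", "great"}, {"slow", "hate", "bad"}),
]


def label_with_review(text, passes=PASSES):
    """Run every pass, take the majority label, and flag disagreement.

    Assumption: an item goes to a human reviewer whenever the passes
    are not unanimous.
    """
    votes = [classify(text) for classify in passes]
    label, count = Counter(votes).most_common(1)[0]
    return {
        "text": text,
        "label": label,
        "votes": votes,
        "needs_review": count < len(votes),
    }


for review in ["great product", "good but slow"]:
    print(label_with_review(review))
```

In this sketch, only the items where the automated passes disagree reach a person, which keeps the human effort focused on the ambiguous cases.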

For Aspiring Data Scientists

According to Sharan, the next generation of data scientists needs to keep up with the use cases of generative AI. Sharan says, "Data scientists should read and understand various models, knowing their pros and cons. Your project managers or engineers don't want you to quote solutions. Instead, they seek guidance on which model to consider for specific problems, which model to deploy, and which model will be more effective in the long run."

He also believes data scientists need to understand what it costs to use different language models. For example, sending an entire dataset to GPT-4 just to aggregate it can be expensive and may not make sense.

He says, "How do you effectively reduce costs while maintaining a sufficiently large profit margin for the product? This is a key question and an area where data scientists can provide significant help. This is what data scientists need to learn."
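The cost question can be made concrete with a back-of-the-envelope estimate. The sketch below compares sending every raw record to a language model with aggregating locally and sending only a short summary. The price per 1,000 tokens and the four-characters-per-token rule of thumb are placeholder assumptions, not actual GPT-4 pricing or tokenization, and the function names are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token count; assumes about 4 characters per token."""
    return max(1, math.ceil(len(text) / chars_per_token))


def estimate_cost(texts, price_per_1k_tokens):
    """Estimated input cost for sending all texts to a model."""
    total_tokens = sum(estimate_tokens(t) for t in texts)
    return total_tokens / 1000 * price_per_1k_tokens


# Placeholder price for illustration only; check the provider's price list.
PRICE_PER_1K_TOKENS = 0.01

# Hypothetical order records.
amounts = [19.99, 5.00, 42.50] * 100_000
raw_rows = [f"order amount={a:.2f}" for a in amounts]

# Option A: send every row to the model and ask it to aggregate.
cost_all_rows = estimate_cost(raw_rows, PRICE_PER_1K_TOKENS)

# Option B: aggregate locally, then send one short summary line.
summary = f"orders={len(amounts)} total={sum(amounts):.2f}"
cost_summary = estimate_cost([summary], PRICE_PER_1K_TOKENS)

print(f"all rows: ${cost_all_rows:.2f}")
print(f"summary:  ${cost_summary:.6f}")
```

Under these assumptions the gap is several orders of magnitude, which is the kind of trade-off the quote points to: plain arithmetic is better done in ordinary code, with the model reserved for the parts that need it.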

In fact, if you review job postings for data scientist positions, you will find that many companies have updated their requirements. For example, the job description for a data scientist at HP states, "As a data scientist focused on generative AI, you will be involved in various projects at HP, involving large language models and other new generative AI capabilities."

Similarly, IBM's job description states, "Stay abreast of the latest trends and advancements in artificial intelligence, foundational models, and large language models. Evaluate emerging technologies, tools, and frameworks to assess their potential impact on solution design and implementation."

Recently, IBM partnered with Coursera to launch the "Generative AI for Data Scientists" specialization to help professionals build these skills.