As a Data Scientist, your duties may vary based on the size and type of your organization.
In this post, I will group the duties of a Data Scientist into five points:
- Ask a question
A Data Scientist must have a question they are trying to answer. This question could be a business problem that has no clear answer yet. It could also come from the Data Scientist, based on business reports and trends. The Data Scientist’s domain expertise also helps in asking the right questions as they relate to the business.
Business Analysts are crucial in this step as they work hand-in-hand with the Data Scientist.
- Get the data
There is little or nothing that a Data Scientist can do without access to the right data. Once there is a question, the next step is to find an answer to it, and the answer will come primarily from the available data. This task is critical because mistakes here can undermine every later step, so considerable effort must go into ensuring that the right data is obtained.
The Data Scientist partners with the Data Engineer to ensure that this is successful.
- Explore the data
Data exploration is the next step once the right data is available. This is where the data is critically examined for insights that will help answer the business problem. During the exploration, special attention is paid to the features that directly affect the question to be answered. By the end of this stage, possible patterns and trends have been identified, and these are used in the next step.
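To make the exploration step a little more concrete, here is a minimal sketch in plain Python. It assumes a tiny, made-up dataset (the field names `ad_spend`, `discount` and `sales` and all the values are hypothetical) and shows one simple way to summarize each column and rank the features by how strongly they correlate with the quantity the question is about. Real exploration usually involves far more than this, so treat it as an illustration only.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical records: every field name and value is invented for illustration.
records = [
    {"ad_spend": 10.0, "discount": 5.0, "sales": 25.0},
    {"ad_spend": 20.0, "discount": 3.0, "sales": 45.0},
    {"ad_spend": 30.0, "discount": 6.0, "sales": 65.0},
    {"ad_spend": 40.0, "discount": 2.0, "sales": 85.0},
    {"ad_spend": 50.0, "discount": 4.0, "sales": 105.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def explore(rows, target):
    """Summarize each column and rank features by |correlation| with the target."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    summary = {
        name: {
            "mean": mean(values),
            "stdev": stdev(values),
            "min": min(values),
            "max": max(values),
        }
        for name, values in columns.items()
    }
    correlations = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    # Strongest relationship first, regardless of sign.
    ranked = sorted(correlations, key=lambda n: abs(correlations[n]), reverse=True)
    return summary, correlations, ranked


summary, correlations, ranked = explore(records, "sales")
for name in ranked:
    print(name, round(correlations[name], 2))
```

Correlation is only one lens, and a strong correlation does not by itself mean that a feature drives the outcome. Plots and domain knowledge still matter at this stage.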
Business Intelligence experts and Data Analysts play a crucial role in this step.
- Model the data
This is where the insights from the previous step are used to build a data model. The model is built around the question that is meant to be answered. The model building could use a supervised or an unsupervised approach. The resulting model is also used to make predictions about future occurrences based on the available data.
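As a rough illustration of the supervised case, the sketch below fits a straight line to a handful of made-up observations and uses it to predict a value that has not been seen yet. Simple linear regression is used here only as a stand-in, since the post does not name a particular model, and the `ad_spend` and `sales` numbers are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical data: the labeled examples the model learns from.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]

model = fit_line(ad_spend, sales)
forecast = predict(model, 60.0)  # prediction for a spend level not yet observed
print(model, forecast)
```

In practice the model would be evaluated on data held out from training before anyone relies on its predictions.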
Machine Learning Engineers assist in the model building and the deployment of the model to production.
- Communicate results
At this step, the insights, trends and patterns are communicated and presented to the concerned parties. The data is mostly presented through storytelling and the use of plots and charts. Effective data communication is key: whatever insights are drawn from the data must be communicated in a way that is easy to understand, and their value to the business must also be detailed.
Data Visualization Engineers are indispensable in this step, as they ensure that the data is presented clearly, without visuals that lead to brain clutter.