Team Data Science Process

The TDSP process model provides a dynamic framework to machine learning solutions that have been through a robust process of planning, producing, constructing, testing, and deploying models. Here is an example of the TDSP process:

The process is loosely divided into four main phases:

  • Business Understanding
  • Data Acquisition and Understanding
  • Modeling
  • Deployment

The phases are described in the following paragraphs.

Business understanding

The Business understanding process starts with a business idea, which is solved with a machine learning solution. The business idea is defined from the business perspective, and possible scenarios are identified and evaluated. Ultimately, a project plan is generated for delivering the solution.

Data acquisition and understanding

Following on from the business understanding phase is the data acquisition and understanding phase, which concentrates on familiarity and fact-finding about the data.

The process itself is not completely linear; the output of the data acquisition and understanding phase can feed back to the business understanding phase, for example. At this point, some of the essential technical pieces start to appear, such as connecting to data, and the integration of multiple data sources. From the user's perspective, there may be actions arising from this effort. For example, it may be noted that there is missing data from the dataset, which requires further investigation before the project proceeds further.

Modeling

In the modeling phase of the TDSP process, the R model is created, built, and verified against the original business question. In light of the business question, the model needs to make sense. It should also add business value, for example, by performing better than the existing solution that was in place prior to the new R model.

This stage also involves examining key metrics in evaluating our R models, which need to be tested to ensure that the models meet the original business objectives set out in the initial business understanding phase.

Deployment

R models are published to production, once they are proven to be a fit solution to the original business question. This phase involves the creation of a strategy for ongoing review of the R model's performance as well as a monitoring and maintenance plan. It is recommended to carry out a recurrent evaluation of the deployed models. The models will live in a fluid, dynamic world of data and, over time, this environment will impact their efficacy.

The TDSP process is a cycle rather than a linear process, and it does not finish, even if the model is deployed. It is comprised of a clear structure for you to follow throughout the Data Science process, and it facilitates teamwork and collaboration along the way.

TDSP Summary

The data science unicorn does not exist; that is, the person who is equally skilled in all areas of data science, right across the board. In order to ensure successful projects where each team player contributes according to their skill set, the Team Data Science Summary is a team-oriented solution that emphasizes teamwork and collaboration throughout. It recognizes the importance of working as part of a team to deliver Data Science projects. It also offers useful information on the importance of having standardized source control and backups, which can include open source technology.

Since these methodologies both cover data preparation in detail, we will focus on data preparation for the remainder of this chapter.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset