Dimensional reduction

Dimensional reduction is usually used to reduce the number of variables that must be considered in a machine learning project. It is often applied when columns of data in a file have more than an acceptable number of missing values, have low variance, or are extremely variable in nature. Before you reduce your data source by removing those unwanted columns, you need to be confident that this is the right thing to do. In other words, you want to make sure that removing data does not introduce bias into the remaining data. Profiling the data is an excellent way to determine whether removing a particular column or columns is appropriate. Data profiling is the process of examining a data source to determine its accuracy and completeness and to uncover erroneous sections in the data.
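As a rough illustration of the first two criteria, the following minimal Python sketch drops columns whose share of missing values is too high or whose numeric values barely vary. It is not Watson Studio's method. It assumes the data is a list of dictionaries with None marking a missing value, and the function name reduce_columns and both threshold values are hypothetical choices made for this example. It does not check for bias, which is why profiling the data first still matters.

```python
from statistics import pvariance


def reduce_columns(rows, max_missing_ratio=0.5, min_variance=1e-3):
    """Return the names of the columns worth keeping.

    rows: list of dicts mapping column name -> value (None = missing).
    The default thresholds are illustrative assumptions, not standards.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    keep = []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        if missing_ratio > max_missing_ratio:
            continue  # too many gaps for the column to be useful
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        # Only apply the variance test to purely numeric columns.
        if numeric and len(numeric) == len(present):
            if pvariance(numeric) < min_variance:
                continue  # near-constant column carries little information
        keep.append(col)
    return keep


data = [
    {"age": 34, "country": "US", "flag": 1, "notes": None},
    {"age": 51, "country": "DE", "flag": 1, "notes": None},
    {"age": 27, "country": None, "flag": 1, "notes": "vip"},
    {"age": 45, "country": "FR", "flag": 1, "notes": None},
]
print(reduce_columns(data))  # → ['age', 'country']
```

Here "flag" is dropped because it never varies, and "notes" is dropped because three of its four values are missing.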

You can create effective scripts to accomplish this and, as expected, there are numerous packages and libraries available for you to download and use. However, once again, Watson Studio can easily do this for us.

We can gather the information we need to profile our data source, without any scripting or programming, by creating a data asset profile. By default, the profile that Watson Studio creates for a data asset includes generated metadata and statistics about the textual content of the data file.

To create a profile for your data, go to the asset's Profile page and click Create Profile.

You can update any existing profile when the data changes.

After you click Create Profile, the results are displayed, as shown in the following screenshot:

Take a minute or two to scroll through the generated profile and view the various statistics, such as the overall number of columns and rows. You can also search by column or data point and review value frequencies, unique values, min/max, mean, and so on.
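For comparison, a few of the statistics the profile shows can be approximated in plain Python. The sketch below is not how Watson Studio computes its profile; it assumes rows are dictionaries with None marking a missing value and hashable cell values, and the profile function and its report layout are hypothetical. It reports row and column counts, missing and unique counts, the most frequent values, and min/max/mean for purely numeric columns.

```python
from collections import Counter
from statistics import mean


def profile(rows):
    """Build a simple per-column profile from a list of dict rows.

    None is treated as a missing value (an assumption for this sketch).
    """
    columns = list(rows[0].keys()) if rows else []
    report = {"row_count": len(rows), "column_count": len(columns), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        freq = Counter(present)  # value frequencies of non-missing cells
        stats = {
            "missing": len(values) - len(present),
            "unique": len(freq),
            "most_common": freq.most_common(3),
        }
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        # Numeric summaries only make sense for purely numeric columns.
        if numeric and len(numeric) == len(present):
            stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        report["columns"][col] = stats
    return report


rows = [
    {"age": 34, "country": "US"},
    {"age": 51, "country": "DE"},
    {"age": 27, "country": "US"},
    {"age": None, "country": "US"},
]
report = profile(rows)
print(report["row_count"], report["column_count"])  # → 4 2
print(report["columns"]["country"]["most_common"])  # → [('US', 3), ('DE', 1)]
```

A hosted profile adds far more (data classes, formats, quality scores), which is the convenience the chapter is pointing to.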
