Chapter 13. Oracle Data Miner 4.1

Data mining means extracting relevant information from the large volumes of data available in a database; the relevance of the data is defined by the problem statement of a particular project. Data Miner in SQL Developer 4.1 has enhanced features, and in this chapter we will discuss the new features provided with the Data Miner tool as well as the general enhancements in and around it. One of the most significant additions to the data mining capability is support for JSON data, driven by its growing popularity in Big Data configurations; Data Miner now provides an easy-to-use JSON Query node. We will start off with data source node preparation.

Data source node

As a first step, we will invoke the Data Miner tool within SQL Developer; the next screenshot explains the Data Miner architecture diagrammatically. Data Miner has been integrated into SQL Developer by default since version 3.0, and when invoked from Tools | Data Miner, it checks for the Data Miner repository.

As a prerequisite, Oracle Database Enterprise Edition is required to host the Data Miner repository, which is held under a schema called ODMRSYS.
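Whether the repository schema already exists can be checked directly from SQL. The following is a minimal sketch that assumes a privileged (DBA) connection, since the repository owner is visible only through the DBA views:

```sql
-- Check whether the Data Miner repository owner already exists.
-- ODMRSYS is created when the repository is installed.
SELECT username, account_status, created
FROM   dba_users
WHERE  username = 'ODMRSYS';
```

If the query returns no rows, SQL Developer will offer to install the repository the first time Data Miner is invoked.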

[Screenshot: Data Miner architecture]
  • Oracle Database Enterprise Edition will have all the services required to support Oracle Data Miner.
  • Oracle Data Miner is a part of the Oracle Advanced Analytics option to Oracle Database EE. Oracle Data Miner provides model building, testing, and scoring capabilities.
  • Oracle XML DB provides services to manage the Data Miner repository metadata, such as the details of the workflow specifications.
  • Oracle Scheduler provides the engine for scheduling the Data Miner workflows.
  • Oracle Text provides the necessary services required to support Text Mining.
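The presence of the supporting components listed above can be verified from the data dictionary; a sketch assuming a privileged connection:

```sql
-- Confirm that the components Data Miner depends on are
-- installed and valid in this database.
SELECT comp_name, version, status
FROM   dba_registry
WHERE  comp_name IN ('Oracle Text', 'Oracle XML Database');
```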

    Tip

    If the Data Miner tab is not visible, you can dock it in the Oracle SQL Developer window.

When invoked, SQL Developer prompts for the installation of the Data Miner repository, if the repository is not already built. In my case, my database did not have a repository and I was prompted to create one, as shown in the following screenshot:

[Screenshot: Creating the Data Miner repository]

A Data Source node supplies the data for the data mining project; it specifies the build data for a model. We will use the examples that were installed along with the Data Miner repository. The following is the sequence we will follow to add the data source node:

  • Create a new project
  • Create a new workflow
  • Add nodes to workflows
  • Link nodes
  • Run the nodes
  • View reports

Creating a new project

We can also keep the Workflow Jobs tab open while we create our first project. To do this, go to View | Data Miner | Workflow Jobs.

Before you begin working on a Data Miner workflow, you need to create a Data Miner project, which serves as a container for one or more workflows. We created the data mining user during the installation; the user name is DMUSER. In the Data Miner tab, right-click the data mining user connection that you previously created and select New Project, as shown in the following screenshot:

Note

My new project name will be DataMining_Demo

[Screenshot: Creating a new project]

Creating a new workflow

A Data Miner workflow is a collection of connected nodes that describe data mining processes and provide directions for the Data Mining server. The workflow actually emulates all phases of a process designed to solve a particular business problem.

The workflow enables us to interactively build, analyze, and test a data mining process within a graphical environment, as shown in the following screenshot:

[Screenshot: Creating a new workflow]

Addition of nodes to the workflow

Immediately after creating a new workflow, we will see a blank workflow screen, ready for us to build the workflow. The graphical representation of the workflow is built by dragging and dropping nodes from the Components pane into the workflow area; the Components pane lists all the nodes that we can add to the workflow.

Tip

Each element that we drag and drop from the Components pane becomes a node in our workflow. The first element of a workflow is always the data source.

The components palette shows all the available types of nodes that we can use to build our workflow, but we will only use a couple of nodes as examples in this chapter.

The following table lists all the available types of nodes, for reference purposes only.

[Table: Available node types]

Remember, we installed the sample data along with the Data Miner repository; for the rest of the chapter, we will use these sample tables to demonstrate Data Miner concepts. As shown in the following screenshot, we will use the table INSUR_CUST_LTV_SAMPLE, owned by DMUSER, to mine the data and exhibit the analytics capability of Data Miner:

[Screenshot: Selecting the INSUR_CUST_LTV_SAMPLE table]

In the Define Data Source dialog box, select the table, click Next, and then Finish to create the data source. Explore Data is another node we will add, which helps us validate the data source. To link them, right-click the Data Source node, select Connect, and drag the arrow to the Explore Data node; with that, the step is complete.

[Screenshot: Data Source and Explore Data nodes in the workflow]
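Independently of the GUI, the sample table can also be sanity-checked with a plain query; this sketch assumes the DMUSER schema created earlier owns the table:

```sql
-- Preview a few rows of the sample build data.
SELECT *
FROM   insur_cust_ltv_sample
WHERE  ROWNUM <= 5;
```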

Link nodes

Once all the nodes are placed in the required order, the next step is to link them in a meaningful and correct way. In the following example, you can see the Data Source node connected to the Explore Data node by right-clicking it and selecting the Connect option.

[Screenshot: Linking the Data Source node to the Explore Data node]

Run nodes

After connecting the nodes in a meaningful fashion, we can run a node to submit a workflow job for it. Right-click the Explore Data node and select the Run option to submit the workflow job. The status of the related workflow job is displayed at runtime in the Workflow Jobs pane.

[Screenshot: Running the Explore Data node]

Once the nodes are defined, we are ready to run them, which in turn submits the workflow job. The Workflow Jobs pane displays the submitted job and its status; a completed job is shown with a green tick in the Status column. We are now ready to generate the statistics report for the Explore Data node.
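Because workflow jobs are executed through Oracle Scheduler (as noted in the component list earlier), recent runs can also be inspected from the data dictionary; a sketch run from the DMUSER session:

```sql
-- Show the most recent Scheduler job runs for this schema,
-- which include Data Miner workflow executions.
SELECT job_name, status, actual_start_date
FROM   user_scheduler_job_run_details
ORDER BY actual_start_date DESC;
```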

View reports

By clicking on View Data, Data Miner generates statistics describing each attribute in the dataset, including a histogram, the number of distinct values, the mode, average, minimum and maximum values, standard deviation, variance, skewness, and kurtosis.

[Screenshot: Explore Data statistics]
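Most of these summary statistics can be reproduced with ordinary SQL aggregates, which is a useful cross-check of what Explore Data reports. The sketch below assumes LTV is one of the numeric columns of INSUR_CUST_LTV_SAMPLE; skewness and kurtosis have no built-in aggregate in this release and are omitted:

```sql
-- Summary statistics for one numeric attribute, comparable to
-- the per-attribute figures shown by the Explore Data node.
SELECT COUNT(DISTINCT ltv)  AS distinct_values,
       STATS_MODE(ltv)      AS mode_value,
       AVG(ltv)             AS average,
       MIN(ltv)             AS min_value,
       MAX(ltv)             AS max_value,
       STDDEV(ltv)          AS std_deviation,
       VARIANCE(ltv)        AS variance
FROM   insur_cust_ltv_sample;
```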

The display enables you to visualize and validate the data, and also to manually inspect the data for patterns or structure.

[Screenshot: Viewing the data]