Writing modular code in RStudio

Writing modular code is a programming best practice. It involves dividing your code into independent pieces, where one module takes the output of another as its input.

This recipe implements modular programming by leveraging the source() function, which lets you execute R scripts from another script (or from the R console itself) and collects the objects they create in the calling environment.

The advantage of modular code lies in the orthogonality principle: two pieces of code are orthogonal to each other if changing the first has no effect on the other.

Take, for instance, two pieces of code: the first one gives as an output a ZIP code from an address, and the second one takes that ZIP code and calculates the shipping cost for that ZIP code.

As long as the first module gives a ZIP code as its output, the second module is totally unaware of how this code was derived. That is to say, any change to the internals of the first module will have no effect on the second one.
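The ZIP code example can be sketched in R as two small functions. This is only an illustration: get_zip_code() and shipping_cost() are hypothetical names, the regular expression assumes a five-digit ZIP code at the end of the address, and the cost table is invented.

```r
# Hypothetical module 1: extract a five-digit ZIP code from the end of an address.
get_zip_code <- function(address) {
  found <- regmatches(address, regexpr("[0-9]{5}$", address))
  if (length(found) == 0) NA_character_ else found
}

# Hypothetical module 2: toy flat-rate rule based on the first digit of the ZIP code.
# It only needs a ZIP code; it knows nothing about how the code was obtained.
shipping_cost <- function(zip_code) {
  zone <- substr(zip_code, 1, 1)
  if (zone %in% c("0", "1", "2")) 5 else 8
}

zip  <- get_zip_code("10 Main St, Springfield, IL 62701")
cost <- shipping_cost(zip)
```

The internals of get_zip_code() could be replaced, for example by a lookup against an address service, without touching shipping_cost(), as long as it still returns a ZIP code.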

The Pragmatic Programmer by Andrew Hunt and David Thomas effectively shows this concept with the following graph:

[Figure: two orthogonal axes, x and y]

This diagram illustrates the concept: any movement parallel to the x axis makes no difference along the y axis, since the two axes are orthogonal to each other.

Getting ready

Before writing any code, we will lay out the process we are going to model as a simple workflow diagram. We will draw the diagram with the DiagrammeR package by Rich Iannone:

install.packages("DiagrammeR")
library(DiagrammeR)
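With the package available, a workflow diagram for the two-module process might look like the following sketch. It uses grViz() with a Graphviz DOT string; the intermediate node names (address, zip_code, shipping_cost) are assumptions made for illustration, and the diagram is only built if DiagrammeR is installed.

```r
# DOT description of the workflow: data flows from left to right
# through the two modules of this recipe.
dot <- "
digraph workflow {
  rankdir = LR
  node [fontname = Helvetica, color = grey80, shape = box]
  edge [color = lightblue, arrowsize = 0.5]
  address -> zip_retrievement -> zip_code
  zip_code -> shipping_cost_retrievement -> shipping_cost
}
"

# Build the diagram only when the package is installed.
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  workflow_diagram <- DiagrammeR::grViz(dot)
  # print(workflow_diagram) displays it in the RStudio Viewer pane
}
```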

How to do it...

  1. The first step is to define the code workflow, outlining the input and output for each activity of the process.

    Refer to the recipe Producing a process workflow diagram in RStudio in Chapter 4, Advanced and Interactive Visualization, which uses the DiagrammeR package for workflow diagramming in R:

    node_attrs = c("fontname = Helvetica",
      "color = grey80"),
    edge_attrs = c("color = lightblue",
      "arrowsize = 0.5")
    Then define the modules from your workflow.
  2. The next step is to define program scripts. In our example, we will have two modules:
    • zip_retrievement
    • shipping_cost_retrievement

    Write these two modules and place them in the current working directory, so that the source() calls in the next step can find them.

  3. Next, we create a main script that does nothing but source the other scripts in sequence:
    source("zip_retrievement.R", local = TRUE)
    source("shipping_cost_retrievement.R", local = TRUE)
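The three steps can be tried end to end with the following sketch. The contents of the two module scripts are not given in the recipe, so the ones below are invented placeholders, and the files are written to a temporary directory rather than the working directory to keep the example self-contained.

```r
# Assumption: a temporary directory stands in for the working directory.
module_dir <- tempfile("modules")
dir.create(module_dir)

# Placeholder content for zip_retrievement.R: its output is `zip_code`.
writeLines(c(
  "address  <- '10 Main St, Springfield, IL 62701'",
  "zip_code <- regmatches(address, regexpr('[0-9]{5}$', address))"
), file.path(module_dir, "zip_retrievement.R"))

# Placeholder content for shipping_cost_retrievement.R: its input is `zip_code`.
writeLines(
  "shipping_cost <- if (substr(zip_code, 1, 1) %in% c('0', '1', '2')) 5 else 8",
  file.path(module_dir, "shipping_cost_retrievement.R")
)

# Main script: source the modules in their logical sequence.
source(file.path(module_dir, "zip_retrievement.R"), local = TRUE)
source(file.path(module_dir, "shipping_cost_retrievement.R"), local = TRUE)
```

After the second source() call, shipping_cost holds the result computed from the zip_code object that the first script left behind.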

How it works...

In step 1, we define the workflow of our code, outlining the input and output for each activity of the process.

The first step is quite relevant, even if in our example it seems to be a simple one. In this step, we decide which activities performed by our code will be packed into a single module.

You may ask, which rule could be used to decide which activities need to be put in a common module? Well, there is no scientific rule for this kind of choice; nevertheless, we can be led by two high-level principles, which are as follows:

  • Clarity
  • Ease of maintenance

Clarity keeps us from over-decomposing our code into hundreds of modules, which would hurt its overall readability. Ease of maintenance pushes us to group together the pieces of code that are most likely to need maintenance at the same time.

That being said, we don't have to be too afraid of making mistakes, since this is an iterative process and we will always be able to perfect it.

In step 2, we write each module in its own script. The main question here is: couldn't we just create chunks of code in a single R script?

Of course you could! Nevertheless, dividing your code into separate modules, both logically and physically, will give you greater clarity about your code's logical flow.

Moreover, and this is not to be underestimated, it will let you quickly see whether the orthogonality principle is respected in your code. If, after changing something in one module, you find an error coming up from another module, you will know that the two of them are not orthogonal to each other.
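The following sketch shows how such a break can surface. Quoted expressions stand in for module scripts, and all object names are hypothetical: the first module is refactored so that an internal helper object is renamed, and only the consumer that reached into that internal object fails.

```r
# Module A, before and after a refactoring that renames an internal object.
# Its agreed output is `zip_code` in both versions.
module_a_v1 <- quote({
  address_parts <- strsplit("10 Main St, Springfield, 62701", ", ")[[1]]
  zip_code <- address_parts[3]
})
module_a_v2 <- quote({
  parts <- strsplit("10 Main St, Springfield, 62701", ", ")[[1]]
  zip_code <- parts[3]
})

# Orthogonal consumer: uses only the agreed output `zip_code`.
module_b_clean <- quote(
  shipping_cost <- if (startsWith(zip_code, "6")) 5 else 8
)
# Coupled consumer: depends on module A's internal `address_parts`.
module_b_coupled <- quote(
  shipping_cost <- if (startsWith(address_parts[3], "6")) 5 else 8
)

# Run both modules in a fresh environment and report whether they succeeded.
run_pipeline <- function(module_a, module_b) {
  env <- new.env(parent = baseenv())
  tryCatch({
    eval(module_a, env)
    eval(module_b, env)
    "ok"
  }, error = function(e) "error")
}

results <- c(
  clean_v1   = run_pipeline(module_a_v1, module_b_clean),
  clean_v2   = run_pipeline(module_a_v2, module_b_clean),
  coupled_v1 = run_pipeline(module_a_v1, module_b_coupled),
  coupled_v2 = run_pipeline(module_a_v2, module_b_coupled)
)
```

Only the coupled_v2 entry reports an error, which is the signal that the coupled consumer and module A were never orthogonal.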

In step 3, we write a main script that does nothing more than source all the modules in their logical sequence. The crucial point in this sourcing activity is the local argument, which is set to TRUE.

This argument tells R to evaluate the sourced script in the environment from which source() is called, rather than in the global environment, which is the default (local = FALSE). The objects created by one script are therefore available as input to the scripts sourced after it.
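The effect of the local argument is easiest to see when source() is called from inside a function. In this minimal sketch, the script content and the names run_locally and sourced_value are made up for illustration.

```r
# A throwaway script that creates a single object.
script <- tempfile(fileext = ".R")
writeLines("sourced_value <- 42", script)

run_locally <- function() {
  # local = TRUE: the script is evaluated in this function's environment.
  source(script, local = TRUE)
  exists("sourced_value", envir = environment(), inherits = FALSE)
}

created_in_function <- run_locally()

# The object stayed inside the function and did not reach the global environment.
leaked_to_global <- exists("sourced_value", envir = globalenv(), inherits = FALSE)
```

Here created_in_function is TRUE and leaked_to_global is FALSE. With the default local = FALSE, the same call would have created sourced_value in the global environment instead.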
