Pivoting data to and from value and indexes

Data is often stored in a stacked format, also called record format. This is common in databases, .csv files, and Excel spreadsheets. Stacked data is often not normalized: values repeat across many columns, or a table holds values that logically belong in other tables (violating another concept of tidy data).

Take the following data which represents a stream of data from an accelerometer on a
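The original listing is not reproduced here, so the following is only a stand-in: a small, made-up set of stacked accelerometer readings. The column names (interval, axis, reading) and every value are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical stacked (record-format) data: one row per (interval, axis) pair.
# Column names and values are invented for illustration only.
sensor_readings = pd.DataFrame({
    "interval": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "axis": ["X", "Y", "Z"] * 3,
    "reading": [0.0, 0.5, 1.0, 0.1, 0.4, 0.9, 0.2, 0.3, 0.8],
})

print(sensor_readings)
```

Each interval appears three times, once per axis, which is the repetition typical of stacked data.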

One problem with data organized this way is that there is no direct way to get the readings for a specific axis. A naive approach is to use Boolean selection:
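A minimal sketch of such a Boolean selection, run against hypothetical sample data (the interval, axis, and reading column names and all values are assumed, not taken from the original listing):

```python
import pandas as pd

# Hypothetical stacked data; names and values are illustrative assumptions.
sensor_readings = pd.DataFrame({
    "interval": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "axis": ["X", "Y", "Z"] * 3,
    "reading": [0.0, 0.5, 1.0, 0.1, 0.4, 0.9, 0.2, 0.3, 0.8],
})

# Build a Boolean mask that is True only for rows of the X axis,
# then use it to keep just those rows.
is_x = sensor_readings["axis"] == "X"
x_readings = sensor_readings[is_x]

print(x_readings)
```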

The problem with this approach appears when you want the values for all axes at a given time, not just the X axis. You can perform a separate selection for each axis value, but that is repetitive, and if a new axis value is added to the DataFrame the code must be changed to handle it.
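To make the repetition concrete, here is a sketch of the per-axis approach on hypothetical data (column names and values are assumptions). Note that each axis needs its own hard-coded line.

```python
import pandas as pd

# Hypothetical stacked data; names and values are illustrative assumptions.
sensor_readings = pd.DataFrame({
    "interval": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "axis": ["X", "Y", "Z"] * 3,
    "reading": [0.0, 0.5, 1.0, 0.1, 0.4, 0.9, 0.2, 0.3, 0.8],
})

# Rows recorded at interval 1.
at_1 = sensor_readings["interval"] == 1

# One near-identical selection per axis value. A new axis value in the
# data would be silently ignored until another line like these is added.
x_at_1 = sensor_readings.loc[at_1 & (sensor_readings["axis"] == "X"), "reading"].iloc[0]
y_at_1 = sensor_readings.loc[at_1 & (sensor_readings["axis"] == "Y"), "reading"].iloc[0]
z_at_1 = sensor_readings.loc[at_1 & (sensor_readings["axis"] == "Z"), "reading"].iloc[0]

print(x_at_1, y_at_1, z_at_1)
```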

A better representation is one in which each unique value of the variable becomes its own column. To convert to this form, use the DataFrame object's .pivot() method:
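A sketch of the pivot on hypothetical sample data follows. The column names passed to index, columns, and values are assumptions made for this example; keyword arguments are used because recent pandas versions expect them for .pivot().

```python
import pandas as pd

# Hypothetical stacked data; names and values are illustrative assumptions.
sensor_readings = pd.DataFrame({
    "interval": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "axis": ["X", "Y", "Z"] * 3,
    "reading": [0.0, 0.5, 1.0, 0.1, 0.4, 0.9, 0.2, 0.3, 0.8],
})

# Each distinct value in "axis" becomes a column, each distinct "interval"
# becomes a row label, and the cells are filled from "reading".
pivoted = sensor_readings.pivot(index="interval", columns="axis", values="reading")

print(pivoted)

# All axis readings for a single interval are now one row lookup.
print(pivoted.loc[1])
```

Note that .pivot() raises an error if any (index, columns) pair occurs more than once, so this form assumes at most one reading per axis per interval.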

This takes all the distinct values from the axis column and pivots them into columns of the new DataFrame, filling in the values for the new columns from the appropriate rows and columns of the original DataFrame. With the new DataFrame, it is easy to identify the X, Y, and Z sensor readings at each time interval.
