Fourier analysis

The purpose of spectral density estimation is to measure the amplitude of a signal or a time series according to its frequency [3:4]. The spectral density is estimated by detecting periodicities in the dataset. A scientist can better understand a signal or time series by analyzing its harmonics.

Note

The spectral theory

Spectral analysis for time series should not be confused with spectral theory, a subset of linear algebra that studies Eigenfunctions on Hilbert and Banach spaces. Harmonic and Fourier analyses are regarded as a subset of spectral theory.

The fast Fourier transform (FFT) is the most commonly used frequency analysis algorithm [3:5]. Let's explore the concept behind the discrete Fourier series and the Fourier transform as well as their benefits as applied to financial markets. The Fourier analysis approximates any generic function as the sum of trigonometric functions, sine and cosine. The decomposition in a basic trigonometric function is known as a Fourier transform [3:6].

Discrete Fourier transform (DFT)

A time series {xk} can be represented as a discrete real-time domain function f, x=f(t). In the 18th century, Jean Baptiste Joseph Fourier demonstrated that any continuous periodic function f could be represented as a linear combination of sine and cosine functions. The discrete Fourier transform (DFT) is a linear transformation that converts a time series into a list of coefficients of a finite combination of complex or real trigonometric functions, ordered by their frequencies.

The frequency ω of each trigonometric function defines one of the harmonics of the signal. The space that represents signal amplitude versus frequency of the signal is known as the frequency domain. The generic DFT transforms a time series into a sequence of frequencies defined as complex numbers ω = a + j.φ (j2= -1), for which a is the amplitude of the frequency and φ is the phase.

This section is dedicated to the real DFT that converts a time series into an ordered sequence of frequencies with real values.

Note

Real discrete Fourier transform

A periodic function f can be represented as an infinite combination of sine and cosine functions:

Discrete Fourier transform (DFT)

The Fourier cosine transform of a function f is defined as:

Discrete Fourier transform (DFT)

The discrete real cosine series of a function f(-x) = f(x) is defined as:

Discrete Fourier transform (DFT)

The Fourier sine transform of a function is defined as:

Discrete Fourier transform (DFT)

The discrete real sine series of a function f(-x) = f(x) is defined as:

Discrete Fourier transform (DFT)

The computation of the Fourier trigonometric series is time consuming with an asymptotic time complexity of O(n2). Several attempts have been made to make the computation as effective as possible. The most common numerical algorithm used to compute the Fourier series is the fast Fourier transform created by J. W. Cooley and J. Tukey [3:7]. The algorithm, called Radix-2, recursively breaks down the Fourier transform for a time series of N data points into any combination of N1 and N2 sized segments such as N = N1 N2. Ultimately, the discrete Fourier transform is applied to the deepest-nested segments.

Tip

The Cooley-Tukey algorithm

I encourage you to implement the Radix-2 Cooley-Tukey algorithm in Scala using a tail recursion.

The Radix-2 implementation requires that the number of data points is N=2n for even functions (sine) and N = 2n+1 for cosine. There are two approaches to meet this constraint:

  • Reduce the actual number of points to the next lower radix, 2n < N
  • Extend the original time series by padding it with 0 to the next higher radix, N < 2n+1

Padding the original time series is the preferred option because it does not affect the original set of observations.

Let's define a base class, DTransform[T], for all the fast Fourier transforms, parameterized with a view bounded to the Double type (Double, Float, and so on). The first step is to implement the padding method, common to all the Fourier transforms:

trait DTransform[T] extends PipeOperator[XTSeries[T], XTSeries[Double]] {
  def padSize(xtSz: Int, even: Boolean=true): Int = {
     val sz = if( even ) xtSz else xtSz-1
     if( (sz & (sz-1)) == 0) 0
     else {
        var bitPos = 0
        do {
        bitPos += 1
        } while( (sz >> bitPos) > 0)
        (if(even) (1<<bitPos) else (1<<bitPos)+1) - xtSz
     }
  }
   
  def pad(xt: XTSeries[T], even: Boolean=true)
           (implicit f: T => Double): DblVector = {
     val newSize = padSize(xt.size, even)
     val arr: DblVector = xt
     if( newSize > 0) arr ++ Array.fill(newSize)(0.0)  else arr
  }
}

Tip

The while loop

Scala developers prefer Scala higher-order methods on collection to implement iterative computation. However, nothing prevents you from using a traditional while loop if either readability or performance is an issue.

The fast implementation of the padding method, pad, consists of detecting the number of observations, N, which is a power of 2 using the bit operator & by evaluating whether N & (N-1) is null. The next highest radix is extracted by computing the number of bits shift in N. The code illustrates the effective use of implicit conversion to make the code readable. The arr: DblVector = series conversion triggers a conversion defined in the XTSeries companion object.

The next step is to write the DFT class for the real discrete transforms, sine and cosine, by subclassing DTransform. The purpose of the class is to select the appropriate Fourier series, pad the time series to the next power of 2 if necessary, and invoke the FastSineTransformer and FastCosineTransformer classes of the Apache Commons Math library [3:8] introduced in the first chapter:

class DFT[@specialized(Double) T<%Double] extends DTransform[T] {
  def |> : PartialFunction[XTSeries[T], XTSeries[Double]] = {
    case xt: XTSeries[T] if(xt != null && xt.length > 0) => 
      XTSeries[Double](fwrd(xt)._2)
  }
  def fwrd(xt:XTSeries[T]): (RealTransformer, DblVector)= {
     val rdt = if(Math.abs(xt.head) < DFT_EPS) 
     new FastSineTransformer(DstNormalization.STANDARD_DST_I)
     else new FastCosineTransformer(DctNormalization.STANDARD_DCT_I)
   
     (rdt, rdt.transform( pad(xt,xt.head==0.0),TransformType.FORWARD))
   }
}

The discrete Fourier sine series requires that the first value of the time series is 0.0. This implementation automates the selection of the appropriate series by evaluating series.head. This example uses the standard formulation of the cosine and sine transformation, defined by the DctNormalization.STANDARD_DCT_I argument. The orthogonal normalization, which normalizes the frequency by a factor of 1/sqrt(2(N-1), where N is the size of the time series, generates a cleaner frequency spectrum for a higher computation cost.

Note

@specialized

The @specialized(Double) annotation is used to instruct the Scala compiler to generate a specialized and more efficient version of the class for the type Double. The drawback of specialization is the duplication of byte code as the specialized version coexists with the parameterized classes [3:9].

In order to illustrate the different concepts behind DFTs, let's consider the case of a time series generated by a sequence h of sinusoidal functions:

val _T= 1.0/1024
val h = (x:Double) =>2.0*Math.cos(2.0*Math.PI*_T*x) + Math.cos(5.0*Math.PI*_T*x) + Math.cos(15.0*Math.PI*_T*x)/3

As the signal is synthetically created, we can select the size of the time series to avoid padding. The first value in the time series is not null, so the number of observations is 2n+1. The data generated by the function h is plotted as follows:

Discrete Fourier transform (DFT)

Example of the sinusoidal time series

Let's extract the frequencies spectrum for the time series generated by the function h. The data points are created by tabulating the function h. The frequencies spectrum is computed with a simple invocation of the pipe operator on the instance of the DFT class:

val rawOut = "output/chap3/raw.csv"
val smoothedOut = "output/chap3/smoothed.csv"
val values = Array.tabulate(1025)(x =>h(x/1025))
DataSink[Double](rawOut) |> values //1

val smoothed = DFT[Double] |> XTSeries[Double](values) //2
DataSink[Double]("output/chap3/smoothed.csv") |> smoothed

The first data sink (the type DataSink) stores the original time series into a CSV file (line 1). The DFT instance extracts the frequencies spectrum and formats it as time series (line 2). Finally, a second sink saves it into another CSV file.

Tip

Data sinks and spreadsheets

In this particular case, the results of the discrete Fourier transform are dumped into a CSV file so that it can be loaded into a spreadsheet. Some spreadsheets support a set of filtering techniques that can be used to validate the result of the example.

A simpler alternative would be to use JFreeChart.

The spectrum of the time series, plotted for the first 32 points, clearly shows three frequencies at k=2, 5, and 15. This is expected because the original signal is composed of three sinusoidal functions. The amplitude of these frequencies are 1024/1, 1024/2, and 1024/6, respectively. The following plot represents the first 32 harmonics for the time series:

Discrete Fourier transform (DFT)

Frequency spectrum for a three-frequency sinusoidal

The next step is to use the frequencies spectrum to create a low-pass filter using DFT. There are many algorithms to implement a low or pass band filter in the time domain from autoregressive models to the Butterworth algorithm. However, the fast Fourier transform is still a very popular technique to smooth signals and extract trends.

Note

Big Data

A DFT for a large time series can be very computation intensive. One option is to treat the time series as a continuous signal and sample it using the Nyquist frequency. The Nyquist frequency is half of the sampling rate of a continuous signal.

DFT-based filtering

The purpose of this section is to introduce, describe, and implement a noise filtering mechanism that leverages the discrete Fourier transform. The idea is quite simple: the forward and inverse Fourier transforms are used sequentially to convert the time series from the time domain to the frequency domain and back. The only input you need to supply is a function G that modifies the sequence of frequencies. This operation is known as the convolution of the filter G and the frequencies spectrum. A convolution is similar to an inner product of two time series in the frequencies domain. Mathematically, the convolution is defined as follows:

Note

Convolution

The convolution of two functions f and g is defined as:

DFT-based filtering

DFT convolution

One important property of the Fourier transform is that convolution of two signals is implemented as the inner product of their relative spectrums:

DFT-based filtering

Let's apply the property to the discrete Fourier transform. If a time series {xi} has a frequency spectrum DFT-based filtering and a filter f in a frequency domain defined as DFT-based filtering, then the convolution is defined as:

DFT-based filtering

Let's apply the convolution to our filtering problem. The filtering algorithm using the discrete Fourier transform consists of five steps:

  1. Pad the time series to enable the discrete sine or cosine transform.
  2. Generate the ordered sequence of frequencies using the forward transform.
  3. Select the filter function g in the frequency domain and a cutoff frequency.
  4. Convolute the sequence of frequency with the filter function g.
  5. Generate the filtered signal in the time domain by applying the inverse DFT transform to the convoluted frequencies.
    DFT-based filtering

    Diagram of a discrete Fourier filter

The most commonly used low-pass filters are known as the sinc and sinc2 functions, defined as a rectangular function and a triangular function, respectively. The simplest low-pass filter is implemented by a sinc function that returns 1 for frequencies below a cutoff frequency, fC, and 0 if the frequency is higher:

def sinc(f: Double, fC: Double): Double = if(Math.abs(f) < fC) 1.0 else 0.0
def sinc2(f: Double, fC: Double): Double = if(f*f < fC) 1.0 else 0.0

The filtering computation is implemented as a data transformation (pipe operator |>). The DFTFir class inherits from the DFT class in order to reuse the fwrd forward transform function. As usual, exception and validation code is omitted. The frequency domain function g is an attribute of the filter. The g function takes the frequency cutoff value fC as the second argument. The two filters sinc and sinc2 defined in the previous section are examples of filtering functions.

class DFTFir[T <% Double](val g: (Double, Double) =>Double, val fC; Double) extends DFT[T]

The pipe operator implements the filtering functionality:

def |> : PartialFunction[XTSeries[T], XTSeries[Double]] = {
  case xt: XTSeries[T] if(xt != null && xt.size > 2) => {
    val spectrum = fwrd(xt) //1
    val cutOff = fC*spectrum._2.size
    val filtered = spectrum._2.zipWithIndex.map(x => x._1*g(x._2, cutOff)) //2
    XTSeries[Double](spectrum._1.transform(filtered, TransformType.INVERSE)) //3
}

The filtering process follows three steps:

  1. Computation of the discrete Fourier forward transformation (sine or cosine), fwrd.
  2. Apply the filter function through a Scala map method.
  3. Apply the inverse transform on the frequencies.

Let's evaluate the impact of the cutoff values on the filtered data. The implementation of the test program consists of invoking the DFT filter pipe operator and writing results into a CSV file. The code reuses the generation function h introduced in the previous paragraph:

val price = src |> YahooFinancials.adjClose  
val filter = new DFTFir[Double](sinc, 4.0)
val filteredPrice = filter |> price

Filtering out the noise is accomplished by selecting the cutoff value between any of the three harmonics with the respective frequencies of 2, 5, and 15. The original and the two filtered time series are plotted on the following graph:

DFT-based filtering

Plotting of the discrete Fourier filter-based smoothing

As you would expect, the low-pass filter with a cutoff value of 12 removes the noise with the highest frequencies. The filter (with the cutoff value 4) cancels out the second harmonic (low-frequency noise), leaving out only the main trend cycle.

Detection of market cycles

Using the discrete Fourier transform to generate the frequencies spectrum of a periodical time series is easy. However, what about real-world signals such as the time series representing the historical price of a stock?

The purpose of the next exercise is to detect, if any, the long term cycle(s) of the overall stock market by applying the discrete Fourier transform to the quote of the S&P 500 index between January 1, 2009, and December 31, 2013, as illustrated in the following graph:

Detection of market cycles

Historical S&P 500 index prices

The first step is to apply the DFT to extract a spectrum for the S&P 500 historical prices, as shown in the following graph, with the first 32 harmonics:

Detection of market cycles

Frequencies spectrum for historical S&P index

The frequency domain chart highlights some interesting characteristics regarding the S&P 500 historical prices:

  • Both positive and negative amplitudes are present, as you would expect in a time series with complex values. The cosine series contributes to the positive amplitudes while the sine series affects both positive and negative amplitudes, (cos(x+π) = sin(x)).
  • The decay of the amplitude along the frequencies is steep enough to warrant further analysis beyond the first harmonic, which represents the main trend. The next step is to apply a pass-band filter technique to the S&P 500 historical data in order to identify short-term trends with lower periodicity.

A low-pass filter is limited to reduce or cancel out the noise in the raw data. In this case, a passband filter using a range or window of frequencies is appropriate to isolate the frequency or the group of frequencies that characterize a specific cycle. The sinc function introduced in the previous section to implement a low-band filter is modified to enforce the passband within a window, [w1, w2], as follows:

def sinc(f: Double, w: (Double, Double)): Double = if(Math.abs(f) > w._1 && Math.abs(f) < w._2) 1.0 else 0.0

Let's define a DFT-based pass-band filter with a window of width 4, w=(i, i +4), with i ranging between 2 and 20. Applying the window [4, 8] isolates the impact of the second harmonic on the price curve. As we eliminate the main upward trend with frequencies less than 4, all filtered data varies within a short range relative to the main trend. The following graph shows output of this filter:

Detection of market cycles

The output of a pass-band DFT filter range 4-8 on the historical S&P index

In this case, we filter the S&P 500 index around the third group of harmonics with frequencies ranging from 18 to 22; the signal is converted into a familiar sinusoidal function, as shown here:

Detection of market cycles

The output of a pass-band DFT filter range 18-22 on the historical S&P index

There is a possible rational explanation for the shape of the S&P 500 data filtered by a passband with a frequency of 20, as illustrated in the previous plot; the S&P 500 historical data plot shows that the frequency of the fluctuation in the middle of the uptrend (trading sessions 620 to 770) increases significantly. This phenomenon can be explained by the fact that the S&P 500 index reaches a resistance level around the trading session 545 when the existing uptrend breaks. A tug-of-war starts between the bulls, betting the market nudges higher, and the bears, who are expecting a correction. The back and forth between the traders ends when the S&P 500 index breaks through its resistance and resumes a strong uptrend characterized by a high amplitude and low frequency, as shown in the following graph:

Detection of market cycles

One of the limitations of using the Fourier transform to clean up data is that it requires the data scientist to extract the frequencies spectrum and modify the filter on a regular basis, as he or she is never sure that the most recent batch of data does not introduce noise with a different frequency. The Kalman filter addresses this limitation.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset