- Scala for Machine Learning(Second Edition)
- Patrick R. Nicolas
- 1438字
- 2021-07-08 10:43:07
Fourier analysis
The purpose of spectral density estimation is to measure the amplitude of a signal or a time series according to its frequency [3:5]. The objective is to estimate the spectral density by detecting periodicities in the dataset. A scientist can better understand a signal or time series by analyzing its harmonics.
Note
Spectral theory:
Spectral analysis for time series should not be confused with Spectral Theory, a subset of linear algebra that studies Eigen functions on Hilbert and Banach spaces. In fact, harmonic analysis and Fourier analysis are regarded as a subset of spectral theory.
Let us explore the concept behind the discrete Fourier series as well as its benefits as applied to financial markets. Fourier analysis approximates any generic function as the sum of trigonometric functions, sine and cosine.
Note
Complex Fourier transform:
This section focuses on the discrete Fourier series for real value. The generic Fourier transform applies to complex values [3:6].
The decomposition in a basic trigonometric functions process is known as the Fourier transform [3:7].
Discrete Fourier transform (DFT)
A time series {xk} can be represented as a discrete real time-domain function f, x = f(t). In the 18th century, Jean Baptiste Joseph Fourier demonstrated that any continuous periodic function f is formulated as a linear combination of sine and cosine functions. The DFT is a linear transformation that converts a times series into a list of coefficients of a finite combination of complex or real trigonometric functions, ordered by their frequencies.
The frequency ω of each trigonometric function defines one of the harmonics of the signal. The space that represents signal amplitude versus frequency of the signal is known as the frequency domain. The generic DFT transforms a time series into a sequence of frequencies defined as complex numbers a + j.φ (j2= -1) for which a is the amplitude of the frequency and φ is the phase.
This section is dedicated to the real DFT that converts a time series into an ordered sequence of frequencies of real value.
Note
Real DFT:
M5: A periodic function f is represented as an infinite combination of sine and cosine functions:

M6: The Fourier cosine transform of a function f is defined as follows:

M7: The discrete real cosine series of a function f (-x) = f(x) is defined as follows:

M8: The Fourier sine transform of a function is defined as follows:

M9: The discrete real sine series of a function f (-x) = f(x) is defined as follows:

The computation of the Fourier trigonometric series is time-consuming, with an asymptotic time complexity of O(n 2 ). Scientists and mathematicians Several attempts have been working to make the computation as effective as possible. The most common numerical algorithm used to compute the Fourier series is the Fast Fourier Transform (FFT) created by J.W Cooley and J. Tukey [3:8].
The algorithm called Radix-2 version recursively breaks down the Fourier transform for a time series of N data points into any combination of N1 and N2 sized segments such as N= N1 N2. Ultimately, the discrete Fourier transform is applied to the deeper-nested segments.
Tip
Cooley-Tukey algorithm:
I encourage you to implement the Radix-2 Cooley-Tukey algorithm in Scala using a tail-recursion.
The Radix-2 implementation requires that the number of data points is N=2n for even functions (sine) and N=2 n+1 for cosine. There are two approaches to meet this constraint:
- Reduce the actual number of points to the next lower radix 2n < N
- Extend the original time series by padding it with 0 to the next higher radix N < 2n+1
Padding the original time series is the preferred option because it does not affect the original set of observations.
Let's define a trait, DTransform
, for any variant of the discrete Fourier transform. The first step is to wrap the default configuration parameters used in Apache Commons Math into a singleton, Config
:
trait DTransform { object Config { final val FORWARD = TransformType.FORWARD final val INVERSE = TransformType.INVERSE final val SINE = DstNormalization.STANDARD_DST_I final val COSINE = DctNormalization.STANDARD_DCT_I } … }
The main purpose of the trait DTransform
is to pad the time series vec
with zero values:
def pad(vec: DblVec, even: Boolean = true): DblVec = { val newSize = padSize(vec.size, even) //1 if( newSize > 0) arr ++ Array.fill(newSize)(0.0) else arr //2 } def padSize(xtSz: Int, even: Boolean= true): Int = { val sz = if( even ) xtSz else xtSz-1 //3 if( (sz & (sz-1)) == 0) 0 else { var bitPos = 0 do { bitPos += 1 } while( (sz >> bitPos) > 0) //4 (if(even) (1<<bitPos) else (1<<bitPos)+1) - xtSz } }
The method pad
computes the optimal size of the frequency vector as 2N by invoking the method padSize
(line 1). It then concatenates the padding with the original time series or vector of observations (line 2). The method padSize
adjusts the size of the data depending on whether the time series has initially an even or odd number of observations (line 3). It relies on bit operations to find the next radix N (line 4).
Tip
While loop:
Scala developers prefer Scala higher-order methods for collections to implement iterative computation. However, nothing prevents you to from using the traditional while
or do {…} while
loop if either readability or performance is an issue.
The fast implementation of the padding method, pad
, consists of detecting the number of observations, N, as a power of 2 (next highest radix). The method evaluates if N & (N-1) is zero after it shifts the number of bits in the value N.
The next step is to write the class DFT
for the real discrete transforms, sine and cosine, by sub-classing DTransform
. The class relies on the padding mechanism implemented in DTransform
whenever necessary:
class DFT[@specialized(Double) T: ToDouble]( eps: Double ) extends ETransform[Vector[T], DblVec](ConfigDouble(eps)) with DTransform { //5 protected val c: ToDouble[T] = implicitly[ToDouble[T]] override def |> : PartialFunction[Vector[T],Try[DblVec]] ={ //6 case xv: Vector[T] if(xv.size >= 2) => fwrd(xv).map(_._2.toVector) } }
We treat the discrete Fourier transform as a transformation on time series using an explicit configuration, ETransform
(line 5). The transformation function |>
delegates the computation to the method fwrd
(line 6):
def fwrd(xv: Vector[T]):
Try[(RealTransformer, Array[Double])] = {
val rdt = if(Math.abs(c.apply(xv.head)) < eps) //7
new FastSineTransformer(SINE) //8
else new FastCosineTransformer(COSINE) //9
val padded = pad(xv.map(c.apply(_)), xv.head == 0.0)
Try( (rdt, rdt.transform(padded.toArray, FORWARD)) )
}
The method fwrd
selects the discrete sine Fourier series if the first value of the time series is 0.0, the discrete cosine series otherwise. This implementation automates the selection of the appropriate series by evaluating xv.head
(line 7). The transformation invokes the FastSineTransformer
(line 8) and FastCosineTransformer
(line 9) classes of the Apache Commons Math library [3:9] introduced in the first chapter.
This example uses the standard formulation of the cosine and sine transformation, defined by the argument COSINE
. The orthogonal normalization, which normalizes the frequency by a factor of 1/sqrt(2(N-1)) where N is the size of the time series, generates a cleaner frequency spectrum for a higher computation cost.
Tip
@specialized annotation:
The annotation @specialized(Double)
is used to instruct the Scala compiler to generate a specialized and more efficient version of the class for the type Double
. The drawback of specialization is that the duplication of byte code as the specialized version coexists with the parameterized classes [3:10].
To illustrate the different concepts behind DFTs, let's consider the case of a time series generated by a sequence h
of sinusoidal functions:
val F = Array[Double](2.0, 5.0, 15.0)
val A = Array[Double](2.0, 1.0, 0.33)
def harmonic(x: Double, n: Int): Double =
A(n)*Math.cos(Math.PI*F(n)*x)
val h = (x: Double) =>
Range(0, A.size).aggregate(0.0)((s, i) =>
s + harmonic(x, i), _ + _)
As the signal is synthetically created, we can select the size of the time series to avoid padding. The first value in the time series is not null, so the number of observations is 2n+1. The data generated by the function h
is plotted as follows:

Example of sinusoidal time series
Let's extract the frequencies spectrum for the time series generated by the function h
. The data points are created by tabulating the function h
. The frequencies spectrum is computed with a simple invocation of the explicit data transformation of the DFT
class |>
:
val OUTPUT1 = "output/filteing/simulated.csv" val OUTPUT2 = "output/filteing/smoothed.csv" val FREQ_SIZE = 1025; val INV_FREQ = 1.0/FREQ_SIZE val pfnDFT = DFT[Double] |> //10 for { values <- Try(Vector.tabulate(FREQ_SIZE) (n => h(n*INV_FREQ))) //11 output1 <- DataSink[Double](OUTPUT1).write(values) spectrum <- pfnDFT(values) output2 <- DataSink[Double](OUTPUT2).write(spectrum) //12 } yield { val results = format(spectrum.take(DISPLAY_SIZE), "x/1025", SHORT) show(s"$DISPLAY_SIZE frequencies: ${results}") }
The execution of the data simulator follows these steps:
- Generate a raw data with the three-harmonic function
h
(line 11). - Instantiate the partial function generated by the transformation (line 10).
- Store the resulting frequencies into a data sink (filesystem) (line 12).
Tip
Data sinks and spreadsheets:
In this particular case, the results of the Discrete Fourier transform are dumped into a CSV file so they can be loaded into a spreadsheet. Some spreadsheets support a set of filtering techniques that can be used to validate the result of the example.
A simpler alternative would be to use JFreeChart.
The spectrum of frequencies for the time series, plotted for the first 32 points, clearly shows three frequencies at k=2, 5, and 15. The result is expected because the original signal is composed of three sinusoidal functions. The amplitude of these frequencies is 1024/1, 1024/2, and 1024/6, respectively. The following plot represents the first 32 harmonics for the time series:

Frequency spectrum for three-frequency sinusoidal
The next step is to use the frequencies spectrum to create a low-pass filter using DFT. There are many algorithms available to implement a low or pass band filter in the time domain, from autoregressive models to the Butterworth algorithm. However, the discrete Fourier transform is still a very popular technique to smooth signals and identify trends.
Note
Big data:
A DFT for a large time series can be very computationally intensive. One option is to treat the time series as a continuous signal and sample it using the Nyquist frequency. The Nyquist frequency is half of the sampling rate of a continuous signal.
DFT-based filtering
The purpose of this section is to introduce, describe, and implement a noise filtering mechanism that leverages the discrete Fourier transform. The idea is quite simple: the forward and inverse Fourier series are used sequentially to convert the raw data from the time domain to the frequency domain and back. The only input you need to supply is a function g that modifies the sequence of frequencies. This operation is known as the convolution of the filter g and the frequencies spectrum.
A convolution is similar to an inner product of two time series in the frequencies domain. Mathematically, the convolution is defined as follows.
Note
Convolution:
M10: The convolution of two functions f and g is defined as follows:

M11: The convolution F of a time series, x = (xi} with a frequency spectrum ωx and a filter f in frequency domain ωf is defined as follows:

Let's apply the convolution to our filtering problem. The filtering algorithm using the discrete Fourier transform consists of five steps:
- Pad the time series to enable the discrete sine or cosine transform.
- Generate the ordered sequence of frequencies using the forward transform, F.
- Select the filter function G in the frequency domain and a cut-off frequency.
- Convolute the sequence of frequency with the filter function G.
- Generate the filtered signal in the time domain by applying the inverse DFT transform to the convoluted frequencies:
Diagram of the discrete Fourier filter
The most commonly used low-pass filter functions are known as the sinc
and sinc2
functions defined as a rectangular function and a triangular function, respectively. These functions are partially applied functions derived from a generic convol
method. The simplest function sinc
returns 1
for frequencies below a cut-off frequency fC
, 0
if it is higher:
val convol = (n: Int, f: Double, fC: Double) => if( Math.pow(f, n) < fC) 1.0 else 0.0 val sinc = convol(1, _: Double, _:Double) val sinc2 = convol(2, _: Double, _:Double) val sinc4 = convol(4, _: Double, _:Double)
Tip
Partially applied functions versus partial functions:
Partial functions and partially applied functions are not actually related.
A partial function f is a function that is applied to a subset X' of the input space X. It does not execute for all possible input values:
def f(x: X): PartialFunction[X, Y] = { case x: X if(x belong X') => execute }
A partially applied function f2
is a function value for which the user supplies the value for one or more argument. The projection reduces the dimension of the input space (X
, Z
):
def f(x: X, z: Z): Y val z0: Z = a val f2 = (x: X) => f(x, _: Z)
The class DFTFilter
inherits from the DFT
class in order to reuse the forward transform function fwrd
. The frequency domain function g
is an attribute of the filter. The g
function takes the frequency cut-off value fC
as second argument (line 13). The two filters sinc
and sinc2
defined in the previous section are examples of filtering functions:
class DFTFilter[@specialized(Double) T: ToDouble]( fC: Double, eps: Double) (g: (Double, Double) =>Double) extends DFT[T](eps) { //13 override def |>: PartialFunction[Vector[T], Try[DblVec]] ={ case xt: Vector[T] if( xt.size >= 2 ) => { fwrd(xt).map{ case(trf, freq) => { //14 val cutOff = fC*freq.size val filtered = freq.indices .map{n => freq(n)*g(n, cutOff) } //15 trf.transform(filtered.toArray, INVERSE).toVector }) //16 } } }
The filtering process follows three steps:
- Computation of the discrete Fourier forward transformation (sine or cosine),
fwrd
(line 14). - Apply the filter function (formula M11) through a Scala
map
method (line 15). - Apply the inverse transform on the frequencies (line 16).
Let us evaluate the impact of the cut-off values on the filtered data. The implementation of the test program consists of loading the data from file (line 17), then invoking the DFTFilter
partial function pfnDFTfilter
(line 18):
import YahooFinancials._
val inputFile = s"$RESOURCE_PATH$symbol.csv"
val CUTOFF = 0.005
val pfnDFTfilter = DFTFilter[Double](CUTOFF)(sinc) |>
for {
path <- getPath(inputFile)
src <- DataSource(path, false, true, 1)
price <- src.get(adjClose) //17
filtered <- pfnDFTfilter(price) //18
}yield { /* ... */ }
Filtering the noise out is accomplished by selecting the cutoff value between any of the three harmonics with the respective frequencies of 2, 5, and 15. The original and the two filtered time series are plotted on the following graph:

Plotting of discrete Fourier filter based smoothing
As you would expect, the low-pass filter with a cut-off value of 12 eliminates the noise with the highest frequencies. The filter with the cutoff value 4 cancels out the second harmonic (low-frequency noise), leaving out only the main trend cycle.
Detection of market cycles
Using the discrete Fourier transform to generate the frequencies spectrum of a periodical time series is easy. However, what about real-world signals such as the time series representing the historical price of a stock?
The purpose of the next exercise is to detect, if any, the long-term cycle(s) of the overall stock market by applying the discrete Fourier transform to the quote of the S&P 500 index between January 1, 2009 and December 31, 2013, as illustrated in the following graph:

Historical S&P 500 index prices
The first step is to apply the DFT to extract a frequencies spectrum for the S&P 500 historical prices, as shown in the following graph with the first 32 harmonics:

Frequencies spectrum for historical S&P index
The frequency domain chart highlights some interesting characteristics regarding the S&P 500 historical prices:
- Both positive and negative amplitudes are present, as you would expect in a time series with complex values. The cosine series contributes to the positive amplitudes while the sine series affects both positive and negative amplitudes (cos(x+π) = sin(x)).
- The decay of the amplitude along the frequencies is steep enough to warrant further analysis beyond the first harmonic: the first harmonic represents the main trend of the historical stock price. The next step is to apply a band-pass filter technique to the S&P 500 historical data in order to identify short-term trends with lower periodicity.
A low-pass filter is limited to reduce or cancel out the noise in the raw data. In this case, a band-pass filter using a range or window of frequencies is appropriate to isolate the frequency or the group of frequencies that characterize a specific cycle. The sinc
function introduced in the previous section to implement a low-pass filter is modified to enforce the band-pass within a window [w1, w2] as follows:
def sinc(f: Double, w: (Double, Double)): Double =
if(f > w._1 && f < w._2) 1.0 else 0.0
Let us define a DFT-based band-pass filter with a window of width 4, w=(i, i+4) with i ranging between 2 and 20. Applying the window [4, 8] isolates the impact of the second harmonic on the price. As we eliminate the main upward trend with frequencies less than 4, all filtered data varies within a short range relative to the main trend. The following graph shows the output of the filter:

Output of a band-pass DFT filter range 4-8 on historical S&P index
In this next case, we filter the S&P 500 index around the third group of harmonics with frequencies ranging from 18 to 22; the signal is converted into a familiar sinusoidal function, as shown here:

Output of a band-pass DFT filter range 18-22 on historical S&P index
There is a possible rational explanation for the shape of the S&P 500 data filtered by a band-pass with a frequency 20 as illustrated in the previous plot: the S&P 500 historical data plot shows that the frequency of the fluctuation in the middle of the uptrend (trading sessions 620 to 770) increases significantly.
This phenomenon can be explained by the fact that the S&P 500 reaches a resistance level around the trading session 545 when the existing uptrend breaks. A tug of war starts between the bulls betting the market nudges higher and the bears who are expecting a correction. The back and forth between the traders ends when the S&P 500 breaks through its resistance and resumes a strong uptrend characterized by a high amplitude low frequency, as shown in the following graph:

Illustration of support and resistance levels for the historical S&P 500 index prices
One of the limitations of using the discrete Fourier based filters to clean up data is that it requires the data scientist to extract the frequencies spectrum and modify the filter on a regular basis, as they are never sure that the most recent batch of data does not introduce noise with different frequency. The Kalman filter addresses this limitation.
- Web應用系統開發實踐(C#)
- C語言程序設計(第3版)
- Vue.js 2 and Bootstrap 4 Web Development
- R語言數據可視化之美:專業圖表繪制指南
- JavaScript Unlocked
- INSTANT Weka How-to
- Ray分布式機器學習:利用Ray進行大模型的數據處理、訓練、推理和部署
- 數據結構(C語言)
- Learning Data Mining with R
- 精通MATLAB(第3版)
- Android開發:從0到1 (清華開發者書庫)
- Machine Learning in Java
- FFmpeg開發實戰:從零基礎到短視頻上線
- Python趣味編程與精彩實例
- 會當凌絕頂:Java開發修行實錄