Numerical integration in R part II

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward, respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

Outline

  • The following topics will be covered in this lecture:
    • Gaussian Quadrature
    • Integrals of several variables
    • Adaptive methods
    • Monte-Carlo integration

Gaussian quadrature

  • As we saw in the last example, the trapezoidal rule needs many points in a partition to produce an accurate approximation.
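
  • To make this concrete, here is a minimal hand-rolled trapezoidal rule (the helper trap() is defined here for illustration and is not part of base R); even with 10 partition points, the error for \( \int_{-\pi/2}^{\pi/2} \cos(x)\,\mathrm{d}x = 2 \) is still on the order of \( 10^{-2} \):

```r
# Composite trapezoidal rule on n equally spaced points (illustrative helper)
trap <- function(f, a, b, n) {
  x <- seq(a, b, length.out = n)
  h <- x[2] - x[1]
  h * (sum(f(x)) - 0.5 * (f(a) + f(b)))
}

abs(2 - trap(cos, -pi/2, pi/2, 10))   # error is roughly 2e-2
```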

  • One way to overcome this is to use a more accurate kind of approximation: Gaussian quadrature.

  • In R, the function integrate() uses an integration method that is based on Gaussian quadrature (the exact method is called the Gauss–Kronrod quadrature).

  • The Gaussian method uses nodes \( x_1 , \cdots , x_n \) that are not fixed in advance to approximate the integral, so that polynomials of higher order can be integrated more precisely than with the Newton–Cotes rules.

  • For \( n \) nodes, it integrates exactly any polynomial

    \[ p(x) =\sum_{j=1}^{2n} c_j x^{j-1} \]

    of order \( 2n-1 \) in its highest power.

  • We will not go into detail about the differences between the Newton–Cotes scheme and Gaussian quadrature; instead, we will compare the results with our last approximation.
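
  • As a small illustration of the exactness property, consider the classical two-node Gauss–Legendre rule on \( [-1, 1] \), with nodes \( \pm 1/\sqrt{3} \) and weights equal to one; with just two function evaluations it integrates any cubic exactly (the helper name gauss2 is ours):

```r
# Two-node Gauss-Legendre rule on [-1, 1]: nodes +-1/sqrt(3), weights 1;
# exact for all polynomials up to degree 2n - 1 = 3
gauss2 <- function(f) f(-1 / sqrt(3)) + f(1 / sqrt(3))

f <- function(x) 5 * x^3 - 2 * x^2 + x + 1   # an arbitrary cubic
gauss2(f)   # gives 2/3 (up to rounding), the exact integral over [-1, 1]
```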

Gaussian quadrature

  • The integrate() function works a bit differently: we supply a function, a lower and an upper bound, and optionally the maximum number of subdivisions; finally, we extract the value of the integral from the resulting object with the $ operator.
for (i in 2:20) {
  print(abs(2 - integrate(f=cos, lower=(-pi/2), upper=(pi/2), subdivisions=i)$value))
}
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
  • Notice that this is vastly more accurate than the trapezoidal rule.

Gaussian quadrature

  • We can get an estimate of the approximation error from the integrate output, as well as the number of subdivisions used:
int_cos <- integrate(f=cos, lower=(-pi/2), upper=(pi/2), subdivisions=2)
int_cos$abs.error
[1] 2.220446e-14
int_cos$subdivisions
[1] 1
int_cos <- integrate(f=cos, lower=(-pi/2), upper=(pi/2), subdivisions=20)
int_cos$abs.error
[1] 2.220446e-14
int_cos$subdivisions
[1] 1
  • Note that Gaussian quadrature only needed a single subdivision in both of the cases above to obtain an error on the order of \( 10^{-14} \).

  • We will explore more of this in activities, reflecting on the relationship again between the density function and the CDF.
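
  • As a preview of that relationship (using only the base R functions dnorm() and pnorm()), integrating the standard normal density with integrate() recovers the normal CDF:

```r
# The CDF at q is the integral of the density from -Inf to q
q <- 1.5
cdf_q <- integrate(f = dnorm, lower = -Inf, upper = q)$value
abs(cdf_q - pnorm(q))   # agreement near machine precision
```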

Integration in multiple variables

  • Similar to numerical integration in one variable, an integration in multiple variables can be expressed as follows:

    \[ \begin{align} \int_{a_1}^{b_1} \cdots \int_{a_p}^{b_p} f\left(x_1, \cdots, x_p\right)\mathrm{d}x_1 \cdots \mathrm{d}x_p \approx \sum_{i_1=1}^n \cdots \sum_{i_p=1}^n W_{i_1} \cdots W_{i_p} f\left(x_{i_1},\cdots, x_{i_p}\right) \end{align} \] where

    • \( f \) is a function from \( \mathbb{R}^p \rightarrow \mathbb{R} \);
    • We integrate over the domain \( [a_1, b_1]\times \cdots \times [a_p, b_p] \)
    • \( \left\{x_{i_j} \right\}_{i_j=1}^{n} \) are the partition points for the interval \( [a_j, b_j] \); and
    • \( W_{i_j} \) is a weight that is given to the associated sub-partition of the region.
  • The issue with the direct approach as above is that the complexity will grow like \( n^p \), i.e., the partition size to the power of the dimension.
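
  • A minimal sketch of the tensor-product rule above, using midpoint weights (the helper tensor_midpoint() is our own illustration); note that the evaluation grid holds \( n^p \) points, which is the source of the exponential cost:

```r
# Tensor-product midpoint rule over [a, b]^p: the grid has n^p points
tensor_midpoint <- function(f, a, b, n, p) {
  h <- (b - a) / n
  nodes <- a + h * (seq_len(n) - 0.5)                 # 1-d midpoints
  grid <- do.call(expand.grid, rep(list(nodes), p))   # n^p rows
  h^p * sum(apply(grid, 1, f))
}

# f(x, y) = x * y over [0, 1]^2 has exact integral 1/4
tensor_midpoint(function(x) prod(x), 0, 1, 20, 2)
```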

Adaptive methods

  • One better approach computationally is to make an adaptive procedure, where a refinement of the region is chosen based on the tolerated error in the final result.

  • The adaptive method in the context of multiple integrals divides the integration region \( D\subset\mathbb{R}^p \) into subregions \( S_j \subset \mathbb{R}^p \).

  • For each subregion \( S_j \), specific rules are applied to approximate the integral.

  • Let \( E_j \) denote the error for each subregion.

    • If the overall error \( \sum_{j=1}^n E_j \) is smaller than a predefined tolerance level, the algorithm stops.
  • However, if this condition is not met, the subregion with the highest error \( \mathrm{max}_{j}\left(E_j\right) \) is selected and split into additional subregions.
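
  • In one dimension, the splitting logic can be sketched as follows (this toy adaptive_integrate() is our own illustration, not the algorithm used by any particular package): estimate a per-interval error as the gap between a trapezoid and a midpoint value, and repeatedly bisect the interval with the largest estimated error:

```r
# Toy adaptive rule: bisect the subinterval with the largest error estimate
adaptive_integrate <- function(f, a, b, tol = 1e-6) {
  trap <- function(l, r) (r - l) * (f(l) + f(r)) / 2
  mid  <- function(l, r) (r - l) * f((l + r) / 2)
  lo <- a; hi <- b                      # endpoints of the subintervals
  val <- mid(a, b)                      # midpoint value per subinterval
  err <- abs(trap(a, b) - val)          # error estimate per subinterval
  while (sum(err) > tol) {
    j <- which.max(err)                 # worst subinterval
    m <- (lo[j] + hi[j]) / 2
    lo <- c(lo, m); hi <- c(hi, hi[j]); hi[j] <- m
    val[j] <- mid(lo[j], hi[j]); err[j] <- abs(trap(lo[j], hi[j]) - val[j])
    k <- length(lo)
    val[k] <- mid(lo[k], hi[k]); err[k] <- abs(trap(lo[k], hi[k]) - val[k])
  }
  sum(val)
}

adaptive_integrate(cos, -pi/2, pi/2)   # close to the exact value 2
```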

  • To integrate functions of multiple variables in R, the package cubature can be used, with the method cuhre applying the adaptive scheme.

  • We will consider an example in the following.

Adaptive methods