Title: | Parallel Numerical Derivatives, Gradients, Jacobians, and Hessians of Arbitrary Accuracy Order |
---|---|
Description: | Numerical derivatives through finite-difference approximations can be calculated using the 'pnd' package with parallel capabilities and optimal step-size selection to improve accuracy. These functions facilitate efficient computation of derivatives, gradients, Jacobians, and Hessians, allowing for more evaluations to reduce the mathematical and machine errors. Designed for compatibility with the 'numDeriv' package, which has not received updates in several years, it introduces advanced features such as computing derivatives of arbitrary order, improving the accuracy of Hessian approximations by avoiding repeated differencing, and parallelising slow functions on Windows, Mac, and Linux. |
Authors: | Andreï Victorovitch Kostyrka [aut, cre] |
Maintainer: | Andreï Victorovitch Kostyrka <[email protected]> |
License: | EUPL |
Version: | 0.0.9 |
Built: | 2025-03-11 13:34:56 UTC |
Source: | https://github.com/fifis/pnd |
Number of core checks and changes
checkCores(cores = NULL)
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
An integer with the number of cores.
checkCores()
checkCores(2)
suppressWarnings(checkCores(1000))
Determine function dimensionality and vectorisation
checkDimensions(FUN, x, f0 = NULL, func = NULL, elementwise = NA,
  vectorised = NA, multivalued = NA, deriv.order = 1, acc.order = 2,
  side = 0, h = NULL, report = 1L, zero.tol = sqrt(.Machine$double.eps),
  cores = 1, preschedule = TRUE, cl = NULL, ...)
FUN |
A function returning a numeric scalar or a vector whose derivatives are to be computed. If the function returns a vector, the output will be a Jacobian. |
x |
Numeric vector or scalar: the point(s) at which the derivative is estimated.
|
f0 |
Optional numeric: if provided, used to determine the vectorisation type to save time. If FUN(x) must be evaluated (e.g. second derivatives), saves one evaluation. |
func |
For compatibility with |
elementwise |
Logical: is the domain effectively 1D, i.e. is this a mapping
|
vectorised |
Logical: if |
multivalued |
Logical: if |
deriv.order |
Integer or vector of integers indicating the desired derivative order,
|
acc.order |
Integer or vector of integers specifying the desired accuracy order
for each element of |
side |
Integer scalar or vector indicating the type of finite difference:
|
h |
Numeric or character specifying the step size(s) for the numerical
difference or a method of automatic step determination ( |
report |
Integer for the level of detail in the output. If |
zero.tol |
Small positive scalar: if |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
... |
Additional arguments passed to |
The following combinations of parameters are allowed, suggesting specific input and output handling by other functions:

                           elementwise             !elementwise
!multivalued, vectorised   FUN(xgrid)              (undefined)
!multivalued, !vectorised  [mc]lapply(xgrid, FUN)  [mc]lapply gradient
multivalued, vectorised    (undefined)             FUN(xgrid) Jacobian
multivalued, !vectorised   (undefined)             [mc]lapply Jacobian

Some combinations are impossible: multi-valued functions cannot be element-wise, and single-valued vectorised functions must be element-wise.
In brief, testing the input and output lengths and vectorisation capabilities may result in five cases, unlike the three cases in numDeriv::grad(), which does not provide checks for Jacobians.
A named logical vector indicating if a function is element-wise or not, vectorised or not, and multivalued or not.
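For instance, the built-in sin maps R^n to R^n element by element, so the checks should report a vector along the lines of elementwise = TRUE, vectorised = TRUE, multivalued = FALSE (names as documented above):

checkDimensions(sin, x = 1:4, h = 1e-5)
# Expected: elementwise TRUE, vectorised TRUE, multivalued FALSE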
checkDimensions(sin, x = 1:4, h = 1e-5, report = 2)  # Rn -> Rn vectorised
checkDimensions(function(x) integrate(sin, 0, x)$value, x = 1:4,
                h = 1e-5, report = 2)  # Non-vectorised
checkDimensions(function(x) sum(sin(x)), x = 1:4, h = 1e-5, report = 2)  # Rn -> R, gradient
checkDimensions(function(x) c(sin(x), cos(x)), x = 1, h = 1e-5, report = 2)  # R -> Rn, Jacobian
checkDimensions(function(x) c(sin(x), cos(x)), x = 1:4, h = 1e-5, report = 2)  # Vectorised Jacobian
checkDimensions(function(x) c(integrate(sin, 0, x)$value, integrate(sin, -x, 0)$value),
                x = 1:4, h = 1e-5, report = 2)  # Non-vectorised Jacobian
Repeated indices of the first unique value
dupRowInds(m)
m |
A matrix or a data frame. This function is an inverse function to such operations as
|
A vector of row indices corresponding to the first occurrence of a given row.
dupRowInds(mtcars[rep(1:10, 10), rep(1:10, 10)])
dupRowInds(matrix(rnorm(1000), ncol = 10))
This function computes the coefficients for a numerical approximation to derivatives of any specified order. It provides the minimally sufficient stencil for the chosen derivative order and desired accuracy order. It can also use any user-supplied stencil (uniform or non-uniform). For a stencil {b_i}, it computes the optimal weights {w_i} that yield the numerical approximation of the derivative:

f^(m)(x) ≈ h^(-m) * sum_i w_i * f(x + b_i*h)
fdCoef(deriv.order = 1L, side = c(0L, 1L, -1L), acc.order = 2L,
  stencil = NULL, zero.action = c("drop", "round", "none"),
  zero.tol = NULL)
deriv.order |
Order of the derivative ( |
side |
Integer that determines the type of finite-difference scheme:
|
acc.order |
Order of accuracy: defines how the approximation error scales
with the step size |
stencil |
Optional custom vector of points for function evaluation.
Must include at least |
zero.action |
Character string specifying how to handle near-zero weights:
|
zero.tol |
Non-negative scalar defining the threshold: weights below
|
This function relies on the approach of approximating numerical derivatives by weighted sums of function values described in Fornberg (1988). It reproduces all tables from that paper exactly; see the example below to create Table 1.

The finite-difference coefficients for any given stencil are given as a solution of a linear system. The capabilities of this function are similar to those of Taylor (2016), but instead of matrix inversion, the Björck and Pereyra (1970) algorithm is used because the left-hand-side matrix is a Vandermonde matrix, and its inverse may be very inaccurate, especially for long one-sided stencils.

The weights computed for the stencil via this algorithm are very reliable; numerical simulations in Higham (1987) show that the relative error is low even for ill-conditioned systems. Kostyrka (2025) computes the exact relative error of the weights on the stencils returned by this function; the zero tolerance is based on these calculations.
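As a minimal sanity check (a sketch, assuming the list structure described below), the returned weights must satisfy the Taylor moment conditions: sum(weights * stencil^k) equals k! for k = deriv.order and 0 for the other powers k below deriv.order + acc.order. For the first derivative, the k! factor is 1:

fd <- fdCoef(deriv.order = 1, stencil = c(-2, -1, 1, 2))
sapply(0:3, function(k) sum(fd$weights * fd$stencil^k))  # Close to (0, 1, 0, 0)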
A list containing the stencil used and the corresponding weights for each point.
Björck Å, Pereyra V (1970).
“Solution of Vandermonde systems of equations.”
Mathematics of Computation, 24(112), 893–903.
Fornberg B (1988).
“Generation of Finite Difference Formulas on Arbitrarily Spaced Grids.”
Mathematics of Computation, 51(184), 699–706.
doi:10.1090/S0025-5718-1988-0935077-0.
Higham NJ (1987).
“Error analysis of the Björck-Pereyra algorithms for solving Vandermonde systems.”
Numerische Mathematik, 50(5), 613–632.
Kostyrka AV (2025).
“What are you doing, step size: fast computation of accurate numerical derivatives with finite precision.”
Working paper.
Taylor CR (2016).
“Finite Difference Coefficients Calculator.”
https://web.media.mit.edu/~crtaylor/calculator.html.
fdCoef()   # Simple two-sided derivative
fdCoef(2)  # Simple two-sided second derivative
fdCoef(acc.order = 4)$weights * 12  # Should be (1, -8, 8, -1)

# Using a custom stencil for the first derivative: x-2h and x+h
fdCoef(stencil = c(-2, 1), acc.order = 1)

# Reproducing Table 1 from Fornberg (1988) (cited above)
pad9 <- function(x) {l <- length(x); c(a <- rep(0, (9-l)/2), x, a)}
f <- function(d, a) pad9(fdCoef(deriv.order = d, acc.order = a,
                                zero.action = "round")$weights)
t11 <- t(sapply((1:4)*2, function(a) f(d = 1, a)))
t12 <- t(sapply((1:4)*2, function(a) f(d = 2, a)))
t13 <- t(sapply((1:3)*2, function(a) f(d = 3, a)))
t14 <- t(sapply((1:3)*2, function(a) f(d = 4, a)))
t11 <- cbind(t11[, 1:4], 0, t11[, 5:8])
t13 <- cbind(t13[, 1:4], 0, t13[, 5:8])
t1 <- data.frame(OrdDer = rep(1:4, times = c(4, 4, 3, 3)),
                 OrdAcc = c((1:4)*2, (1:4)*2, (1:3)*2, (1:3)*2),
                 rbind(t11, t12, t13, t14))
colnames(t1)[3:11] <- as.character(-4:4)
print(t1, digits = 4)
Computes numerical derivatives of a scalar or vector function using finite-difference methods. This function serves as a backbone for Grad() and Jacobian(), allowing for detailed control over the derivative computation process, including order of derivatives, accuracy, and step size. GenD is fully vectorised over different coordinates of the function argument, allowing arbitrary accuracies, sides, and derivative orders for different coordinates.
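A brief sketch of this flexibility (assuming the pnd package is attached): the first derivative with respect to the first coordinate and the second derivative with respect to the second one, both 4th-order-accurate.

f <- function(x) prod(sin(x))
GenD(f, x = c(1, 2), deriv.order = c(1, 2), acc.order = 4)
# Should approximate c(cos(1)*sin(2), -sin(1)*sin(2))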
GenD(FUN, x, elementwise = NA, vectorised = NA, multivalued = NA,
  deriv.order = 1L, side = 0, acc.order = 2L, h = NULL,
  zero.tol = sqrt(.Machine$double.eps), h0 = NULL, control = list(),
  f0 = NULL, cores = 1, preschedule = TRUE, cl = NULL, func = NULL,
  report = 1L, ...)
FUN |
A function returning a numeric scalar or a vector whose derivatives are to be computed. If the function returns a vector, the output will be a Jacobian. |
x |
Numeric vector or scalar: the point(s) at which the derivative is estimated.
|
elementwise |
Logical: is the domain effectively 1D, i.e. is this a mapping
|
vectorised |
Logical: if |
multivalued |
Logical: if |
deriv.order |
Integer or vector of integers indicating the desired derivative order,
|
side |
Integer scalar or vector indicating the type of finite difference:
|
acc.order |
Integer or vector of integers specifying the desired accuracy order
for each element of |
h |
Numeric or character specifying the step size(s) for the numerical
difference or a method of automatic step determination ( |
zero.tol |
Small positive scalar: if |
h0 |
Numeric scalar or vector: initial step size for automatic search with
|
control |
A named list of tuning parameters passed to |
f0 |
Optional numeric: if provided, used to determine the vectorisation type to save time. If FUN(x) must be evaluated (e.g. second derivatives), saves one evaluation. |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
func |
For compatibility with |
report |
Integer for the level of detail in the output. If |
... |
Additional arguments passed to |
For computing gradients and Jacobians, use the convenience wrappers Jacobian() and Grad().
If the step size is too large, the slope of the secant poorly estimates the derivative; if it is too small, it leads to numerical instability due to the function value rounding.
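This trade-off is easy to demonstrate in base R with central differences for exp at x = 1, where the true derivative is exp(1):

h <- 10^(-(1:14))
err <- abs((exp(1 + h) - exp(1 - h)) / (2*h) - exp(1))
signif(err, 2)  # The error shrinks until h is near 1e-5 and then grows again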
The optimal step size for one-sided differences typically approaches Mach.eps^(1/2) to balance the Taylor series truncation error with the rounding error due to storing function values with limited precision. For two-sided differences, it is proportional to Mach.eps^(1/3). However, selecting the best step size typically requires knowledge of higher-order derivatives, which may not be readily available. Luckily, using h = "SW" invokes a reliable automatic data-driven step-size selection. Other options include "DV", "CR", and "CRm".
The step size defaults to 1.5e-8 (the square root of machine epsilon) for x less than 1.5e-8, to (2.2e-16)^(1/(deriv.order + acc.order)) * x for x > 1, and interpolates exponentially between these values for 1.5e-8 < x < 1.
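A plain-R sketch of this default rule (a hypothetical helper for illustration, not the package's internal function; the interpolation is done on the logarithmic scale):

stepDefault <- function(x, deriv.order = 1, acc.order = 2) {
  lo <- sqrt(.Machine$double.eps)  # 1.5e-8
  hi <- .Machine$double.eps^(1/(deriv.order + acc.order))
  ax <- abs(x)
  ifelse(ax < lo, lo,
         ifelse(ax > 1, hi * ax,
                # Log-scale interpolation: lo at ax = lo, hi at ax = 1
                exp(log(lo) + (log(hi) - log(lo)) * (1 - log(ax)/log(lo)))))
}
stepDefault(c(1e-12, 1e-4, 1, 100))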
The use of f0 can reduce computation time, similar to the use of f.lower and f.upper in uniroot().
For convenience, report = 2 overrides diagnostics = FALSE in the control list.
Unlike numDeriv::grad() and numDeriv::jacobian(), this function fully preserves the names of x and FUN(x).
A vector or matrix containing the computed derivatives, structured according to the dimensionality of x and FUN. If FUN is scalar-valued, returns a gradient vector. If FUN is vector-valued, returns a Jacobian matrix.
gradstep() for automatic step-size selection.
# Case 1: Vector argument, vector output
f1 <- sin
g1 <- GenD(FUN = f1, x = 1:100)
g1.true <- cos(1:100)
plot(1:100, g1 - g1.true, main = "Approximation error of d/dx sin(x)")

# Case 2: Vector argument, scalar result
f2 <- function(x) sum(sin(x))
g2 <- GenD(FUN = f2, x = 1:4)
g2.h2 <- Grad(FUN = f2, x = 1:4, h = 7e-6)
g2 - g2.h2  # Tiny differences due to different step sizes
g2.auto <- Grad(FUN = f2, x = 1:4, h = "SW")
g2.full <- Grad(FUN = f2, x = 1:4, h = "SW", report = 2)
print(attr(g2.full, "step.search")$exitcode)  # Success

# Case 3: Vector input, vector output of a different length
f3 <- function(x) c(sum(sin(x)), prod(cos(x)))
x3 <- 1:3
j3 <- GenD(f3, x3, multivalued = TRUE)
print(j3)

# Gradients for vectorised functions -- e.g. leaky ReLU
LReLU <- function(x) ifelse(x > 0, x, 0.01*x)
system.time(replicate(10, suppressMessages(GenD(LReLU, runif(30, -1, 1)))))
system.time(replicate(10, suppressMessages(GenD(LReLU, runif(30, -1, 1)))))

# Saving time for slow functions by using pre-computed values
x <- 1:4
finner <- function(x) sin(x*log(abs(x)+1))
fouter <- function(x) integrate(finner, 0, x, rel.tol = 1e-12, abs.tol = 0)$value
# The outer function is non-vectorised
fslow <- function(x) {Sys.sleep(0.01); fouter(x)}
f0 <- sapply(x, fouter)
system.time(GenD(fslow, x, side = 1, acc.order = 2, f0 = f0))
# Now, with extra checks, it will be slower
system.time(GenD(fslow, x, side = 1, acc.order = 2))
Create a grid of points for a gradient / Jacobian
generateGrid(x, h, stencils, elementwise, vectorised)
x |
Numeric vector or scalar: the point(s) at which the derivative is estimated.
|
h |
Numeric or character specifying the step size(s) for the numerical
difference or a method of automatic step determination ( |
stencils |
A list of outputs from |
elementwise |
Logical: is the domain effectively 1D, i.e. is this a mapping
|
vectorised |
Logical: if |
A list with points for evaluation, summation weights for derivative computation, and indices for combining values.
generateGrid(1:4, h = 1e-5, elementwise = TRUE, vectorised = TRUE,
             stencils = lapply(1:4, function(a) fdCoef(acc.order = a)))
Creates a list of unique evaluation points for second derivatives: both diagonal (d^2 f / dx_i^2) and cross (d^2 f / dx_i dx_j).
generateGrid2(x, side, acc.order, h)
x |
Numeric vector or scalar: the point(s) at which the derivative is estimated.
|
side |
Integer scalar or vector indicating the type of finite difference:
|
acc.order |
Integer or vector of integers specifying the desired accuracy order
for each element of |
h |
Numeric or character specifying the step size(s) for the numerical
difference or a method of automatic step determination ( |
A list with elements: xlist (a list of unique coordinate shifts), w (the finite-difference weights, one per point), and i1, i2 (integer vectors giving partial-derivative indices). The length of each vector matches the length of xlist.
generateGrid2(1:4, side = rep(0, 4), acc.order = c(2, 6, 4, 2),
              h = c(1e-5, 1e-4, 2e-5, 1e-6))
Computes numerical derivatives and gradients of scalar-valued functions using finite differences. This function supports both two-sided (central, symmetric) and one-sided (forward or backward) derivatives. It can utilise parallel processing to accelerate computation of gradients for slow functions or to attain higher accuracy faster. Currently, Mac and Linux are supported via parallel::mclapply(); Windows support with parallel::parLapply() is under development.
Grad(FUN, x, elementwise = NA, vectorised = NA, multivalued = NA,
  deriv.order = 1L, side = 0, acc.order = 2, h = NULL,
  zero.tol = sqrt(.Machine$double.eps), h0 = NULL, control = list(),
  f0 = NULL, cores = 1, preschedule = TRUE, cl = NULL, func = NULL,
  report = 1L, ...)
FUN |
A function returning a numeric scalar or a vector whose derivatives are to be computed. If the function returns a vector, the output will be a Jacobian. |
x |
Numeric vector or scalar: the point(s) at which the derivative is estimated.
|
elementwise |
Logical: is the domain effectively 1D, i.e. is this a mapping
|
vectorised |
Logical: if |
multivalued |
Logical: if |
deriv.order |
Integer or vector of integers indicating the desired derivative order,
|
side |
Integer scalar or vector indicating the type of finite difference:
|
acc.order |
Integer or vector of integers specifying the desired accuracy order
for each element of |
h |
Numeric or character specifying the step size(s) for the numerical
difference or a method of automatic step determination ( |
zero.tol |
Small positive scalar: if |
h0 |
Numeric scalar or vector: initial step size for automatic search with
|
control |
A named list of tuning parameters passed to |
f0 |
Optional numeric: if provided, used to determine the vectorisation type to save time. If FUN(x) must be evaluated (e.g. second derivatives), saves one evaluation. |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
func |
For compatibility with |
report |
Integer for the level of detail in the output. If |
... |
Additional arguments passed to |
This function aims to be 100% compatible with the syntax of numDeriv::grad().

There is one feature of the default step size in numDeriv that deserves an explanation. If method = "simple", then simple forward differences are used with a fixed step size eps, which we denote by ε. If method = "Richardson", then central differences are used with a fixed step d*abs(x) + eps*(abs(x) < zero.tol), where d = 1e-4 is the relative step size and eps becomes an extra addition to the step size for arguments that are closer to zero than zero.tol. We believe that the latter may lead to mistakes when the user believes that they can set the step size for near-zero arguments, whereas in reality, a combination of d and eps is used.
Here is a synopsis of the old arguments:

side: numDeriv uses NA for handling two-sided differences. The pnd equivalent is 0, and NA is replaced with 0.

eps: if method = "simple", then eps = 1e-4 is the absolute step size and forward differences are used. If method = "Richardson", then eps = 1e-4 is the absolute increment of the step size for small arguments below the zero tolerance.

d: if method = "Richardson", then d*abs(x) is the step size for arguments above the zero tolerance and the baseline step size for small arguments that gets incremented by eps.

r: the number of Richardson extrapolations that successively reduce the initial step size. For two-sided differences, each extrapolation increases the accuracy order by 2.

v: the reduction factor in Richardson extrapolations.
Here are the differences in the new, compatible implementation:

eps: if method = "simple", then ifelse(x != 0, abs(x), 1) * sqrt(.Machine$double.eps) * 2 is used because one-sided differences require a smaller step size to reduce the truncation error. If method = "Richardson", then eps = 1e-5.

d: if method = "Richardson", then d*abs(x) is the step size for arguments above the zero tolerance and the baseline step size for small arguments that gets incremented by eps.

r: the number of Richardson extrapolations that successively reduce the initial step size. For two-sided differences, each extrapolation increases the accuracy order by 2.

v: the reduction factor in Richardson extrapolations.
Grad does an initial check (if f0 = FUN(x) is not provided) and calls GenD() with a set of appropriate parameters (multivalued = FALSE if the check succeeds). In case of a parameter mismatch, it throws an error.
Numeric vector of the gradient. If FUN returns a vector, a warning is issued suggesting the use of Jacobian().
f <- function(x) sum(sin(x))
g1 <- Grad(FUN = f, x = 1:4)
g2 <- Grad(FUN = f, x = 1:4, h = 7e-6)
g2 - g1  # Tiny differences due to different step sizes
g.auto <- Grad(FUN = f, x = 1:4, h = "SW")
g3.full <- Grad(FUN = f, x = 1:4, h = "SW", report = 2)
print(g3.full)
attr(g3.full, "step.search")$exitcode  # Success

# Gradients for vectorised functions -- e.g. leaky ReLU
LReLU <- function(x) ifelse(x > 0, x, 0.01*x)
Grad(LReLU, seq(-1, 1, 0.1))
Automatic step selection for numerical derivatives
gradstep(FUN, x, h0 = NULL, zero.tol = sqrt(.Machine$double.eps),
  method = c("plugin", "SW", "CR", "CRm", "DV", "M"),
  diagnostics = FALSE, control = NULL, cores = 1,
  preschedule = getOption("pnd.preschedule", TRUE), cl = NULL, ...)
FUN |
Function for which the optimal numerical derivative step size is needed. |
x |
Numeric vector or scalar: the point at which the derivative is computed and the optimal step size is estimated. |
h0 |
Numeric vector or scalar: initial step size, defaulting to a relative step of
slightly greater than .Machine$double.eps^(1/3) (or absolute step if |
zero.tol |
Small positive scalar: if |
method |
Character indicating the method: |
diagnostics |
Logical: if |
control |
A named list of tuning parameters for the method. If |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
... |
Passed to FUN. |
We recommend using the Mathur algorithm because it does not suffer from the over-estimation of the truncation error in the Curtis–Reid approach or from the sensitivity to near-zero third derivatives in the Dumontet–Vignes approach. It tries multiple step sizes simultaneously and robustly handles missing values caused by bad evaluations at inadequate step sizes.
A list similar to the one returned by optim(), made of concatenated coordinate-wise lists:
par – the optimal step sizes found,
value – the estimated numerical gradient,
counts – the number of iterations for each coordinate,
abs.error – an estimate of the total approximation error (sum of truncation and rounding errors),
exitcode – an integer code indicating the termination status:
0 indicates optimal termination within tolerance,
1 means that the truncation error (CR method) or the third derivative (DV method) is zero and a large step size is preferred,
2 is returned if there is no change in step size within tolerance,
3 indicates a solution at the boundary of the allowed value range,
4 signals that the maximum number of iterations was reached.
message – summary messages of the exit status.
If diagnostics is TRUE, iterations is a list of lists including the full step-size search path, argument grids, function values on those grids, estimated error ratios, and estimated derivative values for each coordinate.
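A short sketch of inspecting these fields (names as documented above):

s <- gradstep(x = 1, FUN = sin, method = "SW")
s$par       # The optimal step size found
s$value     # The estimated derivative, close to cos(1)
s$exitcode  # 0 on successful termination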
Curtis AR, Reid JK (1974).
“The Choice of Step Lengths When Using Differences to Approximate Jacobian Matrices.”
IMA Journal of Applied Mathematics, 13(1), 121–126.
doi:10.1093/imamat/13.1.121.
Dumontet J, Vignes J (1977).
“Détermination du pas optimal dans le calcul des dérivées sur ordinateur.”
RAIRO. Analyse numérique, 11(1), 13–25.
doi:10.1051/m2an/1977110100131.
Mathur R (2012).
An Analytical Approach to Computing Step Sizes for Finite-Difference Derivatives.
Ph.D. thesis, University of Texas at Austin.
http://hdl.handle.net/2152/ETD-UT-2012-05-5275.
Stepleman RS, Winarsky ND (1979).
“Adaptive numerical differentiation.”
Mathematics of Computation, 33(148), 1257–1264.
doi:10.1090/s0025-5718-1979-0537969-8.
step.CR() for Curtis–Reid (1974) and its modification, step.DV() for Dumontet–Vignes (1977), step.SW() for Stepleman–Winarsky (1979), and step.M() for Mathur (2012).
gradstep(x = 1, FUN = sin, method = "CR") gradstep(x = 1, FUN = sin, method = "CRm") gradstep(x = 1, FUN = sin, method = "DV") gradstep(x = 1, FUN = sin, method = "SW") gradstep(x = 1, FUN = sin, method = "M") # Works for gradients gradstep(x = 1:4, FUN = function(x) sum(sin(x)))
gradstep(x = 1, FUN = sin, method = "CR") gradstep(x = 1, FUN = sin, method = "CRm") gradstep(x = 1, FUN = sin, method = "DV") gradstep(x = 1, FUN = sin, method = "SW") gradstep(x = 1, FUN = sin, method = "M") # Works for gradients gradstep(x = 1:4, FUN = function(x) sum(sin(x)))
Computes the second derivatives of a function with respect to all combinations of its input coordinates. Arbitrary accuracies and sides for different coordinates of the argument vector are supported.
Hessian(FUN, x, side = 0, acc.order = 2, h = NULL, h0 = NULL,
  control = list(), f0 = NULL, cores = 1, preschedule = TRUE,
  cl = NULL, func = NULL, report = 1L, ...)
FUN |
A function returning a numeric scalar.
If the function returns a vector, the output will be a Jacobian.
If instead of |
x |
Numeric vector or scalar: point at which the derivative is estimated.
|
side |
Integer scalar or vector indicating difference type:
|
acc.order |
Integer specifying the desired accuracy order.
The error typically scales as |
h |
Numeric scalar, vector, or character specifying the step size for the numerical
difference. If character ( |
h0 |
Numeric scalar or vector: initial step size for automatic search with
|
control |
A named list of tuning parameters passed to |
f0 |
Optional numeric scalar or vector: if provided and applicable, used
where the stencil contains zero (i.e. |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
func |
Deprecated; for |
report |
Integer: if |
... |
Additional arguments passed to |
The optimal step size for 2nd-order-accurate central-differences-based Hessians is of the order Mach.eps^(1/4) to balance the Taylor series truncation error with the rounding error. However, selecting the best step size typically requires knowledge of higher-order cross derivatives and is highly technically involved. Future releases will allow character arguments to invoke automatic data-driven step-size selection.
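Until such automatic selection is available, a fixed step following this rule of thumb can be supplied manually, e.g.:

h.rot <- .Machine$double.eps^(1/4)  # Approx. 1.2e-4
Hessian(function(x) prod(sin(x)), 1:3, h = h.rot)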
The use of f0 can reduce computation time, similar to the use of f.lower and f.upper in uniroot().
Some numerical packages use the option (or even the default behaviour) of computing not only the i < j cross-partials for the Hessian, but all pairs of i and j. The upper and lower triangular matrices are filled, and the matrix is averaged with its transpose to obtain a Hessian – this is the behaviour of optimHess(). However, it can be shown that H[i, j] and H[j, i] use the same evaluation grid, and with a single parallelisable evaluation of the function on that grid, no symmetrisation is necessary because the result is mathematically and computationally identical. In pnd, only the upper triangular matrix is computed, saving time and ensuring unambiguous results owing to the interchangeability of summation terms (ignoring the numerical error in summation, as nothing can be done about it apart from compensated summation, e.g. via Kahan's algorithm).
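A quick check of the symmetry claimed above (a sketch; the mirrored triangles should agree exactly):

f <- function(x) sum(sin(x)) * prod(cos(x))
H <- Hessian(f, x = 1:3)
isTRUE(all.equal(H, t(H), tolerance = 0))  # TRUE without any symmetrisation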
A matrix with as many rows and columns as length(x). Unlike the output of numDeriv::hessian(), this output preserves the names of x.
Grad() for gradients, GenD() for generalised numerical differences.
f <- function(x) prod(sin(x))
Hessian(f, 1:4)

# Large matrices
system.time(Hessian(f, 1:100))
Jacobian matrix computation with parallel capabilities
Computes the numerical Jacobian for vector-valued functions. Its columns are partial derivatives of the function with respect to the input elements. This function supports both two-sided (central, symmetric) and one-sided (forward or backward) derivatives. It can utilise parallel processing to accelerate computation of gradients for slow functions or to attain higher accuracy faster. Currently, Mac and Linux are supported via parallel::mclapply(); Windows support with parallel::parLapply() is under development.
Jacobian(FUN, x, elementwise = NA, vectorised = NA, multivalued = NA,
  deriv.order = 1L, side = 0, acc.order = 2, h = NULL,
  zero.tol = sqrt(.Machine$double.eps), h0 = NULL, control = list(),
  f0 = NULL, cores = 1, preschedule = TRUE, cl = NULL, func = NULL,
  report = 1L, ...)
FUN |
A function returning a numeric scalar or a vector whose derivatives are to be computed. If the function returns a vector, the output will be a Jacobian. |
x |
Numeric vector or scalar: the point(s) at which the derivative is estimated.
|
elementwise |
Logical: is the domain effectively 1D, i.e. is this a mapping
|
vectorised |
Logical: if |
multivalued |
Logical: if |
deriv.order |
Integer or vector of integers indicating the desired derivative order,
|
side |
Integer scalar or vector indicating the type of finite difference:
|
acc.order |
Integer or vector of integers specifying the desired accuracy order
for each element of |
h |
Numeric or character specifying the step size(s) for the numerical
difference or a method of automatic step determination ( |
zero.tol |
Small positive scalar: if |
h0 |
Numeric scalar or vector: initial step size for automatic search with
|
control |
A named list of tuning parameters passed to |
f0 |
Optional numeric: if provided, used to determine the vectorisation type to save time. If FUN(x) must be evaluated (e.g. second derivatives), saves one evaluation. |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
func |
For compatibility with |
report |
Integer for the level of detail in the output. If |
... |
Additional arguments passed to |
Matrix where each row corresponds to a function output and each column to an input coordinate. For scalar-valued functions, a warning is issued and the output is returned as a row matrix.
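For instance, a scalar-valued function yields a one-row matrix and a warning suggesting Grad():

Jacobian(function(x) sum(sin(x)), x = 1:3)  # 1 x 3 matrix + warning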
slowFun <- function(x) {Sys.sleep(0.01); sum(sin(x))}
slowFunVec <- function(x) {Sys.sleep(0.01); c(sin = sum(sin(x)), exp = sum(exp(x)))}
true.g <- cos(1:4)                   # Analytical gradient
true.j <- rbind(cos(1:4), exp(1:4))  # Analytical Jacobian
x0 <- c(each = 1, par = 2, is = 3, named = 4)

# Compare computation times
system.time(g.slow <- numDeriv::grad(slowFun, x = x0) - true.g)
system.time(j.slow <- numDeriv::jacobian(slowFunVec, x = x0) - true.j)
system.time(g.fast <- Grad(slowFun, x = x0, cores = 2) - true.g)
system.time(j.fast <- Jacobian(slowFunVec, x = x0, cores = 2) - true.j)
system.time(j.fast4 <- Jacobian(slowFunVec, x = x0, acc.order = 4, cores = 2) - true.j)

# Compare accuracy
rownames(j.slow) <- paste0("numDeriv.jac.", c("sin", "exp"))
rownames(j.fast) <- paste0("pnd.jac.order2.", rownames(j.fast))
rownames(j.fast4) <- paste0("pnd.jac.order4.", rownames(j.fast4))
# Discrepancy
print(rbind(numDeriv.grad = g.slow, pnd.Grad = g.fast, j.slow, j.fast, j.fast4), 2)
# The order-4 derivative is more accurate for functions
# with non-zero third and higher derivatives -- look at pnd.jac.order4
Visualises the estimated truncation error, rounding error, and total error used in automatic step-size selection for numerical differentiation. The plot follows the approach used in Mathur (2012) and other step-selection methods.
plotTE(hgrid, etrunc, eround, hopt = NULL, i.increasing = NULL,
  i.good = NULL, i.okay = NULL, eps = .Machine$double.eps/2,
  delta = .Machine$double.eps/2, ...)
hgrid |
Numeric vector: a sequence of step sizes used as the horizontal positions (usually exponentially spaced). |
etrunc |
Numeric vector: estimated truncation error at each step size. This is typically computed by subtracting a more accurate finite-difference approximation from a less accurate one. |
eround |
Numeric vector: estimated rounding error at each step size; usually the best guess or the upper bound is used. |
hopt |
Numeric scalar (optional): selected optimal step size. If provided, a vertical line is drawn at this value. |
i.increasing |
Integer vector (optional): indices of step sizes where the truncation error is increasing, which indicates the search range. |
i.good |
Integer vector (optional): indices of step sizes where the truncation error follows the expected reduction (slope ~ accuracy order; 2 for central differences). |
i.okay |
Integer vector (optional): indices where the truncation error is acceptable but slightly deviates from the expected behaviour. |
eps |
Numeric scalar: condition error, i.e. the error bound for the accuracy of the evaluated function; used for labelling rounding error assumptions. |
delta |
Numeric scalar: subtraction cancellation error, used for labelling rounding error assumptions. |
... |
Additional graphical parameters passed to |
Nothing (invisible NULL).
hgrid <- 10^seq(-8, 3, 0.25)
plotTE(hgrid, etrunc = 2e-12 * hgrid^2 + 1e-14 / hgrid,
       eround = 1e-14 / hgrid, hopt = 0.4,
       i.increasing = 30:45, i.good = 32:45)
Run a function in parallel over a list (internal use only)
runParallel(FUN, x, cores = 1L, cl = NULL, preschedule = FALSE)
FUN |
A function of only one argument. If there are more arguments, use
the |
x |
A list to parallelise the evaluation of |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
cl |
An optional user-supplied |
preschedule |
Logical: if |
The value that lapply(x, FUN) would have returned.
fslow <- function(x) Sys.sleep(x)
x <- rep(0.05, 6)
cl <- parallel::makeCluster(2)
print(t1 <- system.time(runParallel(fslow, x)))
print(t2 <- system.time(runParallel(fslow, x, cl = cl)))
print(t3 <- system.time(runParallel(fslow, x, cores = 2)))
parallel::stopCluster(cl)
cat("Parallel overhead at 2 cores: ", round(t2[3]*200/t1[3]-100), "%\n", sep = "")
# Ignore on Windows
cat("makeCluster() overhead at 2 cores: ", round(100*t2[3]/t3[3]-100), "%\n", sep = "")
Numerically stable non-confluent Vandermonde system solver
solveVandermonde(s, b)
s |
Numeric vector of stencil points defining the Vandermonde matrix on
the left-hand side, where each element |
b |
Numeric vector of the right-hand side of the equation.
This vector must be the same length as |
This function utilises the Björck and Pereyra (1970) algorithm for an accurate solution to non-confluent Vandermonde systems, which are known for their numerical instability. Unlike Gaussian elimination, which suffers from ill conditioning, this algorithm achieves numerical stability by exploiting the ordering of the stencil. An unsorted stencil will trigger a warning. Additionally, the stencil must contain unique points, as repeated values make the Vandermonde matrix confluent and therefore non-invertible.

This implementation is a verbatim translation of Algorithm 4.6.2 from Golub and Van Loan (2013), which is robust against the issues typically associated with Vandermonde systems.

See Higham (1987) for an in-depth error analysis of this algorithm.
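For instance, the right-hand side b = (0, 1, 0) on the stencil (-1, 0, 1) reproduces the classical central-difference weights (the same convention as in the s = -3:3 example below):

solveVandermonde(s = c(-1, 0, 1), b = c(0, 1, 0))  # Should return (-0.5, 0, 0.5)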
A numeric vector of coefficients solving the Vandermonde system, matching the length of s.
Björck Å, Pereyra V (1970).
“Solution of Vandermonde systems of equations.”
Mathematics of Computation, 24(112), 893–903.
Golub GH, Van Loan CF (2013).
Matrix computations, 4 edition.
Johns Hopkins University Press.
Higham NJ (1987).
“Error analysis of the Björck-Pereyra algorithms for solving Vandermonde systems.”
Numerische Mathematik, 50(5), 613–632.
# Approximate the 4th derivative on a non-negative stencil
solveVandermonde(s = 0:5, b = c(0, 0, 0, 0, 24, 0))
# Small numerical inaccuracies: note the 6.66e-15 in the 4th position --
# it should be rounded towards zero:
solveVandermonde(s = -3:3, b = c(0, 1, rep(0, 5))) * 60
Curtis–Reid automatic step selection
step.CR(FUN, x, h0 = 1e-05 * max(abs(x), sqrt(.Machine$double.eps)),
  version = c("original", "modified"),
  aim = if (version[1] == "original") 100 else 1,
  acc.order = c(2L, 4L),
  tol = if (version[1] == "original") 10 else 4,
  range = h0/c(1e+05, 1e-05), maxit = 20L, seq.tol = 1e-04,
  cores = 1, preschedule = getOption("pnd.preschedule", TRUE),
  cl = NULL, diagnostics = FALSE, ...)
FUN |
Function for which the optimal numerical derivative step size is needed. |
x |
Numeric scalar: the point at which the derivative is computed and the optimal step size is estimated. |
h0 |
Numeric scalar: initial step size, defaulting to a relative step of
slightly greater than .Machine$double.eps^(1/3) (or absolute step if |
version |
Character scalar: |
aim |
Positive real scalar: desired ratio of truncation-to-rounding error. The |
acc.order |
Numeric scalar: in the modified version, allows searching for a step size that would be optimal for a 4th-order-accurate central difference. See the Details section below. |
tol |
Numeric scalar greater than 1: tolerance multiplier for determining when to stop
the algorithm based on the current estimate being between |
range |
Numeric vector of length 2 defining the valid search range for the step size. |
maxit |
Integer: maximum number of algorithm iterations to prevent infinite loops in degenerate cases. |
seq.tol |
Numeric scalar: maximum relative difference between old and new step sizes for declaring convergence. |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
diagnostics |
Logical: if |
... |
Passed to |
This function computes the optimal step size for central differences using the Curtis and Reid (1974) algorithm. If the estimated third derivative is exactly zero, the initial step size is multiplied by 4 and returned.

If 4th-order accuracy (4OA) is requested, two things happen. Firstly, since 4OA differences require a larger step size and the truncation error of 2OA differences grows if the step size exceeds the optimal one, a higher ratio of truncation-to-rounding errors should be targeted. Secondly, a 4OA numerical derivative is returned, but the truncation and rounding errors are still estimated for the 2OA differences. Therefore, the estimated truncation error is higher and the real truncation error of 4OA differences is lower.

Note that FUN must be scalar-valued (one-dimensional).
The arguments passed to ... must not partially match those of step.CR(). For example, if cl exists, then attempting to avoid cluster export by using step.CR(f, x, h = 1e-4, cl = cl, a = a) will result in an error: a matches both aim and acc.order. Redefine the function so that this argument has a name that does not match the beginning of any argument of step.CR().
A list similar to the one returned by optim():
par – the optimal step size found,
value – the estimated numerical first derivative (central differences; very useful for computationally expensive functions),
counts – the number of iterations (each iteration includes three function evaluations),
abs.error – an estimate of the total approximation error (sum of truncation and rounding errors),
exitcode – an integer code indicating the termination status:
0 indicates optimal termination within tolerance,
1 means that the third derivative is zero (large step size preferred),
2 is returned if there is no change in step size within tolerance,
3 indicates a solution at the boundary of the allowed value range,
4 signals that the maximum number of iterations was reached.
message – a summary message of the exit status.

If diagnostics is TRUE, iterations is a list including the full step-size search path, argument grids, function values on those grids, estimated error ratios, and estimated derivative values.
Curtis AR, Reid JK (1974). “The Choice of Step Lengths When Using Differences to Approximate Jacobian Matrices.” IMA Journal of Applied Mathematics, 13(1), 121–126. doi:10.1093/imamat/13.1.121.
f <- function(x) x^4
step.CR(x = 2, f)
step.CR(x = 2, f, h0 = 1e-3, diagnostics = TRUE)
step.CR(x = 2, f, version = "modified")
step.CR(x = 2, f, version = "modified", acc.order = 4)

# A bad start: too far away
step.CR(x = 2, f, h0 = 1000)  # Bad exit code + a suggestion to extend the range
step.CR(x = 2, f, h0 = 1000, range = c(1e-10, 1e5))  # Problem solved

library(parallel)
cl <- makePSOCKcluster(names = 2, outfile = "")
abc <- 2
f <- function(x, abc) {Sys.sleep(0.02); abc*sin(x)}
x <- pi/4
system.time(step.CR(f, x, h = 1e-4, cores = 1, abc = abc))  # To remove speed-ups
system.time(step.CR(f, x, h = 1e-4, cores = 2, abc = abc))  # Faster
f2 <- function(x) f(x, abc)
clusterExport(cl, c("f2", "f", "abc"))
system.time(step.CR(f2, x, h = 1e-4, cl = cl))  # Also fast
stopCluster(cl)
Dumontet–Vignes automatic step selection
step.DV(FUN, x, h0 = 1e-05 * max(abs(x), sqrt(.Machine$double.eps)),
  range = h0/c(1e+06, 1e-06), alpha = 4/3,
  ratio.limits = c(1/15, 1/2, 2, 15), maxit = 40L, cores = 1,
  preschedule = getOption("pnd.preschedule", TRUE), cl = NULL,
  diagnostics = FALSE, ...)
FUN |
Function for which the optimal numerical derivative step size is needed. |
x |
Numeric scalar: the point at which the derivative is computed and the optimal step size is estimated. |
h0 |
Numeric scalar: initial step size, defaulting to a relative step of
slightly greater than .Machine$double.eps^(1/3) (or absolute step if |
range |
Numeric vector of length 2 defining the valid search range for the step size. |
alpha |
Numeric scalar >= 1 indicating the relative reduction in the
number of accurate bits due to the calculation of |
ratio.limits |
Numeric vector of length 4 defining the acceptable ranges for step size: the algorithm stops if the relative perturbation of the third derivative by amplified rounding errors falls either between the 1st and 2nd elements or between the 3rd and 4th elements. |
maxit |
Maximum number of algorithm iterations to avoid infinite loops in cases
the desired relative perturbation factor cannot be achieved within the given |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
diagnostics |
Logical: if |
... |
Passed to FUN. |
This function computes the optimal step size for central differences using the Dumontet and Vignes (1977) algorithm. If the estimated third derivative is exactly zero, the function assumes a third derivative of 1 to prevent division-by-zero errors.
A list similar to the one returned by optim():
par – the optimal step size found,
value – the estimated numerical first derivative (central differences),
counts – the number of iterations (each iteration includes four function evaluations),
abs.error – an estimate of the total approximation error (sum of truncation and rounding errors),
exitcode – an integer code indicating the termination status:
0 indicates optimal termination within tolerance,
1 means that the third derivative is zero (large step size preferred),
3 indicates a solution at the boundary of the allowed value range,
4 signals that the maximum number of iterations was reached and the optimal step size found belongs to the allowed range,
5 occurs when the maximum number of iterations was reached and the optimal step size found did not belong to the allowed range and had to be snapped to one end,
6 is used when maxit = 1 and no search was performed.
message – a summary message of the exit status.

If diagnostics is TRUE, iterations is a list including the full step-size search path (NB: for the 3rd derivative), argument grids, function values on those grids, and estimated 3rd derivative values.
Dumontet J, Vignes J (1977). “Détermination du pas optimal dans le calcul des dérivées sur ordinateur.” RAIRO. Analyse numérique, 11(1), 13–25. doi:10.1051/m2an/1977110100131.
f <- function(x) x^4
step.DV(x = 2, f)
step.DV(x = 2, f, h0 = 1e-3, diagnostics = TRUE)
# Plug-in estimator with only one evaluation of f'''
step.DV(x = 2, f, maxit = 1)
Mathur's AutoDX-like automatic step selection
step.M(FUN, x, h0 = NULL, range = NULL, shrink.factor = 0.5,
  min.valid.slopes = 5L, seq.tol = 0.01, correction = TRUE,
  diagnostics = FALSE, plot = FALSE, cores = 1,
  preschedule = getOption("pnd.preschedule", TRUE), cl = NULL, ...)
FUN |
Function for which the optimal numerical derivative step size is needed. |
x |
Numeric scalar: the point at which the derivative is computed and the optimal step size is estimated. |
h0 |
Numeric scalar: initial step size, defaulting to a relative step of
slightly greater than .Machine$double.eps^(1/3) (or absolute step if |
range |
Numeric vector of length 2 defining the valid search range for the step size. |
shrink.factor |
A scalar less than 1 that is used to create a sequence of step sizes. The recommended value is 0.5. Change to 0.25 for a faster search. This number should be a negative power of 2 for the most accurate representation. |
min.valid.slopes |
Positive integer: how many points must form a sequence
with the correct slope with relative difference from 2 less than |
seq.tol |
Numeric scalar: maximum relative difference between old and new step sizes for declaring convergence. |
correction |
Logical: if |
diagnostics |
Logical: if |
plot |
Logical: if |
cores |
Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
preschedule |
Logical: if |
cl |
An optional user-supplied |
... |
Passed to FUN. |
This function computes the optimal step size for central differences using the Mathur (2012) algorithm.
A list similar to the one returned by optim():
par – the optimal step size found,
value – the estimated numerical first derivative (central differences),
counts – the number of iterations (each iteration includes two function evaluations),
abs.error – an estimate of the total approximation error (sum of truncation and rounding errors),
exitcode – an integer code indicating the termination status:
0 indicates optimal termination due to a sequence of correct reductions,
1 indicates that the reductions are slightly outside the tolerance,
2 indicates that the reductions deviate substantially from the tolerance, so an approximate minimum is returned,
3 signals that there are not enough finite function values and the rule of thumb is returned.
message – a summary message of the exit status.

If diagnostics is TRUE, iterations is a list including the full step-size search path, argument grids, function values on those grids, estimated derivative values, estimated error values, and monotonicity check results.
Mathur R (2012). An Analytical Approach to Computing Step Sizes for Finite-Difference Derivatives. Ph.D. thesis, University of Texas at Austin. http://hdl.handle.net/2152/ETD-UT-2012-05-5275.
f <- function(x) x^4  # The derivative at 1 is 4
step.M(x = 1, f, plot = TRUE)
step.M(x = 1, f, h0 = 1e-9)  # Starting low
step.M(x = 1, f, h0 = 1000)  # Starting high

f <- sin  # The derivative at pi/4 is sqrt(2)/2
step.M(x = pi/2, f, plot = TRUE)
# Bad case -- TODO a fix
step.M(x = pi/4, f, plot = TRUE)
step.M(x = pi/4, f, h0 = 1e-9)  # Starting low
step.M(x = pi/4, f, h0 = 1000)  # Starting high,
# where the truncation error estimate is invalid
Plug-in step selection
step.plugin(FUN, x, h0 = 1e-05 * max(abs(x), sqrt(.Machine$double.eps)),
            range = h0/c(10000, 1e-04), cores = 1,
            preschedule = getOption("pnd.preschedule", TRUE), cl = NULL,
            diagnostics = FALSE, ...)
| Argument | Description |
|---|---|
| `FUN` | Function for which the optimal numerical derivative step size is needed. |
| `x` | Numeric scalar: the point at which the derivative is computed and the optimal step size is estimated. |
| `h0` | Numeric scalar: initial step size, defaulting to a relative step of slightly greater than `.Machine$double.eps^(1/3)` (or an absolute step if `x == 0`). |
| `range` | Numeric vector of length 2 defining the valid search range for the step size. |
| `cores` | Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
| `preschedule` | Logical: controls pre-scheduling of the parallel jobs (cf. `mc.preschedule` in `parallel::mclapply`). |
| `cl` | An optional user-supplied `cluster` object (created, e.g., with `parallel::makeCluster`); if supplied, it is used instead of `cores`. |
| `diagnostics` | Logical: if `TRUE`, returns the iteration history in `iterations` (see Value). |
| `...` | Passed to `FUN`. |
This function computes the optimal step size for central differences using the plug-in approach. The optimal step size is determined as the minimiser of the total error, which for central finite differences (assuming minimal bounds for the relative rounding errors, i.e. a relative error of half the machine epsilon per function evaluation) is approximately h* = (1.5 * .Machine$double.eps * |f(x)| / |f'''(x)|)^(1/3). If the estimated third derivative is too small, the function assumes a third derivative of 1 to prevent division-by-zero errors.
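A minimal sketch of this plug-in rule (the function name `pluginSketch` is made up; the real `step.plugin()` additionally clips the result to `range`, reports diagnostics, and can parallelise the evaluations) might look like this: estimate f'''(x) with a wide-step third-order central difference, then plug it into the formula above.

pluginSketch <- function(FUN, x, h0 = 1e-4 * max(abs(x), 1)) {
  # Crude third derivative from a wide-step third central difference
  f3 <- (FUN(x + 2*h0) - 2*FUN(x + h0) + 2*FUN(x - h0) - FUN(x - 2*h0)) / (2 * h0^3)
  if (abs(f3) < sqrt(.Machine$double.eps)) f3 <- 1  # guard against division by zero
  # Minimiser of the total error h^2 |f'''| / 6 + eps |f| / h
  h <- (1.5 * .Machine$double.eps * max(abs(FUN(x)), .Machine$double.eps) / abs(f3))^(1/3)
  list(par = h, value = (FUN(x + h) - FUN(x - h)) / (2 * h))
}
pluginSketch(function(x) x^4, 2)  # f'''(2) = 48; the true derivative is 32

Because the optimal step depends on the third derivative only through a cube root, even a crude estimate of f''' is adequate here.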
A list similar to the one returned by `optim()`:

- `par` – the optimal step size found;
- `value` – the estimated numerical first derivative (central differences);
- `counts` – the number of iterations (here, always 2);
- `abs.error` – an estimate of the total approximation error (the sum of the truncation and rounding errors);
- `exitcode` – an integer code indicating the termination status: `0` – termination with all checks passing the tolerance; `1` – the third derivative is exactly zero, so a larger step size is preferred; `2` – the third derivative is too close to zero, so a larger step size is preferred; `3` – the solution lies at the boundary of the allowed value range;
- `message` – a summary message of the exit status.

If `diagnostics` is `TRUE`, `iterations` is a list including the two-step search path, argument grids, function values on those grids, and the estimated third-derivative values.
f <- function(x) x^4
step.plugin(x = 2, f)
step.plugin(x = 0, f, diagnostics = TRUE)  # f''' = 0, setting a large one
Stepleman–Winarsky automatic step selection
step.SW(FUN, x, h0 = 1e-05 * (abs(x) + (x == 0)), shrink.factor = 0.5,
        range = h0/c(1e+12, 1e-08), seq.tol = 1e-04,
        max.rel.error = .Machine$double.eps/2, maxit = 40L, cores = 1,
        preschedule = getOption("pnd.preschedule", TRUE), cl = NULL,
        diagnostics = FALSE, ...)
| Argument | Description |
|---|---|
| `FUN` | Function for which the optimal numerical derivative step size is needed. |
| `x` | Numeric scalar: the point at which the derivative is computed and the optimal step size is estimated. |
| `h0` | Numeric scalar: initial step size, defaulting to a relative step of slightly greater than `.Machine$double.eps^(1/3)` (or an absolute step if `x == 0`). |
| `shrink.factor` | A scalar less than 1 by which the step size is multiplied during the search. The authors recommend 0.25, but this may result in earlier termination at slightly sub-optimal steps; change it to 0.5 for a more thorough search. |
| `range` | Numeric vector of length 2 defining the valid search range for the step size. |
| `seq.tol` | Numeric scalar: maximum relative difference between old and new step sizes for declaring convergence. |
| `max.rel.error` | Positive numeric scalar: the maximum relative error of a function evaluation. For highly accurate functions in which all bits are correct, it equals half the machine epsilon; for noisy functions (numerical derivatives, integrals, outputs of optimisation routines, etc.), it is higher (see the note after this table). |
| `maxit` | Maximum number of algorithm iterations to avoid infinite loops; if it is reached, consider a smaller or larger initial step size `h0`. |
| `cores` | Integer specifying the number of CPU cores used for parallel computation. Recommended to be set to the number of physical cores on the machine minus one. |
| `preschedule` | Logical: controls pre-scheduling of the parallel jobs (cf. `mc.preschedule` in `parallel::mclapply`). |
| `cl` | An optional user-supplied `cluster` object (created, e.g., with `parallel::makeCluster`); if supplied, it is used instead of `cores`. |
| `diagnostics` | Logical: if `TRUE`, returns the full iteration history in `iterations` (see Value). |
| `...` | Passed to `FUN`. |
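To make the `max.rel.error` remark concrete, consider a function whose values come from numerical integration: its evaluation noise is of the order of the integration tolerance, far above machine epsilon, so a correspondingly larger `max.rel.error` should be passed (the tolerance values below are illustrative assumptions, not package defaults).

f <- function(x) integrate(sin, 0, x, rel.tol = 1e-8)$value  # noise of order 1e-8
step.SW(x = pi/4, f, max.rel.error = 1e-8)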
This function computes the optimal step size for central differences using the Stepleman and Winarsky (1979) algorithm.
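A bare-bones sketch of the monotonicity idea behind the algorithm (the name `swSketch` is invented; the real `step.SW()` adds the range and tolerance checks described above and can evaluate the differences in parallel):

swSketch <- function(FUN, x, h0 = 1e-5 * (abs(x) + (x == 0)),
                     shrink.factor = 0.25, maxit = 40L) {
  cd <- function(h) (FUN(x + h) - FUN(x - h)) / (2 * h)
  h <- h0
  d.old <- cd(h / shrink.factor)  # estimate at a larger step
  d <- cd(h)
  for (i in seq_len(maxit)) {
    h.new <- h * shrink.factor
    d.new <- cd(h.new)
    # While truncation error dominates, successive estimates change
    # monotonically; a sign flip means rounding noise has taken over.
    if (d.new == d || sign(d.new - d) != sign(d - d.old)) break
    h <- h.new; d.old <- d; d <- d.new
  }
  list(par = h, value = cd(h), counts = i)
}
swSketch(sin, pi/4)  # the true derivative is sqrt(2)/2

Returning the last step before the break keeps the estimate roughly where the truncation and rounding errors balance, which is where the total error is near its minimum.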
A list similar to the one returned by `optim()`:

- `par` – the optimal step size found;
- `value` – the estimated numerical first derivative (central differences);
- `counts` – the number of iterations (each iteration includes four function evaluations);
- `abs.error` – an estimate of the total approximation error (the sum of the truncation and rounding errors);
- `exitcode` – an integer code indicating the termination status: `0` – optimal termination within tolerance; `2` – no change in the step size within tolerance; `3` – the solution lies at the boundary of the allowed value range; `4` – the maximum number of iterations was reached;
- `message` – a summary message of the exit status.

If `diagnostics` is `TRUE`, `iterations` is a list including the full step-size search path, argument grids, function values on those grids, estimated derivative values, estimated error values, and monotonicity check results.
Stepleman RS, Winarsky ND (1979). “Adaptive numerical differentiation.” Mathematics of Computation, 33(148), 1257–1264. doi:10.1090/s0025-5718-1979-0537969-8.
f <- function(x) x^4  # The derivative at 1 is 4
step.SW(x = 1, f)
step.SW(x = 1, f, h0 = 1e-9, diagnostics = TRUE)  # Starting too low
# Starting somewhat high leads to too many preliminary iterations
step.SW(x = 1, f, h0 = 10, diagnostics = TRUE)
step.SW(x = 1, f, h0 = 1000, diagnostics = TRUE)  # Starting absurdly high

f <- sin  # The derivative at pi/4 is sqrt(2)/2
step.SW(x = pi/4, f)
step.SW(x = pi/4, f, h0 = 1e-9, diagnostics = TRUE)  # Starting too low
step.SW(x = pi/4, f, h0 = 0.1, diagnostics = TRUE)  # Starting slightly high
# The following two examples fail because the truncation error estimate is invalid
step.SW(x = pi/4, f, h0 = 10, diagnostics = TRUE)    # Warning
step.SW(x = pi/4, f, h0 = 1000, diagnostics = TRUE)  # Warning
Default step size at given points
stepx(x, deriv.order = 1, acc.order = 2, zero.tol = sqrt(.Machine$double.eps))
| Argument | Description |
|---|---|
| `x` | Numeric vector or scalar: the point(s) at which the derivative is estimated. |
| `deriv.order` | Integer or vector of integers indicating the desired derivative order, d^m/dx^m, for each element of `x`. |
| `acc.order` | Integer or vector of integers specifying the desired accuracy order for each element of `x`. |
| `zero.tol` | Small positive number: if `abs(x) >= zero.tol`, the step size is relative to `x`; otherwise, an absolute step is used. |
A numeric vector of the same length as `x` with positive step sizes.
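For intuition, the following sketch shows one plausible rule of thumb of this kind (an illustrative assumption, not necessarily the exact formula inside `stepx()`): balancing a truncation error of order h^acc.order against a rounding error of order eps / h^deriv.order yields a step proportional to eps^(1/(deriv.order + acc.order)), applied relative to |x| away from zero.

stepxSketch <- function(x, deriv.order = 1, acc.order = 2,
                        zero.tol = sqrt(.Machine$double.eps)) {
  h <- .Machine$double.eps^(1 / (deriv.order + acc.order))
  # Relative step away from zero, absolute step near zero
  ifelse(abs(x) >= zero.tol, h * abs(x), h)
}
stepxSketch(10^(-10:2))  # compare with stepx(10^(-10:2))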
stepx(10^(-10:2))
stepx(10^(-10:2), deriv.order = 2, acc.order = 4)