optimMaltipooCollapsed.Rd
See details for model. Should likely be followed by function uncollapsePibble. Notation: N is the number of samples, D is the number of multinomial categories, and Q is the number of covariates.
optimMaltipooCollapsed(Y, upsilon, Theta, X, KInv, U, init, ellinit,
  n_samples = 2000L, calcGradHess = TRUE, b1 = 0.9, b2 = 0.99,
  step_size = 0.003, epsilon = 1e-06, eps_f = 1e-10, eps_g = 1e-04,
  max_iter = 10000L, verbose = FALSE, verbose_rate = 10L,
  decomp_method = "cholesky", eigvalthresh = 0, jitter = 0)
| Argument | Description |
|---|---|
| Y | D x N matrix of counts |
| upsilon | degrees-of-freedom parameter of the matrix-t distribution in the collapsed model (must be > D) |
| Theta | D-1 x Q matrix of prior means for the regression coefficients |
| X | Q x N matrix of covariates |
| KInv | D-1 x D-1 symmetric positive-definite matrix (the inverse of K in the model below) |
| U | a (PQ) x Q matrix of stacked variance components |
| init | D-1 x N matrix of initial guesses for eta used for optimization |
| ellinit | P-vector of initial guesses for ell used for optimization |
| n_samples | number of samples for the Laplace approximation (if 0, very fast as no inversion or decomposition of the Hessian is required) |
| calcGradHess | if n_samples=0, should the gradient and Hessian still be calculated using closed-form solutions? |
| b1 | (ADAM) 1st moment decay parameter, "aka momentum" (recommend 0.9) |
| b2 | (ADAM) 2nd moment decay parameter (recommend 0.99 or 0.999) |
| step_size | (ADAM) step size for descent (recommend 0.001-0.003) |
| epsilon | (ADAM) parameter to avoid divide-by-zero |
| eps_f | (ADAM) normalized function-improvement stopping criterion |
| eps_g | (ADAM) normalized gradient-magnitude stopping criterion |
| max_iter | (ADAM) maximum number of iterations before stopping |
| verbose | (ADAM) if TRUE, prints stats for stopping criteria and iteration number |
| verbose_rate | (ADAM) rate at which verbose stats are printed to the screen |
| decomp_method | decomposition of the Hessian for the Laplace approximation: 'eigen' (slightly more stable, slower) or 'cholesky' (less stable, faster; default) |
| eigvalthresh | threshold for negative eigenvalues in the decomposition of the negative inverse Hessian (should be <= 0) |
| jitter | (default: 0) if > 0, adds that factor to the diagonal of the Hessian before decomposition (to improve matrix conditioning) |
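The following is a minimal sketch of assembling inputs with the dimensions in the table above and calling the optimizer. The fido package name is an assumption (this page does not name its package), and the simulated data are purely illustrative.

```r
## Minimal sketch: build inputs with the documented dimensions and call the
## collapsed optimizer. Package name and simulated data are assumptions.
library(fido)

set.seed(1)
D <- 5; N <- 20; Q <- 2; P <- 1

Y <- sapply(seq_len(N), function(i) rmultinom(1, size = 100, prob = rep(1 / D, D)))  # D x N counts
X <- rbind(1, rnorm(N))                       # Q x N covariates (intercept + one covariate)
Theta <- matrix(0, nrow = D - 1, ncol = Q)    # prior mean for regression coefficients
KInv  <- diag(D - 1)                          # (D-1) x (D-1) symmetric positive-definite
U     <- diag(Q)                              # (P*Q) x Q stacked variance components (here P = 1)
upsilon <- D + 10                             # must be > D

init    <- matrix(0, nrow = D - 1, ncol = N)  # initial guess for eta
ellinit <- rep(0, P)                          # initial guess for ell

fit <- optimMaltipooCollapsed(Y, upsilon, Theta, X, KInv, U, init, ellinit,
                              n_samples = 500)
```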
List containing (all with respect to the found optimum):
- LogLik - log likelihood of the collapsed model (up to a proportionality constant)
- Gradient - (if calcGradHess=TRUE) gradient of the positive log posterior
- Hessian - (if calcGradHess=TRUE) Hessian of the positive log posterior
- Pars - parameter value of eta at the optimum
- Samples - (D-1) x N x n_samples array containing posterior samples of eta based on the Laplace approximation (if n_samples > 0)
- VCScale - value of e^ell_i at the optimum
- logInvNegHessDet - the log determinant of the covariance of the Laplace approximation, useful for calculating the marginal likelihood
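Continuing the hedged sketch above, the returned components can be inspected by name (names taken from the list just given; the object fit is from the earlier example).

```r
## Inspect the returned list from the sketch above (component names as documented).
dim(fit$Pars)        # (D-1) x N MAP estimate of eta
dim(fit$Samples)     # (D-1) x N x n_samples Laplace samples of eta
fit$LogLik           # collapsed log likelihood at the optimum
fit$VCScale          # e^ell_i at the optimum, one entry per variance component
fit$logInvNegHessDet # log determinant used for marginal likelihood calculations
```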
Notation: Let Z_j denote the j-th row of a matrix Z. Model: $$Y_j \sim Multinomial(Pi_j)$$ $$Pi_j = Phi^{-1}(Eta_j)$$ $$Eta \sim T_{D-1, N}(upsilon, Theta*X, K, A)$$
where A = (I_N + e^{ell_1} X'U_1X + ... + e^{ell_P} X'U_PX), K is a (D-1) x (D-1) covariance matrix, and Phi^{-1} is the ALRInv_D (inverse additive log-ratio) transform.
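As a concrete illustration of how the stacked matrix U and the scales ell enter A in the formula above, the following hypothetical helper (not part of the package) assembles A from a Q x N covariate matrix X, a (P*Q) x Q stacked U, and a length-P vector ell.

```r
## Hypothetical helper (illustration only): build the N x N scale matrix A
## following A = I_N + e^{ell_1} X'U_1X + ... + e^{ell_P} X'U_PX.
build_A <- function(X, U, ell) {
  Q <- nrow(X); N <- ncol(X); P <- length(ell)
  A <- diag(N)
  for (p in seq_len(P)) {
    Up <- U[((p - 1) * Q + 1):(p * Q), , drop = FALSE]  # p-th Q x Q block of stacked U
    A  <- A + exp(ell[p]) * t(X) %*% Up %*% X           # N x N contribution of component p
  }
  A
}
```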
Gradient and Hessian calculations are fast as they are computed using closed form solutions. That said, the Hessian matrix can be quite large [N*(D-1) x N*(D-1)] and storage may be an issue.
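For a rough sense of scale (a back-of-the-envelope estimate, not a package utility), dense storage of the Hessian grows quadratically in N*(D-1):

```r
## Rough storage estimate for a dense N*(D-1) x N*(D-1) Hessian of doubles
## (example values only; does not reuse objects from the sketches above).
N_h <- 100; D_h <- 100
(N_h * (D_h - 1))^2 * 8 / 1e9   # approx. 0.78 GB at 8 bytes per entry
```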
Note: Warnings about large negative eigenvalues can signal either that the optimizer did not reach an optimum or (more commonly in my experience) that the prior / degrees of freedom for the covariance (given by parameters upsilon and KInv) were too specific and at odds with the observed data. If you get this warning, try the following (a restart sketch combining these suggestions appears after the list):
- Try restarting the optimization using a different initial guess for eta.
- Try decreasing (or even increasing) step_size (by increments of 0.001 or 0.002) and increasing max_iter in the optimizer. You can also try increasing b1 to 0.99 and decreasing eps_f by a few orders of magnitude.
- Try relaxing prior assumptions regarding the covariance matrix (e.g., consider decreasing parameter upsilon closer to its minimum value of D).
- Try adding a small amount of jitter (e.g., set jitter=1e-5) to address potential floating point errors.
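A possible restart combining those adjustments, warm-starting from the earlier hedged example (the objects Y, X, Theta, KInv, U, upsilon, and fit come from that sketch; all values are illustrative, not recommendations beyond those in the list):

```r
## Restart sketch: warm start from the previous optimum and apply the
## adjustments suggested above (all values illustrative).
fit2 <- optimMaltipooCollapsed(Y, upsilon, Theta, X, KInv, U,
                               init      = fit$Pars,          # restart from previous eta optimum
                               ellinit   = log(fit$VCScale),  # VCScale = e^ell, so log() recovers ell
                               n_samples = 500,
                               step_size = 0.001,             # smaller step size
                               max_iter  = 20000L,            # more iterations
                               b1        = 0.99,              # higher first-moment decay
                               eps_f     = 1e-12,             # tighter function-improvement criterion
                               jitter    = 1e-5)              # small jitter for conditioning
```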
S. Ruder (2016). An overview of gradient descent optimization algorithms. arXiv:1609.04747.