Collects the architecture and training hyper-parameters used by
rem(method = "nn"). Training maximizes the same conditional-logistic
partial likelihood as method = "clogit" (softmax over each risk set), so
this backend is a drop-in flexible counterpart of the linear conditional
logit. Two predictor architectures are available:
- "mlp"
A multilayer perceptron scoring the full covariate vector jointly; it can represent interactions between statistics.
- "additive_spline"
An additive predictor sum_k f_k(x_k), with each f_k a B-spline expansion fitted by (mini-batch) stochastic gradient: the STREAM construction of Filippi-Mazzola & Wit (2024, JRSS-C 73(4), doi:10.1093/jrsssc/qlae023). It gives interpretable per-feature curves; with batch_strata it scales to event logs far beyond what an in-memory smooth fit can hold.
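The objective and the additive predictor described above can be illustrated in a few lines of base R. This is only a minimal sketch, not the package's implementation: the toy data, the coefficient values, and the names clogit_nll, stratum and is_event are assumptions made up for the illustration. It builds one B-spline basis per covariate with splines::bs(), sums the per-feature contributions into a score, and evaluates the negative conditional-logistic log-likelihood as a softmax over each risk set.

```r
library(splines)  # ships with R; provides bs()

set.seed(1)
n_strata <- 4   # one stratum = one observed event plus its sampled controls
n_per    <- 3   # rows per stratum (row 1 of each stratum is the event here)
stratum  <- rep(seq_len(n_strata), each = n_per)
is_event <- rep(c(TRUE, FALSE, FALSE), times = n_strata)
X <- matrix(rnorm(n_strata * n_per * 2), ncol = 2)  # two toy statistics

# Additive predictor: one B-spline basis per covariate,
# eta = sum_k B_k %*% beta_k (coefficients are arbitrary, not fitted)
spline_df <- 4
bases <- lapply(seq_len(ncol(X)), function(k) bs(X[, k], df = spline_df))
betas <- lapply(bases, function(B) rnorm(ncol(B), sd = 0.1))
eta <- Reduce(`+`, Map(function(B, b) as.vector(B %*% b), bases, betas))

# Negative conditional-logistic log-likelihood: for each stratum,
# -(score of the event - log-sum-exp of all scores in the risk set)
clogit_nll <- function(eta, stratum, is_event) {
  log_denom <- tapply(eta, stratum,
                      function(e) max(e) + log(sum(exp(e - max(e)))))
  -sum(eta[is_event] - log_denom[as.character(stratum[is_event])])
}

loss <- clogit_nll(eta, stratum, is_event)
```

When all scores are equal, every row in a risk set is equally likely to be the event, so each stratum contributes log(n_per) to the loss. That makes clogit_nll(rep(0, 12), stratum, is_event) a convenient sanity check against 4 * log(3).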
Arguments
Integer vector of hidden-layer sizes for "mlp", e.g. c(16, 8). Use integer(0) for no hidden layer (recovers a linear conditional logit fit by gradient descent). Ignored for "additive_spline".
- activation
Hidden-layer activation for "mlp": "relu" or "tanh".
- architecture
Predictor architecture: "mlp" (default) or "additive_spline"; see Description.
- spline_df
Degrees of freedom (basis size) per covariate for "additive_spline"; passed to splines::bs().
- batch_strata
Optional mini-batch size, in strata, for stochastic gradient training. NULL (default) trains full-batch; a value such as 512 takes one Adam step per sampled chunk of 512 strata within each epoch.
- epochs
Maximum number of training epochs (full passes over the training strata).
- lr
Adam learning rate.
- l2
L2 penalty (weight decay). The pure-R engine penalises the weights only; the torch engine applies it via Adam's weight_decay.
- validation
Fraction of strata held out for validation / early stopping. Set to 0 to train on everything (no early stopping).
- patience
Early-stopping patience: training stops after this many epochs without improvement of the validation loss; the best parameters are restored.
- standardize
Z-score the features before training (recommended; the scaling is stored and re-applied by predict()).
- engine
Training engine: "r" (default) uses the built-in pure-R implementation with hand-derived gradients; "torch" trains the same model and loss with the torch package (libtorch / autograd), which is markedly faster and, with batch_strata, scales to large event logs (optionally on GPU). The two engines fit identical model classes and return interchangeable objects. "torch" requires the suggested torch package (run torch::install_torch() once) and equal-sized strata (the usual case-control layout with a fixed number of controls).
- seed
Optional integer seed for reproducible initialization and validation split.
- verbose
Print the loss every 50 epochs.
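How validation, batch_strata and patience interact can be sketched in base R. This is an illustration under stated assumptions, not the package's training loop: the helper names make_batches and early_stop_epoch, the toy stratum identifiers, and the validation-loss sequence are all invented for the example. It assumes that whole strata (never individual rows) are held out, that each epoch shuffles the training strata and cuts them into chunks of batch_strata, and that training stops once the validation loss has failed to improve for patience consecutive epochs.

```r
set.seed(42)
strata_ids   <- 1:100   # toy stratum identifiers
validation   <- 0.2     # fraction of strata held out
batch_strata <- 16      # strata per mini-batch
patience     <- 3

# Hold out whole strata for validation
n_val     <- floor(validation * length(strata_ids))
val_ids   <- sample(strata_ids, n_val)
train_ids <- setdiff(strata_ids, val_ids)

# One epoch: shuffle the training strata, then cut into chunks of `size`
make_batches <- function(ids, size) {
  ids <- sample(ids)
  split(ids, ceiling(seq_along(ids) / size))
}
batches <- make_batches(train_ids, batch_strata)

# Patience rule applied to a made-up sequence of validation losses
early_stop_epoch <- function(val_loss, patience) {
  best <- Inf
  best_epoch <- 0L
  since <- 0L  # epochs since the last improvement
  for (ep in seq_along(val_loss)) {
    if (val_loss[ep] < best) {
      best <- val_loss[ep]
      best_epoch <- ep
      since <- 0L
    } else {
      since <- since + 1L
      if (since >= patience) {
        return(list(stopped_at = ep, best_epoch = best_epoch))
      }
    }
  }
  list(stopped_at = length(val_loss), best_epoch = best_epoch)
}

res <- early_stop_epoch(c(1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6), patience)
```

In this toy run the loss last improves at epoch 3, so with patience = 3 the loop stops at epoch 6 and the epoch-3 parameters would be the ones to restore. The later value 0.6 is never reached.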
