rem() is the unified front-end for fitting relational event models from
preprocessed case-control data (e.g. produced by eventnet) in which the
endogenous and exogenous covariates have already been computed. It is
intended to supersede compare_models(), compare_models_smooth() and
compare_models_global(), which couple feature computation and fitting.
Usage
rem(
  formula,
  data,
  method = c("gam", "clogit", "nn"),
  case = NULL,
  stratum = NULL,
  time = NULL,
  k = NULL,
  gam_method = NULL,
  nn = nn_control(),
  ...
)
Arguments
- formula
A formula; see Formula syntax.
- data
A data.frame of preprocessed case-control data (wide for the gam method;
long with a case indicator and stratum for clogit). For method = "gam",
long case-control input (an event/IS_OBSERVED indicator with control rows)
is detected and widened automatically via widen_case_control(), with a
message.
- method
Estimation backend; see Details.
- case
Optional name of the 0/1 event-indicator column for the clogit and nn
backends. If NULL (default), the indicator is taken from the formula's
left-hand side (e.g. event ~ x). Ignored by the gam method.
- stratum
Name of the column grouping each case with its controls (required by
clogit).
- time
Name of the time column, required for tv/tvnl terms.
- k
Optional integer basis dimension passed to s()/te().
- gam_method
Smoothness-selection method for the gam backend, passed to mgcv::gam().
Defaults to NULL, which uses mgcv's own default ("GCV.Cp") and reproduces
the Intro-to-REM tutorial parameterization. Set to "REML" for the REML fit
used in some papers.
- nn
An nn_control() object with the architecture and training hyper-parameters
for method = "nn". Ignored by the other backends.
- ...
Reserved for future use.
Value
An object of class "rem": a list with the fitted model ($fit),
the method, the original formula, the parsed terms, and the number of
observations n. Has summary(), coef(), plot() and logLik()
methods.
Details
Three estimation backends are provided:
- "gam"
Degenerate logistic regression on a case-1-control design (Boschi, Lerner &
Wit 2025): the response is a constant 1 and the linear predictor is built
from event-minus-control differences. Supports smooth time-varying (tv),
non-linear (nl) and time-varying non-linear (tvnl) effects via mgcv::gam().
- "clogit"
Conditional logistic regression on a case-k-control design via
survival::clogit() (linear terms only). The case/control strata are taken
from stratum, or derived as cumsum(case == 1) when stratum is NULL
(assuming each case is immediately followed by its controls, the eventnet
blocked layout).
- "nn"
Flexible conditional-logistic models on the same case-k-control design as
clogit, trained by (mini-batch) gradient descent on the exact risk-set
softmax partial likelihood. Two architectures are available via
nn_control(): a multilayer perceptron scoring the full covariate vector
jointly (interaction-capable), or an additive_spline predictor, i.e.
per-covariate B-spline expansions fitted by stochastic gradient, the STREAM
construction of Filippi-Mazzola & Wit (2024, JRSS-C). No coefficient table
is produced; summary() reports in-sample (and, with a validation split,
held-out) concordance, and plot(type = "pdp") shows per-feature curves.
Pure-R implementation, no extra dependencies.
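The case-1-control construction behind the "gam" backend and the cumsum(case == 1) stratum derivation can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the data are simulated, the covariate x and all object names are hypothetical, and stats::glm() stands in for mgcv::gam() because the sketch has a single linear term.

```r
# Illustrative sketch only: simulated data, hypothetical covariate `x`,
# and stats::glm() standing in for mgcv::gam() (single linear effect).
set.seed(42)
n    <- 2000   # number of events, each with one sampled control
beta <- 0.6    # assumed true effect of x

# Two candidates per event; candidate A is the observed event with
# probability exp(beta * x_a) / (exp(beta * x_a) + exp(beta * x_b)).
x_a <- rnorm(n)
x_b <- rnorm(n)
a_is_event <- runif(n) < plogis(beta * (x_a - x_b))
x_ev <- ifelse(a_is_event, x_a, x_b)   # covariate of the observed event
x_nv <- ifelse(a_is_event, x_b, x_a)   # covariate of the sampled control

# Long, blocked layout: each case row is immediately followed by its control.
long <- data.frame(
  case = rep(c(1, 0), times = n),
  x    = as.vector(rbind(x_ev, x_nv))
)

# Strata derived as described for stratum = NULL.
long$stratum <- cumsum(long$case == 1)

# Wide layout: one row per event with the event-minus-control difference
# and a constant response of 1.
wide <- data.frame(
  y   = 1,
  d_x = long$x[long$case == 1] - long$x[long$case == 0]
)

# Degenerate logistic regression: constant response, no intercept.
fit <- glm(y ~ 0 + d_x, family = binomial(), data = wide)
beta_hat <- unname(coef(fit))
print(beta_hat)
```

With one control per event, the likelihood maximised here coincides with the conditional-logistic likelihood of the matched pair, which is why the estimate should land close to the assumed beta.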
Formula syntax
The right-hand side lists covariates. A bare name is a linear effect; wrap
a name to request a smooth effect (gam method only):
- tv(x)
Time-varying linear effect: s(time, by = d_x).
- nl(x)
Non-linear effect: s(cbind(x_ev, x_nv), by = c(1, -1)).
- tvnl(x)
Time-varying non-linear effect (tensor product).
- re(x)
Random effect of a grouping factor x (e.g. the sender), built from the
matched x_ev/x_nv levels as
s(cbind(x_ev, x_nv), by = cbind(1, -1), bs = "re"), contributing
f(event_level) - f(control_level) (following the REM tutorial's
species-invasiveness term). Falls back to a single column x when x_ev/x_nv
are absent. Identified only when the event and control differ on x.
For the gam method the left-hand side is ignored (the response is the
constant case indicator); for clogit and nn the left-hand side names the 0/1
event indicator column (e.g. event ~ x), unless case is given explicitly.
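As a rough illustration of how a formula mixing bare and wrapped terms decomposes, the base-R sketch below inspects one symbolically. The covariate names inertia and degree are hypothetical, and the classification step is only an assumption about what a parser needs to distinguish; it is not rem()'s own parsing code.

```r
# Hypothetical covariate names. tv() and nl() are never evaluated here:
# terms() only inspects the formula symbolically.
f <- event ~ reciprocity_count + tv(inertia) + nl(degree)

# One label per right-hand-side term, wrappers included.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Classify each term by its wrapper; bare names are linear effects.
wrapper <- ifelse(grepl("^(tv|nl|tvnl|re)\\(", term_labels),
                  sub("\\(.*$", "", term_labels),
                  "linear")
names(wrapper) <- term_labels
print(wrapper)

# The left-hand side names the 0/1 event indicator column.
lhs <- all.vars(f[[2]])
print(lhs)
```

A one-sided formula such as ~ reciprocity_count has no left-hand side at all, which matches the gam usage in the Examples.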
Column resolution
For a covariate x, the event/control difference is taken from column x,
else d_x, else x_ev - x_nv. Non-linear terms use transform_x_ev /
transform_x_nv when present (the eventnet spline-transformed covariate),
otherwise x_ev / x_nv. tvnl uses transformed_time when present.
Undirected logs (senders only, no receiver/TARGET column) are supported.
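The lookup order for the event/control difference of a linear term can be sketched as a small helper. The function name resolve_difference() and the toy data frames are hypothetical, introduced only for illustration; the package's actual implementation may differ.

```r
# Hypothetical helper mirroring the documented order:
# column `x`, else `d_x`, else `x_ev - x_nv`.
resolve_difference <- function(data, x) {
  d_name  <- paste0("d_", x)
  ev_name <- paste0(x, "_ev")
  nv_name <- paste0(x, "_nv")
  cols    <- names(data)

  if (x %in% cols) {
    return(data[[x]])                       # already a difference column
  }
  if (d_name %in% cols) {
    return(data[[d_name]])                  # explicit d_ prefix
  }
  if (all(c(ev_name, nv_name) %in% cols)) {
    return(data[[ev_name]] - data[[nv_name]])  # event minus control
  }
  stop("no column found for covariate '", x, "'")
}

# Toy inputs, one per resolution branch.
wide_a <- data.frame(x = c(0.5, -1))
wide_b <- data.frame(d_x = c(2, 3))
wide_c <- data.frame(x_ev = c(4, 1), x_nv = c(1, 3))

print(resolve_difference(wide_a, "x"))
print(resolve_difference(wide_b, "x"))
print(resolve_difference(wide_c, "x"))
```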
See also
compare_models_smooth() (superseded), simulate_relational_events()
(whose wide = TRUE output is a valid input here),
simulate_directed_hyperevents_tvnl().
Examples
set.seed(1)
w <- simulate_relational_events(
  n_events = 300, senders = paste0("a", 1:12), receivers = paste0("a", 1:12),
  n_controls = 1, endogenous_stats = "reciprocity_count",
  endogenous_effects = c(reciprocity_count = 0.6), wide = TRUE)
fit <- rem(~ reciprocity_count, data = w, method = "gam")
coef(fit)
#> reciprocity_count
#> 0.9043626
