pmsims 0.5.0 is the initial public release of the
package. It provides a simulation-based framework for minimum sample
size estimation in prediction model development, with support for
continuous, binary, and time-to-event outcomes. The package and its
validation work are described in two accompanying preprints: the
overview paper by Shamsutdinova et al. (2026)
and the validation paper by Olaniran et al. (2026).
Why pmsims?
When developing a new prediction model, the key question is often not
just how many observations are available, but whether that sample size
is large enough to produce a model with acceptable predictive
performance. pmsims addresses that problem by repeatedly
simulating data, fitting models, and evaluating performance across a
range of training sample sizes.
Rather than relying only on closed-form approximations, the package estimates a learning curve and identifies the smallest sample size that achieves the chosen performance target.
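To illustrate the learning-curve idea in isolation (this is a toy sketch in base R, not pmsims internals; the performance values and the inverse-power-law form are illustrative assumptions), one can fit a curve to simulated performance at a few candidate sample sizes and read off the smallest size meeting a target:

```r
# Toy sketch only: not how pmsims is implemented internally.
# Hypothetical simulated mean performance (e.g. c-statistic) at candidate sizes:
n_grid <- c(100, 200, 400, 800, 1600)
perf   <- c(0.70, 0.74, 0.77, 0.785, 0.79)

# Fit an inverse power law: perf(n) = a - b * n^(-c)
fit <- nls(perf ~ a - b * n_grid^(-c),
           start = list(a = 0.80, b = 1, c = 0.5))

# Smallest n on a fine grid whose predicted performance meets the target
target <- 0.78
n_fine <- seq(100, 5000, by = 10)
pred   <- predict(fit, newdata = list(n_grid = n_fine))
n_fine[which(pred >= target)[1]]
```

The same logic extends to any metric: simulate performance over a grid of training sizes, smooth it with a learning curve, and invert the curve at the target.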
Core workflows
The package provides three main wrapper functions:
- simulate_binary() for binary outcomes
- simulate_continuous() for continuous outcomes
- simulate_survival() for time-to-event outcomes
These wrappers are designed to make the most common use cases
straightforward. Users specify the outcome setting, the expected signal
strength, the modelling approach, and the target level of predictive
performance, and pmsims estimates the minimum sample size
needed under repeated sampling.
The wrappers support both mean-based and assurance-based criteria. The recommended design objective is the assurance criterion: the smallest sample size at which a high proportion of repeated studies (for example, 80%) meets the target performance, rather than meeting it only on average.
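The distinction between the two criteria can be seen in a small base-R sketch (illustrative only, with made-up numbers; pmsims computes this internally). Here the mean calibration slope meets the target at every candidate size, but only the largest size satisfies the 80% assurance requirement:

```r
# Toy sketch of mean-based vs assurance-based criteria (not pmsims internals).
# Columns: candidate sample sizes; rows: simulation repetitions.
set.seed(1)
n_grid <- c(200, 500, 1000)
slopes <- sapply(n_grid, function(n) rnorm(1000, mean = 1, sd = 3 / sqrt(n)))

target <- 0.90  # target calibration slope

# Mean-based criterion: average performance meets the target
mean_ok <- colMeans(slopes) >= target

# Assurance criterion: at least 80% of repetitions meet the target
assurance_ok <- colMeans(slopes >= target) >= 0.80

n_grid[which(assurance_ok)[1]]  # smallest n satisfying the assurance criterion
```

Designing to the assurance criterion therefore guards against studies that hit the target only on average while failing it in a large fraction of realisations.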
library(pmsims)
set.seed(123)
binary_example <- simulate_binary(
signal_parameters = 15,
noise_parameters = 0,
predictor_type = "continuous",
outcome_prevalence = 0.20,
maximum_achievable_cstatistic = 0.80,
model = "glm",
metric = "calibration_slope",
target_performance = 0.90,
n_reps_total = 1000,
mean_or_assurance = "assurance"
)

Custom simulation studies
For more specialised study designs, pmsims also provides
simulate_custom(). This lower-level interface allows users
to define their own data-generating mechanism, model-fitting function,
and performance metric, while still using the package’s simulation-based
sample size search framework.
In practice, the wrappers are the right place to start when the
default outcome types and modelling workflows match the intended study.
simulate_custom() is most useful when a project requires a
bespoke simulation design or a non-standard evaluation metric.
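As a sketch of how the custom interface might be used, the snippet below defines the three user-supplied components described above. The argument names passed to simulate_custom() are hypothetical placeholders, not the confirmed interface; consult the package documentation for the actual signature.

```r
# Hypothetical sketch: argument names below (data_generator, model_fitter,
# performance_metric) are illustrative and may differ from the real interface.
library(pmsims)

# User-defined data-generating mechanism
gen_data <- function(n) {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(-1.5 + 0.8 * x))
  data.frame(x = x, y = y)
}

# User-defined model-fitting function
fit_model <- function(data) glm(y ~ x, family = binomial, data = data)

# User-defined performance metric: calibration slope on new data
eval_metric <- function(model, data) {
  lp <- predict(model, newdata = data, type = "link")
  unname(coef(glm(data$y ~ lp, family = binomial))[2])
}

custom_example <- simulate_custom(
  data_generator = gen_data,
  model_fitter = fit_model,
  performance_metric = eval_metric,
  target_performance = 0.90
)
```

The separation into generator, fitter, and metric mirrors the structure of the built-in wrappers, so a custom design plugs into the same sample-size search.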
Experimental machine-learning support
The wrapper workflows also include experimental machine-learning options via regularised regression, random forest, and XGBoost.
These methods have not yet undergone the package’s main validation
study and should be treated as experimental in 0.5.0.
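Assuming the wrappers expose these methods through the same model argument used in the earlier example (the label "xgboost" below is a guess at the accepted value, not confirmed by the source; check ?simulate_binary), an experimental run might look like:

```r
# Hypothetical: the model label "xgboost" is assumed; consult the wrapper
# documentation for the labels accepted by the experimental ML options.
ml_example <- simulate_binary(
  signal_parameters = 15,
  noise_parameters = 0,
  predictor_type = "continuous",
  outcome_prevalence = 0.20,
  maximum_achievable_cstatistic = 0.80,
  model = "xgboost",
  metric = "calibration_slope",
  target_performance = 0.90,
  n_reps_total = 1000,
  mean_or_assurance = "assurance"
)
```

Because these options are unvalidated in 0.5.0, results from them should be interpreted as exploratory.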
Getting started
The package website includes a getting-started vignette that introduces the main wrapper functions and explains the key simulation inputs.
Install version 0.5.0 of pmsims from GitHub
with:
# install.packages("remotes")
remotes::install_github("pmsims-package/pmsims", ref = "v0.5.0")

Version 0.5.0 is available from GitHub and is not yet a CRAN release.
