
pmsims 0.5.0 is the initial public release of the package. It provides a simulation-based framework for minimum sample size estimation in prediction model development, with support for continuous, binary, and time-to-event outcomes. The package and its validation work are described in two accompanying preprints: the overview paper by Shamsutdinova et al. (2026) and the validation paper by Olaniran et al. (2026).

Why pmsims?

When developing a new prediction model, the key question is often not just how many observations are available, but whether that sample size is large enough to produce a model with acceptable predictive performance. pmsims addresses that problem by repeatedly simulating data, fitting models, and evaluating performance across a range of training sample sizes.

Rather than relying only on closed-form approximations, the package estimates a learning curve and identifies the smallest sample size that achieves the chosen performance target.
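The learning-curve idea can be sketched in a few lines of base R. This is a minimal illustration of the general approach, not the pmsims implementation: for a toy continuous outcome, repeatedly simulate a training set, fit a model, score it on a large independent test set, and take the smallest candidate size whose assurance reaches 80%. All settings here (the single predictor, the target validation R-squared of 0.15, the candidate sizes) are invented for illustration.

```r
set.seed(1)

# One simulated study: train on n_train observations, return validation R-squared
validation_r2 <- function(n_train, beta = 0.5, n_test = 5000) {
  x <- rnorm(n_train)
  y <- beta * x + rnorm(n_train)
  fit <- lm(y ~ x)
  x_new <- rnorm(n_test)
  y_new <- beta * x_new + rnorm(n_test)
  pred <- predict(fit, newdata = data.frame(x = x_new))
  1 - mean((y_new - pred)^2) / var(y_new)
}

# Assurance: proportion of repeated studies meeting the performance target
assurance <- function(n_train, target = 0.15, n_reps = 200) {
  mean(replicate(n_reps, validation_r2(n_train)) >= target)
}

sizes <- c(25, 50, 100, 200)
curve <- vapply(sizes, assurance, numeric(1))  # estimated learning curve
n_min <- sizes[which(curve >= 0.80)[1]]        # smallest n with >= 80% assurance
```

The same search logic applies whatever the outcome type, model, and metric; pmsims packages it up with sensible defaults and more efficient search strategies.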

Core workflows

The package provides three main wrapper functions, one for each supported outcome type.

These wrappers are designed to make the most common use cases straightforward. Users specify the outcome setting, the expected signal strength, the modelling approach, and the target level of predictive performance, and pmsims estimates the minimum sample size needed under repeated sampling.

The wrappers support both mean-based and assurance-based criteria. The recommended design objective is the assurance criterion: the smallest sample size such that a high proportion of repeated studies (for example, 80%) meet the target performance, rather than requiring only that performance is met on average.

library(pmsims)

set.seed(123)

# Estimate the minimum sample size for a binary-outcome logistic model,
# targeting a calibration slope of 0.90 under an 80% assurance criterion
binary_example <- simulate_binary(
  signal_parameters = 15,               # informative predictors
  noise_parameters = 0,                 # no pure-noise predictors
  predictor_type = "continuous",
  outcome_prevalence = 0.20,
  maximum_achievable_cstatistic = 0.80,
  model = "glm",
  metric = "calibration_slope",
  target_performance = 0.90,
  n_reps_total = 1000,
  mean_or_assurance = "assurance"
)
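The difference between the two criteria is easy to see with a toy example. The metric values below are a hypothetical hand-picked vector standing in for repeated-study results, not pmsims output:

```r
# Hypothetical performance metric from 8 repeated studies
perf <- c(0.86, 0.91, 0.93, 0.88, 0.95, 0.92, 0.89, 0.94)

mean(perf) >= 0.90          # mean-based criterion: TRUE (average is 0.91)
mean(perf >= 0.90) >= 0.80  # assurance criterion: FALSE (5/8 = 62.5% meet the target)
```

Here the mean-based criterion is satisfied even though more than a third of studies fall short of the target, which is why the assurance criterion is the recommended design objective.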

Custom simulation studies

For more specialised study designs, pmsims also provides simulate_custom(). This lower-level interface allows users to define their own data-generating mechanism, model-fitting function, and performance metric, while still using the package’s simulation-based sample size search framework.

In practice, the wrappers are the right place to start when the default outcome types and modelling workflows match the intended study. simulate_custom() is most useful when a project requires a bespoke simulation design or a non-standard evaluation metric.
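As a sketch of what those three user-supplied pieces might look like, here are a data generator, a model fitter, and a calibration-slope metric written as plain R functions. The function names and signatures are assumptions for illustration only and do not reflect `simulate_custom()`'s actual interface; consult its documentation for the required argument names.

```r
# Hypothetical data-generating mechanism: one continuous predictor,
# binary outcome with roughly 20% prevalence
gen_data <- function(n) {
  x <- rnorm(n)
  data.frame(x = x, y = rbinom(n, 1, plogis(-1.4 + 0.5 * x)))
}

# Hypothetical model-fitting function
fit_model <- function(dat) glm(y ~ x, family = binomial, data = dat)

# Calibration slope: regress observed outcomes on the linear predictor
calib_slope <- function(fit, new_dat) {
  lp <- predict(fit, newdata = new_dat, type = "link")
  unname(coef(glm(new_dat$y ~ lp, family = binomial))[2])
}
```

A well-calibrated model gives a slope near 1, which is why a target such as 0.90 (as in the wrapper example above) is a natural choice for this metric.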

Experimental machine-learning support

The wrapper workflows also include experimental machine-learning options: regularised regression, random forests, and XGBoost.

These methods have not yet undergone the package’s main validation study and should be treated as experimental in 0.5.0.

Getting started

The package website includes a getting-started vignette that introduces the main wrapper functions and explains the key simulation inputs.

Install version 0.5.0 of pmsims from GitHub with:

# install.packages("remotes")
remotes::install_github("pmsims-package/pmsims", ref = "v0.5.0")

Version 0.5.0 is available from GitHub and is not yet a CRAN release.