pmsims is an R package for estimating how much data are needed to develop reliable and generalisable prediction models. It uses a simulation-based learning curve approach to quantify how model performance improves with increasing sample size, supporting principled study planning and feasibility assessment.
The package is fully model-agnostic: users can define how data are generated, how models are fitted, and how predictive performance is measured. It currently supports regression-based prediction models with continuous, binary, and time-to-event outcomes.
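The learning curve idea described above can be sketched in base R. This is a minimal illustration and not the pmsims implementation: it calls no pmsims functions, and the helper names (`generate_data`, `calibration_slope`), the predictor effects, the intercept, and the grid of sample sizes are all assumptions chosen for the example. For each candidate sample size it repeatedly simulates a training set, fits a logistic regression, and records the calibration slope in a large independent validation set.

```r
# Illustrative only: base R, no pmsims functions are used.
set.seed(1)

# Hypothetical data generator: standard normal predictors, logistic outcome.
generate_data <- function(n, beta, intercept) {
  x <- matrix(rnorm(n * length(beta)), nrow = n)
  colnames(x) <- paste0("x", seq_along(beta))
  y <- rbinom(n, size = 1, prob = plogis(intercept + drop(x %*% beta)))
  data.frame(y = y, x)
}

# Calibration slope: coefficient from a logistic regression of the
# validation outcome on the fitted model's linear predictor.
calibration_slope <- function(fit, validation) {
  lp <- predict(fit, newdata = validation, type = "link")
  unname(coef(glm(validation$y ~ lp, family = binomial))[2])
}

beta <- rep(0.5, 5)   # assumed predictor effects
intercept <- -1.5     # assumed intercept
validation <- generate_data(20000, beta, intercept)

sample_sizes <- c(100, 200, 400, 800)  # assumed grid of training sizes
n_reps <- 30                           # simulation repetitions per size

# For each sample size: simulate, fit, evaluate, then average over repetitions.
learning_curve <- sapply(sample_sizes, function(n) {
  slopes <- replicate(n_reps, {
    train <- generate_data(n, beta, intercept)
    fit <- glm(y ~ ., data = train, family = binomial)
    calibration_slope(fit, validation)
  })
  mean(slopes)
})

data.frame(n = sample_sizes, mean_calibration_slope = learning_curve)
```

In a sketch like this, the mean calibration slope would be expected to move towards 1 as the training sample size grows; the required sample size is then the smallest size at which the curve reaches the chosen performance target.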
pmsims is developed at King’s College London (Department of Biostatistics & Health Informatics) with input from researchers, clinicians, and patient partners. See the pmsims project site for further details.
Installation
Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("pmsims-package/pmsims")
Minimal example
library(pmsims)
set.seed(123)
binary_example <- simulate_binary(
  signal_parameters = 15,
  noise_parameters = 0,
  predictor_type = "continuous",
  binary_predictor_prevalence = NULL,
  outcome_prevalence = 0.20,
  large_sample_cstatistic = 0.80,
  model = "glm",
  metric = "calibration_slope",
  minimum_acceptable_performance = 0.90,
  n_reps_total = 1000,
  mean_or_assurance = "assurance"
)
binary_example
Get in touch
We welcome questions, suggestions, and collaboration enquiries.
- Email: pmsims@kcl.ac.uk
- Feedback or bugs: please open a GitHub issue
Funding
This work is supported by the National Institute for Health and Care Research (NIHR) under the Research for Patient Benefit (RfPB) Programme (NIHR206858).

The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.
