Title: | Virtual Patient Simulation by Copula Invariance Property |
---|---|
Description: | To optimize clinical trial designs and data analysis methods consistently through trial simulation, we need to simulate multivariate mixed-type virtual patient data independently of the designs and analysis methods under evaluation. To make the outcome of optimization more realistic, relevant empirical patient-level data should be used when available. However, simulating trials from small empirical datasets raises problems: the underlying marginal distributions and their dependence structure cannot be thoroughly understood or verified because of the limited sample size. To resolve this issue, we use the copula invariance property, which can generate the joint distribution without making a strong parametric assumption. The function copula.sim generates virtual patient data, with optional data validation based on energy distance and ball divergence. The function compare.copula.sim compares the marginal means and covariances of the simulated data with those of the empirical data. To simulate patient-level data from a hypothetical treatment arm that would perform differently from the observed data, the function new.arm.copula.sim generates new multivariate data with the same dependence structure as the original data but with a shifted mean vector. |
Authors: | Pei-Shan Yen [aut, cre] , Xuemin Gu [ctb], Jenny Jiao [ctb], Jane Zhang [ctb] |
Maintainer: | Pei-Shan Yen <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1 |
Built: | 2024-11-23 04:02:20 UTC |
Source: | https://github.com/psyen0824/copulasim |
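The copula invariance property named in the description says that the dependence structure (the copula) of a random vector does not change when each margin is passed through a strictly increasing transformation. The base R sketch below illustrates this with rank-based dependence: Spearman's rho, which depends only on ranks, is identical before and after monotone marginal transforms, while Pearson correlation is not. It is an illustration only, not code from copulaSim; the variable names are hypothetical.

```r
# Illustration (not package code): rank-based dependence is invariant under
# strictly increasing transformations of each margin.
set.seed(1)
n <- 200
z1 <- rnorm(n)
z2 <- 0.7 * z1 + sqrt(1 - 0.7^2) * rnorm(n)  # correlated standard normal pair
x <- cbind(z1, z2)

# Strictly increasing marginal transforms: exp() on column 1, cube on column 2
y <- cbind(exp(x[, 1]), x[, 2]^3)

# Spearman's rho uses ranks only, so it is unchanged
rho_x <- cor(x, method = "spearman")[1, 2]
rho_y <- cor(y, method = "spearman")[1, 2]

# Pearson correlation depends on the margins, so it generally changes
pearson_x <- cor(x)[1, 2]
pearson_y <- cor(y)[1, 2]
print(c(rho_x = rho_x, rho_y = rho_y, pearson_x = pearson_x, pearson_y = pearson_y))
```

Because only the ranks carry the dependence, new data can reuse the empirical dependence while the margins are handled separately, which is the idea the package description relies on.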
Maintainer: Pei-Shan Yen [email protected] (ORCID)
Other contributors:
Xuemin Gu [email protected] [contributor]
Jenny Jiao [email protected] [contributor]
Jane Zhang [email protected] [contributor]
Performing the comparison between empirical data and multiple simulated datasets.
compare.copula.sim(object)
object |
A copula.sim object for the comparison. |
Returns a list comparing the marginal parameters and covariances of the empirical and simulated data.
mean.comparison: comparison between the empirical marginal mean and the average of the simulated marginal means, with components
(1) simu.mean: average of the simulated means
(2) simu.sd: average of the simulated standard errors
(3) simu.mean.low.lim: lower limit of the 95% percentile confidence interval
(4) simu.mean.upp.lim: upper limit of the 95% percentile confidence interval
(5) simu.mean.RB: relative bias
(6) simu.mean.SB: standardized bias
(7) simu.mean.RMSE: root mean square error
cov.comparison: comparison between the empirical covariance and the average of the simulated covariances
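The mean.comparison components can be sketched for a single margin as follows. The formulas are common simulation-study definitions and are an assumption here: relative bias as (average simulated mean minus empirical mean) divided by the empirical mean, standardized bias as the same difference divided by the standard deviation of the simulated means, RMSE over the simulated means, and 2.5%/97.5% percentiles for the interval. compare.copula.sim may define them differently; summarise_sim_mean is a hypothetical helper.

```r
# Sketch with assumed formulas; not the package's implementation.
# emp.mean: empirical marginal mean; sim.means / sim.ses: per-dataset means and
# standard errors of that margin across the simulated datasets.
summarise_sim_mean <- function(emp.mean, sim.means, sim.ses) {
  avg <- mean(sim.means)
  c(
    simu.mean         = avg,
    simu.sd           = mean(sim.ses),
    simu.mean.low.lim = unname(quantile(sim.means, 0.025)),
    simu.mean.upp.lim = unname(quantile(sim.means, 0.975)),
    simu.mean.RB      = (avg - emp.mean) / emp.mean,       # relative bias
    simu.mean.SB      = (avg - emp.mean) / sd(sim.means),  # standardized bias
    simu.mean.RMSE    = sqrt(mean((sim.means - emp.mean)^2))
  )
}

# Usage with made-up data: one empirical margin and 100 simulated datasets
set.seed(2022)
emp <- rnorm(40, mean = 10)
sims <- replicate(100, rnorm(100, mean = 10))  # 100 x 100, one column per dataset
emp.mean <- mean(emp)
sim.means <- colMeans(sims)
res <- summarise_sim_mean(emp.mean, sim.means, apply(sims, 2, sd) / sqrt(nrow(sims)))
print(round(res, 4))
```

With these definitions the squared RMSE splits into the squared bias plus the (population) variance of the simulated means, which is a quick sanity check on any implementation.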
Pei-Shan Yen, Xuemin Gu
Generating simulated datasets from the empirical data through the copula invariance property.
copula.sim( data.input, id.vec, arm.vec, n.patient, n.simulation, seed = NULL, validation.type = "none", validation.sig.lvl = 0.05, rmvnorm.matrix.decomp.method = "svd", verbose = TRUE )
data.input |
The empirical patient-level data to be used to simulate new virtual patient data. |
id.vec |
The ID of each individual patient in the input data. |
arm.vec |
The vector identifying the clinical trial arm of each patient in the input data. |
n.patient |
The targeted number of patients in each simulated dataset. |
n.simulation |
The number of simulated datasets. |
seed |
The random seed. Default is NULL to use the current seed. |
validation.type |
A string to specify the hypothesis test used to detect the difference between input data and the simulated data. Default is "none". Possible methods are energy distance ("energy") and ball divergence ("ball"). The R packages "energy" and "Ball" are needed. |
validation.sig.lvl |
The significance level (alpha) for the hypothesis test. Default is 0.05. |
rmvnorm.matrix.decomp.method |
The method to do the matrix decomposition used in the function |
verbose |
A logical value specifying whether to print messages on the simulation progress. Default is TRUE. |
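The default rmvnorm.matrix.decomp.method = "svd" refers to how a covariance matrix is factored before drawing multivariate normal samples. As a base R sketch of that idea (not the package's or mvtnorm's code), a symmetric positive definite sigma = U D U' yields the factor A = U sqrt(D) with A A' = sigma, so rows of Z A' have covariance sigma when Z has independent standard normal entries. rmvnorm_svd is a hypothetical name.

```r
# Sketch: multivariate normal draws via an SVD factor of sigma (base R only).
rmvnorm_svd <- function(n, mean, sigma) {
  s <- svd(sigma)
  # For symmetric positive (semi-)definite sigma, A %*% t(A) reproduces sigma
  A <- s$u %*% diag(sqrt(pmax(s$d, 0)), nrow = length(s$d))
  z <- matrix(rnorm(n * ncol(sigma)), nrow = n)  # iid N(0, 1) entries
  sweep(z %*% t(A), 2, mean, "+")               # add the mean to each column
}

# Usage: same covariance as the copula.sim example, diag(5) + 0.5
set.seed(2022)
sigma <- diag(5) + 0.5
x <- rmvnorm_svd(10000, rep(10, 5), sigma)
print(round(cov(x), 2))
```

An SVD (or eigen) factor also works when sigma is only positive semi-definite, where a Cholesky factorization would fail; that robustness is a common reason to prefer it.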
A copula.sim object with four elements.
data.input: empirical data (wide-form)
data.input.long: empirical data (long-form)
data.transform: quantile transformation of data.input
data.simul: simulated data
Pei-Shan Yen, Xuemin Gu
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8, 229-231.
Nelsen, R. B. (2007). An introduction to copulas. Springer Science & Business Media.
Ross, S. M. (2013). Simulation. Academic Press.
library(copulaSim)
## Generate Empirical Data
# Assume the 2-arm, 5-dimensional empirical data follow a multivariate normal distribution.
library(mvtnorm)
arm1 <- rmvnorm(n = 40, mean = rep(10, 5), sigma = diag(5) + 0.5)
arm2 <- rmvnorm(n = 40, mean = rep(12, 5), sigma = diag(5) + 0.5)
test_data <- as.data.frame(cbind(1:80, rep(1:2, each = 40), rbind(arm1, arm2)))
colnames(test_data) <- c("id", "arm", paste0("time_", 1:5))
## Generate 100 simulated datasets
copula.sim(data.input = test_data[, -c(1, 2)], id.vec = test_data$id,
           arm.vec = test_data$arm, n.patient = 100, n.simulation = 100, seed = 2022)
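A minimal sketch of copula-based simulation for a single arm, assuming a Gaussian copula on rank-based normal scores (a swapped-in, commonly used technique; copula.sim's exact steps, its per-arm handling, and its validation loop are not reproduced). Each margin is mapped to normal scores through its ranks, the correlation of the scores is estimated, new scores are drawn with that correlation, and each margin is mapped back through its inverse empirical CDF (quantile type 1). copula_sim_sketch is a hypothetical name.

```r
# Sketch under stated assumptions; not the package's algorithm.
copula_sim_sketch <- function(data, n.new) {
  data <- as.matrix(data)
  n <- nrow(data)
  # 1. Pseudo-observations from ranks, then normal scores
  z <- qnorm(apply(data, 2, rank) / (n + 1))
  # 2. Dependence of the normal scores
  r <- cor(z)
  # 3. New normal scores with correlation r (upper Cholesky factor)
  new.z <- matrix(rnorm(n.new * ncol(data)), nrow = n.new) %*% chol(r)
  # 4. Back to uniforms, then through each margin's inverse ECDF
  new.u <- pnorm(new.z)
  sapply(seq_len(ncol(data)), function(j)
    quantile(data[, j], new.u[, j], type = 1, names = FALSE))
}

# Usage: one arm of 40 patients with 5 correlated measurements
set.seed(2022)
sigma <- diag(5) + 0.5
emp <- matrix(rnorm(40 * 5), nrow = 40) %*% chol(sigma) + 10
sim <- copula_sim_sketch(emp, n.new = 100)
print(round(cor(sim, method = "spearman"), 2))
```

Because step 4 uses the inverse ECDF, every simulated value is an observed value of that margin; only the combinations across columns are new.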
Performing the hypothesis test to compare the difference between the empirical data and the simulated data
data.diff.test(x, y, test.method)
x |
A numeric matrix. |
y |
A numeric matrix which is compared to |
test.method |
A string specifying the hypothesis test used to detect the difference between the input data and the simulated data. Possible methods are energy distance ("energy") and ball divergence ("ball"). The R packages "energy" and "Ball" are needed. |
A list with two elements.
p.value: the p-value of the hypothesis test.
test.result: the returned object of the hypothesis test.
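To show what the "energy" option tests, here is a base R sketch of the two-sample energy distance statistic with a permutation p-value. The statistic is nm/(n+m) times (2 E|X - Y| - E|X - X'| - E|Y - Y'|), estimated with V-statistics over Euclidean distances. The documentation above states that data.diff.test relies on the "energy" and "Ball" packages; this sketch does not call them, and energy_stat and energy_perm_test are hypothetical names.

```r
# Sketch: two-sample energy distance test in base R (illustration only).
energy_stat <- function(x, y) {
  n <- nrow(x); m <- nrow(y)
  d <- as.matrix(dist(rbind(x, y)))  # pairwise Euclidean distances
  ix <- seq_len(n); iy <- n + seq_len(m)
  (n * m / (n + m)) *
    (2 * mean(d[ix, iy]) - mean(d[ix, ix]) - mean(d[iy, iy]))
}

energy_perm_test <- function(x, y, R = 199) {
  pooled <- rbind(x, y); n <- nrow(x); N <- nrow(pooled)
  obs <- energy_stat(x, y)
  # Re-split the pooled sample at random to build the null distribution
  perm <- replicate(R, {
    idx <- sample.int(N, n)
    energy_stat(pooled[idx, , drop = FALSE], pooled[-idx, , drop = FALSE])
  })
  list(statistic = obs, p.value = (1 + sum(perm >= obs)) / (R + 1))
}

# Usage: two samples whose means differ by 2 in each of 3 dimensions
set.seed(1)
x <- matrix(rnorm(60), ncol = 3)
y <- matrix(rnorm(60, mean = 2), ncol = 3)
res <- energy_perm_test(x, y)
print(res$p.value)
```

A small p-value, compared with validation.sig.lvl, indicates that the two samples are unlikely to come from the same distribution.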
Obtaining the inverse of the marginal empirical cumulative distribution function (ECDF)
ecdf.inv(x, p, sort.flag = TRUE)
x |
A numeric vector of the marginal empirical data. |
p |
A numeric vector of probabilities for the simulated data. |
sort.flag |
A logical value to specify whether to sort the output data. |
The inverse values of p based on the ECDF of x.
ecdf.inv(0:10, c(0.25, 0.75))
ecdf.inv(0:10, c(0.25, 0.75), FALSE)
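One standard way to invert an ECDF is to return, for each probability p, the smallest observed value whose ECDF is at least p, which is the sorted value at position ceiling(n * p). Base R's quantile(type = 1) follows the same rule. The sketch below assumes that convention; whether ecdf.inv matches it exactly, and how its sort.flag option behaves, is not assumed. ecdf_inv_sketch is a hypothetical name.

```r
# Sketch: inverse ECDF, the smallest observed x with Fn(x) >= p.
ecdf_inv_sketch <- function(x, p) {
  xs <- sort(x)
  n <- length(xs)
  xs[pmax(1, ceiling(n * p))]  # p = 0 maps to the minimum
}

ecdf_inv_sketch(0:10, c(0.25, 0.75))                      # 2 8
quantile(0:10, c(0.25, 0.75), type = 1, names = FALSE)   # same rule
```

For x = 0:10 (n = 11), p = 0.25 gives position ceiling(2.75) = 3, the value 2, whose ECDF is 3/11 >= 0.25.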
Converting data.simul in a copula.sim object into a list of wide-form matrices
extract.data.sim(object)
object |
A copula.sim object. |
A list of matrices for simulated data.
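The long-to-wide conversion can be sketched in base R with split() and reshape(). The column layout of data.simul is not documented here, so the long layout below (columns sim.id, id, time, value) is purely hypothetical, as is the helper name long_to_wide_list; the real object may use different names.

```r
# Sketch with a hypothetical long layout; data.simul may be organized differently.
long_to_wide_list <- function(long) {
  lapply(split(long, long$sim.id), function(d) {
    w <- reshape(d[, c("id", "time", "value")], idvar = "id",
                 timevar = "time", v.names = "value", direction = "wide")
    m <- as.matrix(w[, -1, drop = FALSE])  # drop the id column
    rownames(m) <- w$id
    m
  })
}

# Usage: 2 simulated datasets, 4 patients, 3 time points
long <- expand.grid(time = 1:3, id = 1:4, sim.id = 1:2)
long$value <- seq_len(nrow(long))
wide <- long_to_wide_list(long)
print(wide[["1"]])
```

Each list element is one simulated dataset with one row per patient and one column per time point.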
Simulating new multivariate datasets with shifted mean vector from existing empirical data
new.arm.copula.sim( data.input, id.vec, arm.vec, shift.vec.list, n.patient, n.simulation, seed = NULL, validation.type = "none", validation.sig.lvl = 0.05, rmvnorm.matrix.decomp.method = "svd", verbose = TRUE )
data.input, id.vec, arm.vec, n.patient, n.simulation, seed |
Please refer to the function copula.sim. |
shift.vec.list |
A list of numeric vectors, one per new arm, specifying the mean shift of that arm relative to the empirical data at each dimension (e.g., time point). |
validation.type, validation.sig.lvl, rmvnorm.matrix.decomp.method, verbose |
Please refer to the function copula.sim. |
Please refer to the function copula.sim.
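Why a mean shift keeps "the same dependence structure" can be checked directly: adding a fixed vector to every row changes the column means by exactly that vector while leaving the covariance matrix and the ranks unchanged. The base R sketch below mirrors the shift.vec.list values from the example; it illustrates the idea only and does not claim that new.arm.copula.sim applies the shift in this way. base.arm and new.arms are hypothetical names.

```r
# Sketch: a constant mean shift preserves covariance and rank dependence.
set.seed(2022)
sigma <- diag(3) + 0.5
base.arm <- matrix(rnorm(80 * 3), ncol = 3) %*% chol(sigma) +
  matrix(c(10, 10.5, 11), nrow = 80, ncol = 3, byrow = TRUE)
shift.vec.list <- list(c(2.5, 2.55, 2.6), c(4.5, 4.55, 4.6))

# Add each shift vector to every row (column-wise via sweep)
new.arms <- lapply(shift.vec.list, function(s) sweep(base.arm, 2, s, "+"))

print(colMeans(new.arms[[1]]) - colMeans(base.arm))    # equals the 1st shift
print(max(abs(cov(new.arms[[2]]) - cov(base.arm))))    # numerically zero
```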
Pei-Shan Yen, Xuemin Gu, Jenny Jiao, Jane Zhang
library(copulaSim)
## Generate Empirical Data
# Assume that the single-arm, 3-dimensional empirical data follow a multivariate normal distribution.
library(mvtnorm)
arm1 <- rmvnorm(n = 80, mean = c(10, 10.5, 11), sigma = diag(3) + 0.5)
test_data <- as.data.frame(cbind(1:80, rep(1, 80), arm1))
colnames(test_data) <- c("id", "arm", paste0("time_", 1:3))
## Generate 1 simulated dataset with one empirical arm and two new arms.
## The mean difference between the empirical arm and
# (i) the 1st new arm is assumed to be 2.5, 2.55, and 2.6 at each time point
# (ii) the 2nd new arm is assumed to be 4.5, 4.55, and 4.6 at each time point
new.arm.copula.sim(data.input = test_data[, -c(1, 2)], id.vec = test_data$id,
                   arm.vec = test_data$arm, n.patient = 100, n.simulation = 1, seed = 2022,
                   shift.vec.list = list(c(2.5, 2.55, 2.6), c(4.5, 4.55, 4.6)))