Thursday, August 15, 2013

R function for Multiple Imputation of Longitudinal Binary Data Based on Propensity Score

Description 

R function time1(x, m) and timex(x, inacol, nacol, m) return the imputed data based on propensity score and beta distribution (see details in reference paper). This multiple imputation is to impute the missing values according to time series and the later imputed values are dependent on the prior imputed data, which is different from R package "MICE" and SAS PROCE MI where the imputation is performed based on regression models.

Usage

time1(x, m)
timex(x, inacol, nacol, m)
 

Arguments

x              data frame of longtitudinal binary data to impute
m             number of replicates
inacol      a list of the column numbers/names before timex without NAs
nacol       the number of time before time x, for example total times is 5, to impute the time 3, nacol=2.

Examples

Copy the full R code from HERE or run the following scripts in R Gui:

Download the function code on Mac:
download.file("https://raw.github.com/devilszhang/blog/master/timex.r", destfile="./timex.r", method="curl")


Download the function code on Windows:
download.file("https://raw.github.com/devilszhang/blog/master/timex.r", destfile="./timex.r")

Source the code:

source("./timex.r")
Access the demo data from HERE or using:
require(RCurl)
x<-read.table(textConnection(getURL("https://raw.github.com/devilszhang/blog/master/longdata.txt")), header=T)
#Impute day0
x$t1imp<-time1(polio52, 5)
#Impute day4
inacol<-c("t1imp")   
nacol<-c("day4")   
x$t2imp<-timex(x, inacol, nacol, 1)   
#Impute day11
 inacol<-c("t1imp", "t2imp")
nacol<-c("day11")
x$t3imp<-timex(x, inacol, nacol, 2)   
#Impute day18
 inacol<-c("t1imp", "t2imp", "t3imp")
nacol<-c("day18")
x$t4imp<-timex(x, inacol, nacol, 3)   
#Impute day25
 inacol<-c("t1imp", "t2imp", "t3imp", "t4imp")
nacol<-c("day25")
x$t5imp<-timex(x, inacol, nacol, 4)   
#complete data
complx<-x[, c("t1imp", "t2imp", "t3imp", "t4imp", "t5imp")]
complx    


References

Li X.M. et al. Statistics in Medicine, 2006, 25: 2107-2124.

Wednesday, August 14, 2013

R Function for Simulating the Sample Size, Type I error and Power in Adaptive Group Sequential Design

Description


The R function SAP() returns the planning sample size, simulated sample size, simulated standard error, simulated arm1/arm2 event rate and standard error, simulated type I error and statistical power  for four methods (see references) with binary outcomes.

Usage


SAP(r, type1, type2, interim, tpi1, delta,p)
tra.size(type1, type2, pi1, pi2,delta) #return Chow's sample size
Gould.size(type1, type2, opi, delta) #return Gould Sample size
Shih.size(n,r, type1,type2,tpi1,tpi2,p,interim
##return Shih's sample size, n is the sample size from either Gould or Chow.


Arguments

r                   number of replicates in simulation, such as 10000
type1           designed type1 error, such as 0.05
type2           designed type2 error, such as 0.2 (power=0.8)
interim        the proportion of total information by which interim analysis is planned, such as 0.3
tpi1             designed arm 1 event rate, such as 0.3
delta            minimum clinical effect size, such as 0.2
p                 the stratification factor for Shih's method, say 0.3 suggested by the paper.

Examples


To run the functions, you either to copy all the R codes from here or copy the following scripts to the R Gui window:

Download the function code on Mac:

download.file("https://raw.github.com/devilszhang/blog/master/SAP.r", destfile="./SAP.r", method="curl")


Download the function code on Windows:


download.file("https://raw.github.com/devilszhang/blog/master/SAP.r", destfile="./SAP.r")

Source the code:

source("./SAP.r")

Suppose you're planning a trial with the desired values in Arguments:

SAP(r=10000, type1=0.05, type2=0.2, interim=0.3, tpi1=0.3, delta=0.2, p=0.3)



References


  1. Chow S.C., et al. Sample Size Calculations in Clinical Research, Marcel Dekker Inc.: New York, 2003.
  2. Gould A. L., et al. Statistics in Medicine, 1992, 11: 55-66.
  3. Shih W. J., et al. Statistics in Medicine, 1997, 16: 1913-1923. (Shih's method based on Gould)
  4. Shih's method based on Chow.

Monday, August 12, 2013

Propensity Score Imputation for Binary Outcomes


Propensity score matching (PSM) / Bootstrap imputation for binary outcomes is not available either from R package "mice" or SAS PROC MI. Here is a R scripts for PSM method plus a pooling function for analyzing the results for m bootstraps.

An example using this demo dataset.
To do the imputation, you either need to copy all R code from here and here, Or copy the following scripts to download and source the functions.

Download the R functions on Mac:
download.file("https://raw.github.com/devilszhang/blog/master/boot.imp.r", destfile="./boot.imp.r", method="curl")

download.file("https://raw.github.com/devilszhang/blog/master/boot.pool.r", destfile="./boot.pool.r", method="curl")

Download the R functions on Windows:
download.file("https://raw.github.com/devilszhang/blog/master/boot.imp.r", destfile="./boot.imp.r")

download.file("https://raw.github.com/devilszhang/blog/master/boot.pool.r", destfile="./boot.pool.r")

Source the function:
source("./boot.imp.r")
source("./boot.pool.r") 
 
#Read demo dataset from online:
     require(RCurl)
     demo<-read.table(textConnection(getURL("https://raw.github.com/devilszhang/blog/master/demo.txt")), header=T)


#STEP1:
#Calculating propensity score and grouping:
    fit1<-glm(miss~daysbirth+sex+lepidays+lepisevrt+dvomtdays, data=demo, family=binomial(link=logit), na.action="na.exclude") 

    fitted<-fit1$fitted.values
    demo$napred<-predict(fit1, type="response", newdata=demo)
    demo<-demo[order(demo$napred), ]
    brks<-with(demo, quantile(napred, probs=c(0, 0.2, 0.4, 0.6, 0.8, 1)))
    demo<-within(demo, qgroup<-cut(napred, breaks=brks, labels=1:5, include.lowest=T))
    demo<-demo[order(demo$qgroup, demo$miss), ]

#STEP2
#do bootstrap imputation for m replicates
    set.seed(123456)
    m<-10
    boot<-boot.imp(demo, "qgroup", "rotabi", m)
    boot<-as.data.frame(boot)
    colnames(boot)<-paste("boot", 1:m, sep="")
    ####complete dataset
    cdemo<-cbind(demo, boot) ##complete dataset 

#STEP3
#Pooling analysis
     fitall<-apply(boot, 2, function(x) summary(glm(x~daysbirth+sex+lepidays+lepisevrt+dvomtdays, data=demo, family=binomial(link=logit))))
    summary(boot.pool(fitall))#Lambda is the total variance due to missing values


#PLOTTING
    plot(density(na.omit(demo$rotabi)), col=4, lwd=4, ylim=c(0,10))
    nb<-paste("boot", 1:m, sep="")
    for (i in nb) {
        lines(density(cdemo[cdemo[,"miss"]==0, i]), col=2)
    }