Monday, August 12, 2013

Propensity Score Imputation for Binary Outcomes


Propensity score matching (PSM) / Bootstrap imputation for binary outcomes is not available either from R package "mice" or SAS PROC MI. Here is a R scripts for PSM method plus a pooling function for analyzing the results for m bootstraps.

An example using this demo dataset.
To do the imputation, you either need to copy all R code from here and here, Or copy the following scripts to download and source the functions.

Download the R functions on Mac:
download.file("https://raw.github.com/devilszhang/blog/master/boot.imp.r", destfile="./boot.imp.r", method="curl")

download.file("https://raw.github.com/devilszhang/blog/master/boot.pool.r", destfile="./boot.pool.r", method="curl")

Download the R functions on Windows:
download.file("https://raw.github.com/devilszhang/blog/master/boot.imp.r", destfile="./boot.imp.r")

download.file("https://raw.github.com/devilszhang/blog/master/boot.pool.r", destfile="./boot.pool.r")

Source the function:
source("./boot.imp.r")
source("./boot.pool.r") 
 
#Read demo dataset from online:
     require(RCurl)
     demo<-read.table(textConnection(getURL("https://raw.github.com/devilszhang/blog/master/demo.txt")), header=T)


#STEP1:
#Calculating propensity score and grouping:
    fit1<-glm(miss~daysbirth+sex+lepidays+lepisevrt+dvomtdays, data=demo, family=binomial(link=logit), na.action="na.exclude") 

    fitted<-fit1$fitted.values
    demo$napred<-predict(fit1, type="response", newdata=demo)
    demo<-demo[order(demo$napred), ]
    brks<-with(demo, quantile(napred, probs=c(0, 0.2, 0.4, 0.6, 0.8, 1)))
    demo<-within(demo, qgroup<-cut(napred, breaks=brks, labels=1:5, include.lowest=T))
    demo<-demo[order(demo$qgroup, demo$miss), ]

#STEP2
#do bootstrap imputation for m replicates
    set.seed(123456)
    m<-10
    boot<-boot.imp(demo, "qgroup", "rotabi", m)
    boot<-as.data.frame(boot)
    colnames(boot)<-paste("boot", 1:m, sep="")
    ####complete dataset
    cdemo<-cbind(demo, boot) ##complete dataset 

#STEP3
#Pooling analysis
     fitall<-apply(boot, 2, function(x) summary(glm(x~daysbirth+sex+lepidays+lepisevrt+dvomtdays, data=demo, family=binomial(link=logit))))
    summary(boot.pool(fitall))#Lambda is the total variance due to missing values


#PLOTTING
    plot(density(na.omit(demo$rotabi)), col=4, lwd=4, ylim=c(0,10))
    nb<-paste("boot", 1:m, sep="")
    for (i in nb) {
        lines(density(cdemo[cdemo[,"miss"]==0, i]), col=2)
    }
    




No comments:

Post a Comment