Monday, August 12, 2013
Propensity Score Imputation for Binary Outcomes
Propensity score matching (PSM) / Bootstrap imputation for binary outcomes is not available either from R package "mice" or SAS PROC MI. Here is a R scripts for PSM method plus a pooling function for analyzing the results for m bootstraps.
An example using this demo dataset.
To do the imputation, you either need to copy all R code from here and here, Or copy the following scripts to download and source the functions.
Download the R functions on Mac:
download.file("https://raw.github.com/devilszhang/blog/master/boot.imp.r", destfile="./boot.imp.r", method="curl")
download.file("https://raw.github.com/devilszhang/blog/master/boot.pool.r", destfile="./boot.pool.r", method="curl")
Download the R functions on Windows:
download.file("https://raw.github.com/devilszhang/blog/master/boot.imp.r", destfile="./boot.imp.r")
download.file("https://raw.github.com/devilszhang/blog/master/boot.pool.r", destfile="./boot.pool.r")
Source the function:
source("./boot.imp.r")
source("./boot.pool.r")
#Read demo dataset from online:
require(RCurl)
demo<-read.table(textConnection(getURL("https://raw.github.com/devilszhang/blog/master/demo.txt")), header=T)
#STEP1:
#Calculating propensity score and grouping:
fit1<-glm(miss~daysbirth+sex+lepidays+lepisevrt+dvomtdays, data=demo, family=binomial(link=logit), na.action="na.exclude")
fitted<-fit1$fitted.values
demo$napred<-predict(fit1, type="response", newdata=demo)
demo<-demo[order(demo$napred), ]
brks<-with(demo, quantile(napred, probs=c(0, 0.2, 0.4, 0.6, 0.8, 1)))
demo<-within(demo, qgroup<-cut(napred, breaks=brks, labels=1:5, include.lowest=T))
demo<-demo[order(demo$qgroup, demo$miss), ]
#STEP2
#do bootstrap imputation for m replicates
set.seed(123456)
m<-10
boot<-boot.imp(demo, "qgroup", "rotabi", m)
boot<-as.data.frame(boot)
colnames(boot)<-paste("boot", 1:m, sep="")
####complete dataset
cdemo<-cbind(demo, boot) ##complete dataset
#STEP3
#Pooling analysis
fitall<-apply(boot, 2, function(x) summary(glm(x~daysbirth+sex+lepidays+lepisevrt+dvomtdays, data=demo, family=binomial(link=logit))))
summary(boot.pool(fitall))#Lambda is the total variance due to missing values
#PLOTTING
plot(density(na.omit(demo$rotabi)), col=4, lwd=4, ylim=c(0,10))
nb<-paste("boot", 1:m, sep="")
for (i in nb) {
lines(density(cdemo[cdemo[,"miss"]==0, i]), col=2)
}
Labels:
Imputation
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment