Statistics for everyone: ch.23 Propensity Score Matching and Analysis



If the ‘Matching’ package is not installed, install it first.

As you learned earlier, once it is installed, type library(Matching) and click ‘Submit’; the data sets in this package then become available.

Let’s load the data set called lalonde from the Matching package.

These data let us analyze how income differs between people who did and did not receive education (here, a job-training program). Those who received it are coded 1 in the ‘treat’ variable, and those who did not are coded 0.

First, let’s draw a histogram to see the distribution of income (re78, earnings in 1978), which is the outcome variable.

Select re78 as the variable to plot and treat as the grouping variable.

The histogram shows an asymmetric, right-skewed distribution with many low-income people overall. The untreated group (red) has many low incomes, and the treated group (green) has somewhat higher incomes.

Since we cannot assume that we have a normal distribution, we will perform a nonparametric test.

Select variables.

A Wilcoxon rank-sum test with continuity correction was performed, and the result shows a statistically significant difference between the two groups.
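Rcmdr runs this test through R’s wilcox.test(). To show what “with continuity correction” means, here is a minimal standard-library Python sketch of the two-sided rank-sum test under the normal approximation. It uses average ranks for ties and a tie-corrected variance. It is intended to agree with wilcox.test(x, y, exact = FALSE), but it is an illustration rather than Rcmdr’s code, and the toy numbers are made up.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation with continuity
    correction. Returns (W, p_value), where W is the rank sum of x minus
    n1*(n1+1)/2, the form R reports."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    diff = w - n1 * n2 / 2
    # Continuity correction: subtract 0.5 * sign(diff) before standardizing
    z = (diff - math.copysign(0.5, diff)) / sigma if diff != 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)
    return w, p


untreated = [1, 2, 3, 4, 5]  # made-up incomes, not lalonde values
treated = [6, 7, 8, 9, 10]
w, p = rank_sum_test(untreated, treated)
print(w, round(p, 4))
```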

Looking at the median and the interquartile range of each group, you can get a sense of how large the difference is.

A boxplot shows these summaries well.
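The group summaries can be reproduced with a small standard-library Python sketch. The `method="inclusive"` option of `statistics.quantiles` gives linear-interpolation (type 7) quartiles, which is the default of R’s quantile(). R’s boxplot() draws its box from fivenum() hinges instead, so in small samples the box edges can differ slightly from these quartiles. The income values below are made up.

```python
import statistics


def median_iqr(values):
    """Median and interquartile range from type-7 (linearly interpolated) quartiles."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": med, "q1": q1, "q3": q3, "iqr": q3 - q1}


def summarize_by_group(outcome, group):
    """Apply median_iqr to the outcome separately for each group label."""
    groups = {}
    for y, g in zip(outcome, group):
        groups.setdefault(g, []).append(y)
    return {g: median_iqr(ys) for g, ys in sorted(groups.items())}


# Made-up outcome values and 0/1 treatment labels, for illustration only
income = [0, 0, 1200, 3400, 5000, 0, 2500, 6100, 8800, 12000]
treat = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
for g, s in summarize_by_group(income, treat).items():
    print(g, s)
```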

But questions remain. Were the two groups randomly allocated? Could the difference arise because the educated group had more literacy, motivation, and better physical fitness to begin with? Can this difference really be attributed to education?

The best approach is a randomized controlled study. When you can only use the data you are given, however, you can use matching to select people with similar characteristics from both groups and then address these questions.

Now open the menu for matching.

Choose the variables to match on carefully. Here we selected 4 variables and match 1:1.

The new data set is named ‘lalonde_MP’. Some row numbers are missing from the row names because the unmatched rows were dropped.

Before matching there were 445 observations, 185 of them treated. After matching, 133 remained in each group, so both groups now have the same number of samples.
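The chapter does not say exactly how the menu pairs people on the chosen variables. One simple scheme that behaves as described is exact 1:1 matching. People are grouped by their combination of values on the matching variables, and within each combination treated and untreated people are paired off until one side runs out. This leaves both groups the same size, and matching on more variables leaves fewer pairs. The standard-library Python sketch below assumes that scheme. The rows are dictionaries with made-up values, and the lalonde column names (black, nodegr, married) are used only for readability.

```python
def exact_match_1to1(rows, covariates, treat_key="treat"):
    """Exact 1:1 matching (assumed scheme): pair treated and untreated rows that
    have identical values on every variable in `covariates`. Unpaired rows are
    dropped. Returns a list of (treated_row, control_row) tuples."""
    strata = {}
    for row in rows:
        key = tuple(row[c] for c in covariates)
        treated, control = strata.setdefault(key, ([], []))
        (treated if row[treat_key] == 1 else control).append(row)
    pairs = []
    for treated, control in strata.values():
        pairs.extend(zip(treated, control))  # zip stops at the shorter list
    return pairs


# Made-up rows for illustration; "id" stands in for the row name
rows = [
    {"id": 1, "treat": 1, "black": 1, "nodegr": 1, "married": 0},
    {"id": 2, "treat": 1, "black": 1, "nodegr": 1, "married": 1},
    {"id": 3, "treat": 1, "black": 0, "nodegr": 1, "married": 0},
    {"id": 4, "treat": 0, "black": 1, "nodegr": 1, "married": 0},
    {"id": 5, "treat": 0, "black": 1, "nodegr": 1, "married": 0},
    {"id": 6, "treat": 0, "black": 0, "nodegr": 0, "married": 0},
]
two_vars = exact_match_1to1(rows, ["black", "nodegr"])
three_vars = exact_match_1to1(rows, ["black", "nodegr", "married"])
print(len(two_vars), len(three_vars))  # 2 1
```

Adding a matching variable splits people into finer groups, so fewer of them find an identical partner.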

Drawing the histogram again gives a similar, though slightly different, picture.

The test now shows no statistically significant difference.

Keep in mind that these data are matched on only some of the variables.

Load the original data again.

This time, let’s also add the ‘married’ variable to the matching variables.

Now slightly fewer than 120 pairs are extracted.

This is the result of examining the distribution and then running the nonparametric test. As you can see, the result changes a little each time the matching changes. Therefore, decide which variables to match on in advance, before conducting the study.

On the other hand, when comparing a nominal outcome variable between the two groups created this way, we do not use the McNemar test, because the matching is between groups rather than between individuals.

However, if individuals are truly matched 1:1, we should use the paired t-test or the Wilcoxon signed-rank test for continuous variables, and the McNemar test for nominal variables.
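For a 0/1 outcome in truly 1:1 matched pairs, the McNemar test uses only the discordant pairs: pairs in which exactly one member had the outcome. The standard-library Python sketch below applies the continuity-corrected statistic (|b - c| - 1)^2 / (b + c). R’s mcnemar.test() uses the same statistic by default. The statistic is compared with a chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(stat / 2)). The pairs are made up.

```python
import math


def mcnemar_test(pairs):
    """McNemar test for 1:1 matched pairs with a 0/1 outcome.
    pairs: list of (treated_outcome, control_outcome).
    Returns (continuity-corrected chi-square statistic, p-value with 1 df)."""
    n10 = sum(1 for yt, yc in pairs if yt == 1 and yc == 0)  # only the treated member had the outcome
    n01 = sum(1 for yt, yc in pairs if yt == 0 and yc == 1)  # only the control member had the outcome
    if n10 + n01 == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    stat = (abs(n10 - n01) - 1) ** 2 / (n10 + n01)
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


# Made-up pairs: 10 where only the treated member had the outcome,
# 3 where only the control did, plus concordant pairs
pairs = [(1, 0)] * 10 + [(0, 1)] * 3 + [(1, 1)] * 5 + [(0, 0)] * 7
stat, p = mcnemar_test(pairs)
print(round(stat, 3), round(p, 3))
```

Concordant pairs do not enter the statistic at all, which is why the test needs pairs defined at the individual level.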

A new variable called pairmatch also appears at the very end of the data. Rows with the same number are matched to each other: for example, the 1 here and the 1 further down (visible when you scroll down) form a pair. This pairmatch variable can also be used in later analyses.
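A pair-ID column like pairmatch is what turns the matched rows into pairs for later paired analyses. The standard-library Python sketch below shows that step. Only the column name pairmatch comes from the text. The representation of rows as dictionaries, the helper names, and the outcome values are hypothetical. It assumes 1:1 matching, so each pair ID has exactly one treated row and one control row.

```python
def pairs_from_ids(rows, pair_key="pairmatch", treat_key="treat"):
    """Group matched rows into {pair_id: (treated_row, control_row)}.
    Assumes 1:1 matching; IDs lacking either a treated or a control row are skipped."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[pair_key], {})[row[treat_key]] = row
    return {pid: (m[1], m[0]) for pid, m in grouped.items() if 0 in m and 1 in m}


def paired_differences(pairs, outcome):
    """Treated-minus-control outcome difference for each pair, ordered by pair ID."""
    return [t[outcome] - c[outcome] for _, (t, c) in sorted(pairs.items())]


# Made-up matched rows; "row" mimics the non-consecutive row names
rows = [
    {"row": 3, "treat": 1, "re78": 5000, "pairmatch": 1},
    {"row": 17, "treat": 1, "re78": 0, "pairmatch": 2},
    {"row": 201, "treat": 0, "re78": 3000, "pairmatch": 1},
    {"row": 305, "treat": 0, "re78": 1500, "pairmatch": 2},
]
pairs = pairs_from_ids(rows)
print(paired_differences(pairs, "re78"))  # [2000, -1500]
```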

Using the logistic regression you learned earlier, you can also perform propensity score matching.

Starting again from the original data, put the treat variable in the response position, where an outcome event would normally go. For the explanatory variables, use all the remaining variables except the real outcome variable (here, re78). The – (minus) sign makes this convenient; in R formula notation it is treat ~ . - re78. Be sure to check ‘Make propensity score variable’.

The resulting data set has a propensity score in its last column, which is the estimated probability that treat equals 1. (For the logistic regression to work, treat must already be coded as 0 and 1.)
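In R, this step roughly corresponds to glm(treat ~ . - re78, family = binomial, data = lalonde). The model’s fitted() values are the propensity scores. The standard-library Python sketch below shows what is being estimated: it fits a logistic regression of treat on the covariates by Newton-Raphson and returns the fitted probabilities. The data are made up, and the sketch has no safeguards against perfectly separated data. For real work, use glm() or the Rcmdr menu.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def propensity_scores(X, treat, max_iter=50, tol=1e-10):
    """Fit logit P(treat = 1 | x) = b0 + b . x by Newton-Raphson and return the
    fitted probabilities (propensity scores), one per row of X.
    X: list of covariate lists (no intercept column); treat: list of 0/1."""
    design = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        probs = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in design]
        # Gradient of the log-likelihood and the information matrix X' W X
        grad = [sum((y - p) * row[j] for row, y, p in zip(design, treat, probs)) for j in range(k)]
        info = [[sum(p * (1 - p) * row[i] * row[j] for row, p in zip(design, probs))
                 for j in range(k)] for i in range(k)]
        step = solve_linear(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in design]


# Made-up data: one covariate, treatment more common at higher values
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
treat = [0, 0, 1, 0, 1, 0, 1, 1]
scores = propensity_scores(X, treat)
print([round(s, 2) for s in scores])
```

With an intercept in the model, the fitted scores add up to the number of treated people, a handy sanity check on any propensity score column.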

Let’s do matching in this state.

Match on the propensity score, and give the new data set a name you will remember.

The output shows the counts before and after matching; a total of 164 pairs were selected.

In this data set, the number in the rightmost column identifies the pairs: rows with the same number form a pair. The row numbers on the left are not consecutive and some are missing, which shows that only the matched rows were kept.

Now go back to the data before matching and, this time, set Caliper matching to ‘No’. Give the new data set a different name. The matching ratio is usually 1, but it can be changed in special cases.

This time a total of 185 pairs were selected, so every treated person found a match.

If you run it again with the caliper set to 0.1, fewer pairs are selected. The caliper is the maximum allowed distance between matched scores: the narrower it is, the fewer pairs are selected, and the wider it is, the more.
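The effect of the caliper can be seen in a minimal standard-library Python sketch of greedy 1:1 nearest-neighbour matching on the propensity score, without replacement. Two details are assumptions. Treated people are processed in data order. The caliper is expressed in standard deviations of the score, which is how Matching::Match() interprets its caliper argument. The menu’s own algorithm may differ in such details. The scores are made up.

```python
import statistics


def nearest_neighbor_match(scores, treat, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without
    replacement, treated units taken in data order. caliper: maximum allowed
    distance in standard deviations of the score (None = no caliper).
    Returns a list of (treated_index, control_index)."""
    max_dist = None if caliper is None else caliper * statistics.stdev(scores)
    available = {i for i, t in enumerate(treat) if t == 0}
    pairs = []
    for i, t in enumerate(treat):
        if t != 1 or not available:
            continue
        # Closest unused control; ties broken by the smaller index
        j = min(available, key=lambda c: (abs(scores[c] - scores[i]), c))
        if max_dist is not None and abs(scores[j] - scores[i]) > max_dist:
            continue  # nearest control lies outside the caliper: drop this treated unit
        pairs.append((i, j))
        available.remove(j)
    return pairs


# Made-up propensity scores and treatment labels
scores = [0.9, 0.2, 0.5, 0.55, 0.3, 0.85, 0.1]
treat = [1, 0, 1, 0, 0, 0, 1]
for cal in (None, 0.2, 0.1):
    print(cal, len(nearest_neighbor_match(scores, treat, cal)))
```

Without a caliper, every treated unit is matched whenever there are at least as many controls as treated units. Each narrowing of the caliper can only drop treated units whose nearest available control is too far away.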

The data from this propensity score matching are paired, so you can use the paired t-test or the McNemar test learned earlier, depending on the type of outcome variable. Alternatively, you can use the specialized analyses described below.
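For a continuous outcome, R’s t.test(treated, control, paired = TRUE) handles the matched data directly. The standard-library Python sketch below makes the computation visible. It takes the within-pair differences d and forms t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom. The two-sided p-value comes from the identity P(|T| > t) = I_x(df/2, 1/2) with x = df / (df + t^2), where the regularized incomplete beta function is evaluated by a standard continued fraction. The outcomes are made up and listed pair by pair.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        delta = 1.0
        # Each iteration applies the even and then the odd term of the fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| > |t|) for Student's t with df degrees of freedom."""
    return _betainc_reg(df / 2.0, 0.5, df / (df + t * t))


def paired_t_test(treated, control):
    """Paired t-test on matched outcomes listed in pair order.
    Returns (t statistic, degrees of freedom, two-sided p-value).
    Fails if all differences are identical (standard deviation 0)."""
    diffs = [a - b for a, b in zip(treated, control)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    t = statistics.mean(diffs) / se
    return t, n - 1, t_two_sided_p(t, n - 1)


# Made-up matched outcomes, listed pair by pair
treated = [5, 7, 6, 9, 8]
control = [4, 6, 6, 7, 5]
t, df, p = paired_t_test(treated, control)
print(round(t, 3), df, round(p, 3))
```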

In each of these 3 analyses, the pair variable is used to identify the matched pairs.

This figure shows that it can be used for the Mantel-Haenszel test.
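When each matched pair is treated as its own stratum, the Mantel-Haenszel common odds ratio has a simple form. The Mantel-Haenszel estimate is the quantity R’s mantelhaen.test() reports for 2 x 2 x K tables. The standard-library Python sketch below computes sum(a*d/n) / sum(b*c/n) over the strata. Concordant pairs contribute nothing to either sum, so for pairs the estimate reduces to (pairs where only the treated member had the outcome) / (pairs where only the control member did). The pairs are made up.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio over strata.
    tables: list of 2x2 tables ((a, b), (c, d)); rows are treated/control,
    columns are outcome yes/no."""
    num = den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


def pair_table(y_treated, y_control):
    """One matched pair as its own 2x2 stratum (outcomes coded 0/1)."""
    return ((y_treated, 1 - y_treated), (y_control, 1 - y_control))


# Made-up pairs (treated outcome, control outcome)
pairs = [(1, 0)] * 10 + [(0, 1)] * 3 + [(1, 1)] * 5 + [(0, 0)] * 7
print(mantel_haenszel_or([pair_table(yt, yc) for yt, yc in pairs]))
```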

This figure shows that you can use it for logistic regression of paired data, that is, conditional logistic regression.
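For 1:1 pairs, conditional logistic regression has a compact form. Pairs in which both members, or neither, had the outcome drop out of the likelihood. Each remaining pair contributes sigmoid(beta * (x_event - x_other)), the probability that the outcome fell on the member it actually fell on. The standard-library Python sketch below maximizes that conditional likelihood for a single covariate by Newton’s method. In R, survival::clogit(outcome ~ treat + strata(pairmatch)) fits this model in general. Restricting the sketch to one covariate is a simplification, and the data are made up. With treat as the only covariate, exp(beta) equals the ratio of discordant pairs.

```python
import math


def clogit_1to1(pairs, max_iter=50, tol=1e-12):
    """Conditional logistic regression for 1:1 matched pairs, one covariate.
    pairs: list of ((x_first, y_first), (x_second, y_second)), y coded 0/1.
    Returns beta, the log odds ratio per unit of x. No finite estimate exists
    if all informative pairs have differences of the same sign."""
    # Only pairs with exactly one event are informative; keep x(event) - x(other)
    diffs = [(xa - xb) if ya == 1 else (xb - xa)
             for (xa, ya), (xb, yb) in pairs if ya + yb == 1]
    if not diffs:
        raise ValueError("no informative (discordant) pairs")
    beta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        score = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        info = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Made-up pairs: x = treat (1 treated, 0 control), y = outcome (0/1)
outcomes = [(1, 0)] * 10 + [(0, 1)] * 3 + [(1, 1)] * 5 + [(0, 0)] * 7
pairs = [((1, yt), (0, yc)) for yt, yc in outcomes]
beta = clogit_1to1(pairs)
print(round(math.exp(beta), 4))  # odds ratio
```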

This figure shows that you can use it for Cox regression of paired data, that is, stratified Cox regression. Stratified Cox regression and conditional logistic regression can be used in studies whose data are paired by design. Alternatively, you can perform propensity score matching and apply them to the resulting paired data.