Statistics for everyone: ch.13 Logistic regression

easier R than SPSS with Rcmdr : Contents

ch.13 Logistic regression

Open the ‘dataKm’ data.

The ‘event’ variable takes the values 0 and 1, where 1 means that the event occurred. For logistic regression, the outcome must be coded this way.
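If your own outcome is stored as text labels rather than 0 and 1, it can be recoded first. The snippet below is only a sketch: the `status` vector and its labels are made up for illustration and are not part of ‘dataKm’.

```r
# Hypothetical outcome stored as text labels (not taken from dataKm).
status <- c("alive", "dead", "dead", "alive", "dead")

# Recode so that 1 marks the event ("dead") and 0 marks no event.
event <- ifelse(status == "dead", 1, 0)

# Check how many 0s and 1s there are.
table(event)
```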

Select ‘logistic regression’.

This step can be a little daunting. Write the model formula in the red box: either type it directly or double-click the variables in the list above, joining the predictors with +, as shown in the figure. Here, let’s include all the remaining variables except the time variable. You can copy the formula to Notepad for later use. Select the options marked by the green rectangle and proceed.

Remember that the model at the top is named GLM.1. You can reuse it later under that name.
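Behind the dialog, R Commander writes an ordinary glm() call to the script window. The sketch below shows roughly what such a call looks like. The data are simulated here as a stand-in for ‘dataKm’, so the column names (taken from the output discussed in this chapter) and all the numbers are assumptions.

```r
set.seed(1)
n <- 200

# Simulated stand-in for 'dataKm'; the real data set may differ.
dat <- data.frame(
  age    = round(rnorm(n, 60, 10)),
  height = round(rnorm(n, 165, 8)),
  sex    = factor(sample(c("F", "M"), n, replace = TRUE)),
  trt    = factor(sample(c("Group-A", "Group-B", "Group-C"), n, replace = TRUE))
)
dat$event <- rbinom(n, 1, plogis(2 - 0.04 * dat$age))

# Logistic regression: predictors joined with '+', as typed in the dialog.
GLM.1 <- glm(event ~ age + height + sex + trt,
             family = binomial(logit), data = dat)
summary(GLM.1)
```

Because the result is stored under the name GLM.1, later commands can refer to it by that name.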

Two plot windows appear, one on top of the other. Interpret the ‘diagnostic plots’ the same way as before. The ROC plot shows how the calculated probability of an event relates to what actually happened.

If you don’t see the ROC plot at this point, you need to install the required packages, ‘aod’ and ‘pROC’. You can install them the way you learned in Chapter 1 or, more easily, by running the command install.packages(c('aod', 'pROC')). The two names must be wrapped in c(); otherwise R treats the second name as the folder to install into.

This means that you can type the command in the R console and press the Enter key, or you can place the cursor on the line you wrote in the script window and click the third icon to run it. When typing commands like this, be careful: a single misspelling or a wrong upper- or lower-case letter will stop it from working. It is usually convenient to keep the command in the script and change only the part inside the quotes. ROC will be covered in depth in Chapter 17.

After installing the packages, close R and R Commander and start them again. Run the preceding command again to make sure that the ROC plot is created correctly. (Sometimes you need to close both R Commander and RGui before restarting.)

After the command runs, a new column, PropensityScore, is added to the right of the original data. It holds the calculated probability that the event occurs.
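A column like this can be reproduced by hand from a fitted model's predicted probabilities. The following is a minimal sketch on simulated data. It assumes that PropensityScore is simply the model's fitted probability, which matches the description above but is not confirmed from the menu's source code.

```r
set.seed(2)

# Simulated data; 'age' and 'event' are illustrative names.
dat <- data.frame(age = round(rnorm(100, 60, 10)))
dat$event <- rbinom(100, 1, plogis(3 - 0.05 * dat$age))

fit <- glm(event ~ age, family = binomial, data = dat)

# Fitted values of a binomial glm are the predicted event probabilities.
dat$PropensityScore <- fitted(fit)
head(dat)
```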

The interpretation has some similarities to linear regression. The p-value is still important, but the Estimate is harder to understand directly.

Look at the odds section at the bottom. Exp(Estimate) = odds ratio.
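The relation Exp(Estimate) = odds ratio can be checked directly in R. This is a small sketch on simulated data; the variables `x` and `y` are placeholders, not columns of ‘dataKm’.

```r
set.seed(3)

# Simulated predictor and 0/1 outcome (placeholders).
x <- rnorm(150)
y <- rbinom(150, 1, plogis(0.5 * x))

fit <- glm(y ~ x, family = binomial)

est <- coef(fit)          # Estimates on the log-odds scale
odds_ratio <- exp(est)    # exponentiate to get odds ratios

cbind(Estimate = est, OddsRatio = odds_ratio)
```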

When age increases by 1, the odds of the event are multiplied by 0.958. Since this value is less than 1, the odds actually decrease slightly.

When height increases by 1, the odds are multiplied by 1.03. If height increases by 10, the odds are multiplied by 1.03^10 = 1.34, not by 1.03 * 10 = 10.3.
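The arithmetic can be verified in the R console. The two odds ratios below are the values quoted in the text; everything else is plain arithmetic.

```r
or_height <- 1.03   # odds ratio per 1 unit of height (from the text)
or_age    <- 0.958  # odds ratio per 1 year of age (from the text)

# Odds ratios multiply, so a 10-unit change raises them to the 10th power.
or_height^10              # about 1.34
or_age^10                 # about 0.65

# The same result on the Estimate (log-odds) scale: add, then exponentiate.
exp(10 * log(or_height))
```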

sex[T.M] = 1.14 means that, with F as the baseline, the odds of the event for M are 1.14 times those for F.

trt[T.Group-B] = 1.360 and trt[T.Group-C] = 0.160 mean that, with Group A as the baseline, the odds of the event are 1.36 times as high in Group B and 0.16 times as high in Group C (that is, lower). So the event is least likely in Group C and most likely in Group B.

VIF measures multicollinearity. It applies to linear regression as well as logistic regression because, as the results show, it is calculated only among the independent variables. A value of 10 or more indicates very high collinearity: the variable is almost entirely explained by the other variables.
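To see why VIF involves only the independent variables, it can be computed by hand: regress each predictor on the others and take 1 / (1 - R^2). The sketch below does this in base R on simulated numeric predictors. It illustrates the idea and is not the routine that R Commander calls; `x1`, `x2` and `x3` are made-up names.

```r
set.seed(4)
n <- 200

x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                 # unrelated to the other two
X  <- data.frame(x1, x2, x3)

# For each predictor: regress it on the remaining ones, then 1 / (1 - R^2).
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

round(vif_manual, 1)
```

Here x1 and x2 get values far above 10, while x3 stays close to 1. Note that the outcome variable never enters the calculation.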

This is the ROC curve, and the area under it (AUC) is 0.773. The maximum value is 1, and 0.5 means the model predicts no better than chance. The larger, the better.
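The area under the ROC curve equals the probability that a randomly chosen case with the event has a higher calculated probability than a randomly chosen case without it. The sketch below computes it from ranks in base R, without the ‘pROC’ package. The helper `auc_rank` and the simulated data are illustrative assumptions.

```r
# Hypothetical helper: AUC from ranks (equivalent to the Mann-Whitney statistic).
auc_rank <- function(score, event) {
  r  <- rank(score)            # ties receive average ranks
  n1 <- sum(event == 1)        # number of cases with the event
  n0 <- sum(event == 0)        # number of cases without the event
  (sum(r[event == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(5)
score <- runif(300)              # stand-in for calculated probabilities
event <- rbinom(300, 1, score)   # events are likelier when the score is high

auc_sim <- auc_rank(score, event)
auc_sim

# Sanity check: scores that separate the groups perfectly give an AUC of 1.
auc_rank(c(0.1, 0.2, 0.8, 0.9), c(0, 0, 1, 1))
```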

Let’s check what’s in the green square. This is what we did earlier in linear regression.

As you saw earlier, it removes the variables one by one.

In the end, ‘age’ and ‘trt’ remained.

Looking at the first line, the criterion has changed to criterion = ‘BIC’.

As a result, the same variables remained.
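The same kind of variable selection can be sketched with base R's step() function, which uses AIC by default and behaves like BIC when the penalty is set to k = log(n). This is an assumption-laden sketch: the data are simulated, the `noise` column is invented, and R Commander's menu may use a different function with different defaults.

```r
set.seed(6)
n <- 300

# Simulated data: only 'age' truly affects the outcome.
dat <- data.frame(
  age    = rnorm(n, 60, 10),
  height = rnorm(n, 165, 8),
  noise  = rnorm(n)
)
dat$event <- rbinom(n, 1, plogis(6 - 0.1 * dat$age))

full <- glm(event ~ age + height + noise, family = binomial, data = dat)

# Backward elimination: remove variables one at a time while the criterion improves.
by_aic <- step(full, direction = "backward", trace = 0)              # AIC (k = 2)
by_bic <- step(full, direction = "backward", trace = 0, k = log(n))  # BIC penalty

formula(by_aic)
formula(by_bic)
```

BIC penalizes each extra variable more heavily than AIC, so it tends to keep the same or fewer variables.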

Let’s learn how to tabulate these results.

I choose this option because the current model is a logistic regression; for other models, there are several other options to choose from.

Name it appropriately and save it as a csv file.

You can open the csv file in Excel and copy and paste it into Word or PowerPoint as a table. The content of the table depends on which model you chose.
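A similar table can also be written from the script window with write.csv(). The sketch below is one possible way, on simulated data. The file name and the extra OddsRatio column are illustrative choices, not what the menu itself produces.

```r
set.seed(7)

# Simulated predictor and 0/1 outcome (placeholders).
x <- rnorm(120)
y <- rbinom(120, 1, plogis(0.8 * x))
fit <- glm(y ~ x, family = binomial)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|).
tab <- as.data.frame(coef(summary(fit)))
tab$OddsRatio <- exp(tab$Estimate)

# Write to a temporary folder; replace 'out' with your own path.
out <- file.path(tempdir(), "logistic_results.csv")
write.csv(tab, out)

read.csv(out)
```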

The next part is survival analysis. To be continued.
