2022년 11월 17일 목요일

Excel and Statistics(24), Regression

Excel and Statistics, everybody should know.  : Contents  

Regression

 

Open ‘example2.xlsx’ and specify the dependent variable in y and the independent variables in x. At this time, y can have only one column.

 

It shows basic and important results. It shows ‘R Square’ and ‘Adjusted R Square’ and shows the ANOVA table. What researchers are often interested in is the ‘coefficient’ or ‘p-value’.

When time increases by 1, y increases 0. as much as 000445351. This is what Coefficients mean. The larger the Coefficients, the more influential it can be. However, it may vary depending on the units of the Coefficients, for example, the weight may also vary depending on whether it is in kg or pounds. Be careful with your interpretation, even for your height, whether it's in meters or centimeters.

Care must also be taken in the interpretation of nominal variables such as sex. Compared to a person with a sex of 1, a person with a sex of 2 increases by -3.23. That is a decrease of 3.23. If you misunderstand what a person with sex 1 and a person with sex 2 are, you can interpret them as opposite results. Therefore, it is much easier to interpret if you use the variable name ‘male’ rather than ‘sex’ and code it with 0 and 1.

 

Let’s select additional options to get more information.

 

It shows the residuals and linearity for each variable in graphs, and shows the overall forecasts and residuals.

 

If you look at one of them, ‘Residual Plot of the Meal.cal variable’, you can see that there is no special pattern overall. It allows you to graphically check the residuals linearity, equal variance and normality assumptions.

 

‘Line Fit Plot’ also lets you know if the variable and y values tend to be linear, and gives you an idea of how much the slope is. It also gives you a rough idea of the predictions and their relationships. These two pictures contain a lot of information, so it’s a good idea to take a look.

Plus Alpha

 

Visit https://tinyurl.com/Generalized-LM and upload the ‘example2.csvŒ‘ file. There are many different statistical methods to choose from, but we’ll only look at the basic ‘linear’. It is set to analyze the ‘outcome’ variable as dependent variableŽ and all the rest as the independent variable.

 

It shows the same value as you saw in Excel, with an asterisk next to the p-value.

 

 

 

 

Returning to Excel, the ‘Residual output’ section (left) shows the difference between the predicted value of each point and the corresponding actual value (residual) and the standardized residual.

 

Visit https://tinyurl.com/residual-plots-linear-model and upload ‘example3.csv’. This is the same data as before, just renaming the dependent variable to ‘y’. In this case, the y is automatically recognized as the dependent variable.

 

The Plot tab allows you to observe the overall residuals and their distribution. You can also check the QQ plot to see if it is normally distributed. Currently, we can’t plot the residuals of the variables one by one, but next time we will revise them so that we can see them one by one.

 

Select ‘Forecast y’ and ‘Residuals’ in Excel, and then click on the first option of ‘Insert>Chart>Scatter ‘.

I showed a picture of the residuals (left) and changed the axis (right) to make it more readable. To decorate more, you can take advantage of Excel’s various options.

Plotting the residuals is a very important step in checking the basic assumptions of regression analysis, but it is easy to omit it. It’s simple and necessary to check it out.

 


 

 

=================================================

  • R data visualization book 2
https://tinyurl.com/R-plot-II-2  simple variables
https://tinyurl.com/R-plot-II-3-4   many variables / map
https://tinyurl.com/R-plot-II-5-6   time related / statistics related
https://tinyurl.com/R-plot-II-7-8   others / reactive chart 
 


 

 

 


댓글 없음:

댓글 쓰기