Saturday, November 12, 2022

ch.11 Correlations

 easier R than SPSS with Rcmdr : Contents

 



Now let’s load the ‘iris’ data.

 

The data consist of four numeric columns and one string (factor) column.
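Outside the Rcmdr menus, the same step in plain R is just loading and inspecting the built-in dataset (a minimal sketch; Rcmdr generates similar code for you):

```r
# Load the built-in iris data and inspect its structure
data(iris)
str(iris)  # 150 obs. of 5 variables:
# four numeric columns (sepal/petal length and width)
# and one factor column (Species)
```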

Select ‘Correlation test’.

 

Select two continuous variables. The t-test and ANOVA that we learned earlier deal with ‘nominal variable + continuous variable’ combinations, and the chi-square test deals with ‘nominal variable + nominal variable’. The correlation analysis we are learning now deals with ‘continuous variable + continuous variable’. Isn’t that easy to understand?

Choose one of the three methods in the red square. The most commonly used, ‘Pearson’s product-moment’, is simply called ‘Pearson’s correlation’. Pearson’s correlation analysis, the representative form of correlation analysis, assumes that the two continuous variables follow a normal distribution.

 

The p-value is important, but so is the correlation coefficient. In the results above, the correlation coefficient is very large, at 0.96. The 95% confidence interval of the correlation coefficient is also calculated and displayed.
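The r = 0.96 in the results matches the Petal.Length versus Petal.Width pair, so assuming those were the two variables selected in the dialog, the underlying call is `cor.test()`:

```r
# Pearson correlation test between two continuous variables
# (assuming Petal.Length and Petal.Width were the ones selected)
ct <- cor.test(iris$Petal.Length, iris$Petal.Width)
ct$estimate  # correlation coefficient, about 0.963
ct$conf.int  # 95% confidence interval
ct$p.value   # p-value
```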

 

 

The other two test methods, Spearman’s rank correlation and Kendall’s rank correlation, can be used even when the data are not normally distributed; as their names suggest, they use ranks. They calculate the values rho and tau, respectively.
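Both rank-based methods go through the same `cor.test()` function, just with a different `method` argument (still assuming the Petal.Length / Petal.Width pair; `exact = FALSE` avoids a warning about ties):

```r
# Rank-based alternatives; neither assumes normality
sp <- cor.test(iris$Petal.Length, iris$Petal.Width,
               method = "spearman", exact = FALSE)
kd <- cor.test(iris$Petal.Length, iris$Petal.Width,
               method = "kendall", exact = FALSE)
names(sp$estimate)  # "rho"
names(kd$estimate)  # "tau"
```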

 

Let’s choose the similarly named ‘Correlation matrix’.

 


The ‘Correlation matrix’ allows you to select multiple variables at the same time. Every pair of variables is correlated, and the coefficients are displayed in matrix form.
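In base R the same matrix can be produced with `cor()` (the Rcmdr menu generates a similar call through a helper function):

```r
# Pairwise Pearson correlations of the four numeric columns
m <- cor(iris[, 1:4])
round(m, 3)  # 4 x 4 symmetric matrix with 1s on the diagonal
```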

‘Complete observations’ deletes every row that has even one missing value, while ‘Pairwise complete observations’ uses, for each pair of variables, all rows in which neither variable of that pair is missing, even if other variables are.
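The difference is easiest to see with a small hypothetical data frame containing missing values (toy data, purely for illustration):

```r
# Toy data with NAs to contrast the two missing-data options
d <- data.frame(x = c(1, 2, 3, NA),
                y = c(1, 3, 2, 4),
                z = c(2, 1, 4, 3))
cc <- cor(d, use = "complete.obs")           # drops row 4 entirely
cp <- cor(d, use = "pairwise.complete.obs")  # y-z pair keeps row 4
cc["y", "z"]  # computed from rows 1-3 only
cp["y", "z"]  # computed from all four rows, so it differs
```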

The plot that goes along with correlation analysis is the scatterplot.

 

Naturally, we select two continuous variables.

 


When several points fall at the same location, they look like a single point and need to be scattered a bit; for that you can use ‘jitter’. Drawing a straight regression line is the ‘Least-squares line’, and the curve is the ‘Smooth line’. ‘Spread’ draws a band around the regression line that shows the variability, like an error bar.
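In plain R these options correspond roughly to `jitter()`, a least-squares line from `lm()`, and a smoother such as `lowess()` (a sketch with assumed variables; Rcmdr itself draws the plot through the car package):

```r
# Scatterplot with jittered points, least-squares line, and smooth line
pdf(NULL)  # draw to a null device so the example runs anywhere
plot(jitter(iris$Petal.Length), jitter(iris$Petal.Width),
     xlab = "Petal.Length", ylab = "Petal.Width")
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
abline(fit)                                    # least-squares line
lines(lowess(iris$Petal.Length, iris$Petal.Width),
      lty = 2)                                 # smooth line
invisible(dev.off())
```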

Occasionally, when the points are clustered toward low values (which is often the case), try changing the axes to a log scale.
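A quick sketch of the effect with simulated skewed data (the variables here are made up for illustration):

```r
# Log-scaled axes spread out points clustered at low values
pdf(NULL)
x <- rexp(100)                      # skewed toward small values
y <- x * exp(rnorm(100, sd = 0.2))  # related, also positive
plot(x, y, log = "xy")              # both axes on a log scale
invisible(dev.off())
```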

 

There is also an option to identify points.

  

 

It can automatically label the points corresponding to outliers, or you can label points yourself by selecting them with the mouse. Choose as many points as you like, then right-click to stop. The number that appears is the row name.
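Interactively this works through `identify()`; a non-interactive sketch of the automatic version, labeling the points farthest from the regression line with their row names (variable choice assumed):

```r
# Label the two points with the largest residuals by row name
pdf(NULL)
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
worst <- order(abs(resid(fit)), decreasing = TRUE)[1:2]
plot(iris$Petal.Length, iris$Petal.Width)
text(iris$Petal.Length[worst], iris$Petal.Width[worst],
     labels = rownames(iris)[worst], pos = 3)  # row names as labels
invisible(dev.off())
worst  # row indices of the labeled points
```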

 

Now let’s specify a group.

 

Depending on the grouping variable, the points are drawn in different colors and shapes.
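A base-R sketch of the same idea, assuming Species is the grouping variable:

```r
# Color and plotting symbol vary by group (Species)
pdf(NULL)
grp <- iris$Species
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(grp),   # one color per species
     pch = as.integer(grp))   # one symbol per species
legend("topleft", legend = levels(grp),
       col = 1:nlevels(grp), pch = 1:nlevels(grp))
invisible(dev.off())
```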

 


‘Scatterplot matrix’ allows you to select multiple variables; it is the plot that corresponds to the ‘Correlation matrix’.
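In base R, `pairs()` draws the equivalent plot (Rcmdr generates a fancier version with extra panels):

```r
# Scatterplot matrix of the four numeric variables
pdf(NULL)
pairs(iris[, 1:4])  # one small scatterplot per variable pair
invisible(dev.off())
```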

 


There are a lot of options, and I recommend you try and check them out for yourself.

 

 

You can specify groups so that the colors vary by group.
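With `pairs()` the grouping is just a color vector (again assuming Species as the group):

```r
# Scatterplot matrix colored by group
pdf(NULL)
cols <- as.integer(iris$Species)  # one color code per row
pairs(iris[, 1:4], col = cols)
invisible(dev.off())
```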

It is a great plot for examining the correlations among several continuous variables in the early stages of a study, or for reporting the relationships among selected variables at the end of a study.


 



=================================================

  • R data visualization book 2
https://tinyurl.com/R-plot-II-2  simple variables
https://tinyurl.com/R-plot-II-3-4   many variables / map
https://tinyurl.com/R-plot-II-5-6   time related / statistics related
https://tinyurl.com/R-plot-II-7-8   others / reactive chart 
 

 
