Statistics for everyone: ch.2 t-test

easier R than SPSS with Rcmdr : Contents

ch.2 t-test

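Let’s load one of the sample data sets that come with R. In R Commander, choose Data > Data in packages > Read data set from an attached package.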

Double-click ‘datasets’, then double-click ‘ToothGrowth’, and click ‘OK’. Many other data sets are available in the same list.

Nothing seems to happen. Click the ‘View data set’ button.

Only now does the data appear. This can unsettle beginners: Excel and similar programs display the data as soon as you open it, but R Commander only loads the data set and shows nothing until you explicitly ask to view it.

Next to it, the ‘Edit data set’ button opens the data in a simple editor.

From the ‘Statistics’ menu, choose ‘Means’ and then ‘Independent samples t-test’.

Select two variables: the grouping variable (here supp, the supplement type) and the numeric response variable (here len, the tooth length).

On the ‘Options’ tab, choose whether or not to assume equal variances. Let’s run the test both ways.

Depending on this choice, you get one of two results. If you assume equal variances, the output is headed ‘Two Sample t-test’; if you do not, it is headed ‘Welch Two Sample t-test’. Both are widely used, and the Welch version is often simply called a t-test too. I’ll explain the difference in the next chapter.
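For readers who want to check the arithmetic outside R, here is a minimal sketch in plain Python (standard library only) that recomputes both t statistics for ToothGrowth. The values are typed in from R’s built-in ToothGrowth data (len split by supp), and the function names are my own, not anything R Commander provides. The equal-variance version pools the two sample variances; the Welch version keeps them separate and uses the Welch-Satterthwaite degrees of freedom.

```python
import math
from statistics import mean, variance

# ToothGrowth$len split by supp (typed in from R's datasets package)
OJ = [15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
      19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
      25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0]
VC = [4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0,
      16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
      23.6, 18.5, 33.9, 25.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5]


def student_t(x, y):
    """Two-sample t-test assuming equal variances (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2


def welch_t(x, y):
    """Welch two-sample t-test: separate variances, Welch-Satterthwaite df."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_s, df_s = student_t(OJ, VC)
t_w, df_w = welch_t(OJ, VC)
print(f"Two Sample t-test:       t = {t_s:.4f}, df = {df_s}")
print(f"Welch Two Sample t-test: t = {t_w:.4f}, df = {df_w:.3f}")
```

Because both groups have 30 observations, the two t statistics come out identical (about 1.9153) and only the degrees of freedom differ: 58 for the pooled test and about 55.31 for Welch. The smaller df is why the Welch p-value is slightly larger.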

The Welch two-sample t-test is technically different from the (Student’s) two-sample t-test, but both can loosely be called two-sample t-tests. The two-sample t-test is also known as the independent-samples t-test, Student’s t-test, or simply the t-test. With so many overlapping names it is easy to get confused; in practice, work out which test is meant from the context.

Whichever version you run, note the p-value and the 95% confidence interval for the difference in means.
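To see where the p-value and the confidence interval come from, the sketch below recomputes both in plain Python from the equal-variance output for ToothGrowth (t = 1.9153, df = 58, group means 20.66333 for OJ and 16.96333 for VC). It integrates the t density numerically with Simpson’s rule instead of calling a statistics library; that numerical approach and all function names are my own choices for illustration, not what R does internally. The standard error is recovered from the output as difference / t.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Inverse CDF by bisection on [0, 50]; intended for 0.5 < p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Numbers read off the equal-variance output
t_stat, df = 1.9153, 58
mean_diff = 20.66333 - 16.96333       # mean(OJ) - mean(VC), about 3.7

p_value = 2 * (1 - t_cdf(abs(t_stat), df))   # two-sided p-value
se = mean_diff / t_stat                       # since t = difference / SE
t_crit = t_quantile(0.975, df)                # two-sided 95% critical value
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
print(f"p = {p_value:.4f}, 95% CI = {ci[0]:.3f} to {ci[1]:.3f}")
```

The result agrees with R to the precision shown (p of about 0.0604, interval of about -0.167 to 7.567). The interval contains 0, which is the same conclusion as p > 0.05.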

Many kinds of graph suit data for a t-test. Let’s start with a box plot (Graphs > Boxplot).

The dialog looks a little different, but it takes the same variables: choose the numeric variable, then click ‘Plot by groups...’ to specify the grouping variable.

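The plot appears in the original RGui window rather than in R Commander. A box plot is also called a ‘box-and-whisker plot’. The line across the middle of the box marks the median, and the top and bottom edges of the box mark the upper and lower quartiles. The ends of the whiskers (the lines extending above and below the box) mark the maximum and minimum values; strictly, R extends each whisker only to the most extreme value within 1.5 times the box height and draws any points beyond that individually as outliers. In other words, the five-number summary is shown in a single picture.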

If the data are normally distributed, the box plot looks roughly symmetric from top to bottom.
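The box plot is drawn from Tukey’s five-number summary. The plain-Python sketch below (standard library only, values typed in from ToothGrowth) mimics R’s fivenum() and the default 1.5 × box-height whisker rule of boxplot(); it illustrates the rule and is not R’s own code. It also prints the distance from the median down to the lower hinge and up to the upper hinge, a rough numerical version of the symmetry check.

```python
import math

# ToothGrowth$len split by supp (typed in from R's datasets package)
OJ = [15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
      19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
      25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0]
VC = [4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0,
      16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
      23.6, 18.5, 33.9, 25.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5]


def fivenum(values):
    """Tukey five-number summary (min, lower hinge, median, upper hinge, max)."""
    x = sorted(values)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]  # 1-based, may end in .5
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in positions]


def whisker_ends(values, coef=1.5):
    """Most extreme data points within coef * box height of the hinges."""
    _, lower, _, upper, _ = fivenum(values)
    reach = coef * (upper - lower)
    inside = [v for v in values if lower - reach <= v <= upper + reach]
    return min(inside), max(inside)


for name, data in (("OJ", OJ), ("VC", VC)):
    mn, lower, med, upper, mx = fivenum(data)
    print(name, [round(v, 2) for v in (mn, lower, med, upper, mx)],
          "whiskers:", whisker_ends(data),
          "median-lower:", round(med - lower, 2),
          "upper-median:", round(upper - med, 2))
```

Here no point lies beyond the whisker limits, so the whiskers do reach the minimum and maximum. The OJ box is lopsided (7.5 below the median versus 3.1 above it), while the VC box is more balanced (5.3 versus 6.8).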

(In RGui, several plots may be drawn on top of one another, so an earlier plot can be hidden behind a later one. After drawing a plot, check whether another plot is hidden behind it, and if the text and graphics look out of proportion, try resizing the window.)

Next, choose ‘Plot of means’. As the name suggests, it plots the group means, but ‘plot of means’ is not a widely used standard term.

Select the same two variables.

Under ‘Error Bars’, the default is ‘Standard errors’, which is also the most common choice. The other options are self-explanatory, so try them out yourself.

In the graph, the points are the means of the two groups, and the error bars extend one standard error above and below each mean.
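The standard error behind each error bar is the group standard deviation divided by the square root of the group size. A minimal plain-Python sketch (standard library only, values typed in from ToothGrowth; the helper name is my own) that computes the endpoints of each bar:

```python
import math
from statistics import mean, stdev

# ToothGrowth$len split by supp (typed in from R's datasets package)
OJ = [15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
      19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
      25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0]
VC = [4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0,
      16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
      23.6, 18.5, 33.9, 25.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5]


def mean_and_se(values):
    """Group mean and standard error of the mean, SD / sqrt(n)."""
    return mean(values), stdev(values) / math.sqrt(len(values))


for name, data in (("OJ", OJ), ("VC", VC)):
    m, se = mean_and_se(data)
    print(f"{name}: mean = {m:.2f}, SE = {se:.2f}, bar from {m - se:.2f} to {m + se:.2f}")
```

With 30 observations per group, the SE is about 1.21 for OJ and 1.51 for VC, much shorter than the SDs of 6.6 and 8.3, which is why SE bars look narrow compared with the spread seen in the box plot.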

If you uncheck ‘Connect profiles of means’ at the bottom of the dialog, the line joining the two means disappears. This suits an independent-samples t-test better, because the two groups are separate; a connecting line is more appropriate for a paired t-test.

For ‘Strip chart’, select the same variables. In the options, try ‘Stack’ and ‘Jitter’ in turn.

The left panel uses ‘jitter’ and the right panel uses ‘stack’. Jitter shifts each point by a small random amount, while stack piles identical values side by side. When several points coincide exactly, they hide one another and information is lost, so jitter is sometimes the better choice. Try both and pick whichever shows your data more clearly. (All of the graphs shown so far are suitable for presenting a t-test; I’ll show some better ones later.)
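The overlap problem is easy to see in the numbers. The plain-Python sketch below (standard library only) uses the OJ group typed in from ToothGrowth and two hypothetical helpers: one that offsets each repeat of a value by a fixed step, as ‘stack’ does, and one that adds a small random offset to every point, as ‘jitter’ does. The step and jitter sizes are arbitrary illustrative values, not the ones R uses.

```python
import random
from collections import Counter

# OJ group of ToothGrowth$len (typed in from R's datasets package)
OJ = [15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
      19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
      25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0]


def stack_offsets(values, step=0.1):
    """'stack' idea: the k-th repeat of a value is shifted k * step sideways."""
    seen = Counter()
    offsets = []
    for v in values:
        offsets.append(seen[v] * step)
        seen[v] += 1
    return offsets


def jitter_offsets(values, amount=0.1, seed=1):
    """'jitter' idea: every point gets a small random sideways shift."""
    rng = random.Random(seed)
    return [rng.uniform(-amount, amount) for _ in values]


ties = {v: c for v, c in Counter(OJ).items() if c > 1}
print("values occurring more than once:", ties)
print("visible positions, plain:", len(set(OJ)), "of", len(OJ))
print("visible positions, stack:", len(set(zip(OJ, stack_offsets(OJ)))))
print("visible positions, jitter:", len(set(zip(OJ, jitter_offsets(OJ)))))
```

Without offsets only 25 of the 30 OJ points are visible, because 9.7, 14.5 and 27.3 each occur twice and 26.4 three times. Either method makes all 30 visible; stack keeps the values exact, while jitter avoids regular columns when values are merely close rather than identical.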

Finally, let’s summarize the data (Statistics > Summaries > Numerical summaries).

Let’s select both variables in the same way as before.

Of the summary statistics, the mean and standard deviation are the most important.
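The same summary can be reproduced in plain Python (standard library only) as a cross-check. This sketch assumes the quartiles follow R’s default quantile() rule (type 7), which Python’s statistics.quantiles reproduces with method="inclusive"; the values are typed in from ToothGrowth and the helper name is my own.

```python
from statistics import mean, quantiles, stdev

# ToothGrowth$len split by supp (typed in from R's datasets package)
OJ = [15.2, 21.5, 17.6, 9.7, 14.5, 10.0, 8.2, 9.4, 16.5, 9.7,
      19.7, 23.3, 23.6, 26.4, 20.0, 25.2, 25.8, 21.2, 14.5, 27.3,
      25.5, 26.4, 22.4, 24.5, 24.8, 30.9, 26.4, 27.3, 29.4, 23.0]
VC = [4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0,
      16.5, 16.5, 15.2, 17.3, 22.5, 17.3, 13.6, 14.5, 18.8, 15.5,
      23.6, 18.5, 33.9, 25.5, 26.4, 32.5, 26.7, 21.5, 23.3, 29.5]


def num_summary(values):
    """Mean, SD, quartiles (R type 7 == Python 'inclusive') and n."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"mean": mean(values), "sd": stdev(values),
            "0%": min(values), "25%": q1, "50%": q2, "75%": q3,
            "100%": max(values), "n": len(values)}


for name, data in (("OJ", OJ), ("VC", VC)):
    s = num_summary(data)
    print(f"{name}: mean = {s['mean']:.1f}, SD = {s['sd']:.1f}, "
          f"quartiles = {s['25%']:.3f} / {s['50%']:.3f} / {s['75%']:.3f}, n = {s['n']}")
```

Rounded to one decimal, this gives mean 20.7 and SD 6.6 for OJ and mean 17.0 and SD 8.3 for VC, the figures used in the write-up.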

In a thesis or paper, you could report the result in the text like this:

Tooth length was higher with OJ (mean = 20.7, SD = 6.6) than with VC (mean = 17.0, SD = 8.3), but the difference was not statistically significant in a two-sample t-test (t = 1.9153, df = 58, p = 0.06039, 95% CI for the difference: -0.1670064 to 7.5670064).
