StatNotes for week 2

I'd meant to demonstrate the online tools in class, but didn't bring a long enough Ethernet cable. The following notes are especially for people doing the homework, but may be of interest to others.

Here's the spreadsheet that I demonstrated in class: FishersExact-BankSupervisors.xls. You don't need to be able to build your own spreadsheet to calculate chi-square or Fisher's exact test values; you can do it all with the software below. On the other hand, I think it is actually a fairly useful exercise, and you will understand a lot better what is going on if you try it sometime. You can also just use this spreadsheet and plug in different numbers to do another chi-square test. (Unfortunately, the Fisher's exact test calculation isn't as automated.)
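If you'd rather see the spreadsheet arithmetic spelled out, here is a minimal Python sketch of the Pearson chi-square calculation for a 2x2 table: the expected count for each cell is row total times column total divided by the grand total, and the statistic is the sum of (observed - expected)^2 / expected over the four cells. The function names and the counts are hypothetical, made up for illustration; they are not the bank supervisors example from class. The sketch assumes no row or column total is zero.

```python
def expected_counts(table):
    """Expected counts under no association: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_contributions(table):
    """(O - E)^2 / E for each cell, like one spreadsheet cell per table cell."""
    expected = expected_counts(table)
    return [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]


def pearson_chi_square(table):
    """Pearson chi-square statistic: the sum of all the cell contributions."""
    return sum(sum(row) for row in cell_contributions(table))


# Hypothetical counts, for illustration only.
table = [[10, 20],
         [30, 40]]
print(expected_counts(table))
print(cell_contributions(table))
print(pearson_chi_square(table))
```

Changing the numbers in `table` plays the same role as plugging new numbers into the spreadsheet.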

A site where you can do chi-square tests and Fisher's exact tests (indeed, everything needed for this homework) is:

http://home.clara.net/sisa/

To do the chi-square test, use "Two by Two Table". All the commands give lots of output, and the first skill is knowing where to look and how to interpret the numbers. What we covered, and what you should use, is the Pearson chi-square test. It is the most commonly used one; we will cover the likelihood ratio chi-square test later (people doing Varbrul analyses standardly use it). You can ignore the rest of the output.
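To help tell the two chi-square numbers apart in the output, here is a rough Python sketch that computes both the Pearson statistic and the likelihood ratio statistic (G^2 = 2 * sum of O * ln(O/E)) for the same 2x2 table, each with its p-value from the upper tail of the chi-square distribution with 1 degree of freedom. It uses the identity that for 1 degree of freedom P(chi-square >= x) = erfc(sqrt(x/2)). The function names and counts are hypothetical, the Pearson value here has no continuity correction, and I'm not claiming this matches how the site itself computes things; the sketch also assumes no empty row or column.

```python
import math


def pearson_x2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction): N(ad - bc)^2 / (r1 r2 c1 c2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def likelihood_ratio_g2_2x2(a, b, c, d):
    """Likelihood ratio chi-square G^2 = 2 * sum O * ln(O / E); empty cells add 0."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                e = rows[i] * cols[j] / n  # expected count for this cell
                g2 += o * math.log(o / e)
    return 2 * g2


def upper_tail_p_df1(x):
    """P(chi-square with 1 df >= x), which equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical counts, for illustration only.
a, b, c, d = 10, 20, 30, 40
x2 = pearson_x2_2x2(a, b, c, d)
g2 = likelihood_ratio_g2_2x2(a, b, c, d)
print(f"Pearson X^2 = {x2:.4f}, p = {upper_tail_p_df1(x2):.4f}")
print(f"Likelihood ratio G^2 = {g2:.4f}, p = {upper_tail_p_df1(g2):.4f}")
```

The two statistics are usually close for reasonably large counts. As a reference point, with 1 degree of freedom a statistic of about 3.84 corresponds to p = 0.05.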

For Fisher's Exact Test on a 2x2 table, choose "Fisher Exact".

Note that the number shown in the Fisher's Exact Test results for "The p-value for the same or a stronger association" is, as the wording implies, a one-tailed p-value (as you should be able to confirm by putting in the numbers from the example we used in class). Further down you will see a two-sided value ("the sum of small p's"). This may be exactly double the one-tailed value, but depending on the symmetry of the problem it may not be; in general, just doubling isn't right. The (Pearson) chi-square test is always evaluated against the upper tail of the chi-square distribution (is the statistic this value or larger?), and in this sense it is a one-tailed test. But in terms of the original data it is working out an approximation to the two-tailed probability: how likely is it that the data deviate this much or more from homogeneity/no association?
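To see why doubling isn't generally right, here is a minimal Python sketch of Fisher's exact test under the usual textbook definitions (assumed here; the site's exact conventions may differ). With the row and column totals held fixed, each possible table has a hypergeometric probability. The one-sided value adds up the tables at least as extreme as the observed one in the same direction, and the two-sided "sum of small p's" adds up every table whose probability is no larger than that of the observed table. The function names and example counts are hypothetical.

```python
from math import comb


def table_probability(a, row1, col1, n):
    """Hypergeometric probability of the 2x2 table whose top-left cell is a, margins fixed."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact(table):
    """Return (one-sided p, two-sided 'sum of small p's') for a 2x2 table."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    # Possible values of the top-left cell given the fixed margins.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = {x: table_probability(x, row1, col1, n) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # One-sided: same direction as the observed deviation from the expected count.
    if a >= row1 * col1 / n:
        one_sided = sum(p for x, p in probs.items() if x >= a)
    else:
        one_sided = sum(p for x, p in probs.items() if x <= a)
    # Two-sided: every table no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return one_sided, two_sided


# Hypothetical tables, for illustration only.
print(fisher_exact([[3, 1], [1, 3]]))  # symmetric margins
print(fisher_exact([[4, 1], [2, 8]]))  # asymmetric margins
```

For the first table the margins are symmetric, so the two-sided value is exactly double the one-sided one (34/70 vs. 17/70). For the second it is 267/3003, which is less than double the one-sided 141/3003.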

Christopher Manning

Last modified: Thu Jan 17 22:44:33 PST 2002