Homework 4.
0. Look at the Cedergren Panamanian Spanish Varbrul instructions
handout to understand the problem and the coding of the data that this
homework deals with. You aren't expected to do the handout's own
homework as written (i.e., to actually download and use Varbrul). Do,
however, download the data so you can load it into R.
1. Build two logistic regression models from the Cedergren data, one
using only POS as an independent variable and the other using only
Following Environment. Which one has higher data likelihood? Which one
has higher classification accuracy? Is this surprising? Explain why you
see the pattern you do.
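For reference, the two quantities compared in question 1 can be computed directly from aggregated counts. The sketch below is in Python rather than R, and the level names and counts are made up (they are not the Cedergren data). It relies on the fact that a logistic regression with a single categorical predictor fits each level's observed deletion rate, and it assumes a 0.5 cutoff for classification. In R, the same quantities would come from a model fitted with `glm` and `family = binomial`.

```python
import math

# Hypothetical aggregated counts: level -> (deleted, retained).
# These numbers are illustrative only, NOT the Cedergren data.
toy_counts = {
    "level_a": (30, 70),
    "level_b": (60, 40),
}

def fitted_rates(counts):
    """Deletion probability per level for a one-factor model.

    With a single categorical predictor, the maximum-likelihood fit
    reproduces each level's observed deletion rate.
    """
    return {lvl: d / (d + r) for lvl, (d, r) in counts.items()}

def log_likelihood(counts, rates):
    """Sum of log P(outcome) over all tokens (binomial coefficient omitted)."""
    ll = 0.0
    for lvl, (d, r) in counts.items():
        p = rates[lvl]
        if d:
            ll += d * math.log(p)
        if r:
            ll += r * math.log(1 - p)
    return ll

def accuracy(counts, rates, cutoff=0.5):
    """Fraction of tokens classified correctly, predicting deletion when p >= cutoff.

    The 0.5 cutoff is an assumption of this sketch.
    """
    correct = 0
    total = 0
    for lvl, (d, r) in counts.items():
        correct += d if rates[lvl] >= cutoff else r
        total += d + r
    return correct / total

rates = fitted_rates(toy_counts)
ll = log_likelihood(toy_counts, rates)
acc = accuracy(toy_counts, rates)
```

Running the same functions on counts aggregated by POS and by Following Environment gives the two numbers to compare for each model.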
2. Build a logistic regression model from the Cedergren data,
using POS, Environment, and Class as the factors. Plot predicted
versus actual deletion probability for data aggregated by the
independent variables, and examine the outliers. Describe what you
see. Compare the outliers to other datapoints. Do the outliers have
anything in common? Is there anything about these cases that explains
why these are outliers?
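One possible way to pick out outliers in question 2, beyond eyeballing the plot, is to compute a Pearson residual for each aggregated cell. The sketch below is in Python rather than R. The cell labels, counts, and predicted probabilities are all hypothetical stand-ins for a fitted model's output, and the cutoff of 2 is an assumed rule of thumb, not part of the assignment.

```python
import math

# Hypothetical aggregated cells:
# (POS, environment, class, deleted, total, predicted_p)
# predicted_p stands in for a fitted model's probability; all values are made up.
toy_cells = [
    ("pos1", "env1", "class1", 12, 40, 0.30),
    ("pos1", "env2", "class1", 45, 50, 0.60),
    ("pos2", "env1", "class2", 3, 10, 0.35),
]

def pearson_residual(deleted, total, p):
    """(observed - expected) / binomial standard deviation for one cell.

    Assumes 0 < p < 1, which holds for fitted logistic probabilities.
    """
    expected = total * p
    sd = math.sqrt(total * p * (1 - p))
    return (deleted - expected) / sd

def flag_outliers(cells, threshold=2.0):
    """Return cells whose |Pearson residual| exceeds the (assumed) threshold."""
    flagged = []
    for pos, env, cls, deleted, total, p in cells:
        r = pearson_residual(deleted, total, p)
        if abs(r) > threshold:
            # Keep observed rate, predicted rate, and residual for inspection.
            flagged.append((pos, env, cls, deleted / total, p, r))
    return flagged

outliers = flag_outliers(toy_cells)
```

Scaling by the binomial standard deviation keeps small cells, where a large gap between observed and predicted rates can arise by chance, from dominating the list.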
3. Do the beginning part of the Varbrul Cedergren homework, stopping in
the middle of question 4. For these purposes, building a basic logistic
regression model counts as a "one level" analysis. We will examine
interaction effects and combining levels later.