Ling236 Homework 6

Due: Wed, Feb 27, 2002

Help Joan and Chris out with their research project! In this assignment, you will do hands-on analysis of some data in order to produce Stochastic OT models of it. To do this, you'll need some software for working with Stochastic OT models. There are two programs available. One is a Windows program by Bruce Hayes, OTSoft; I have no experience with it. The other is Paul Boersma's Praat, which is available for Mac/Windows/Linux/Solaris, and a few other things. It's what I've used, and what I'll assume in these instructions. You can download it from the Praat webpage. Installing it is straightforward (Windows: you just get a program you can run; it doesn't appear in the Start menu). For general information on how to use Praat for OT grammars, look at the OT learning tutorial, available in the program's help or online. Indeed, in a perfect world, you might do that tutorial first before continuing with these instructions. In an imperfect world, you can probably continue along here without further ado.

First of all, download the Praat software and install it. You'll get an executable Praat program. Fire it up. It should open a Praat objects window and a Praat picture window. We'll only use the latter a little, for text pictures (we're not doing spectrograms here). Next we'll need a grammar. Let's start with the grammar described in Bresnan, Dingare, and Manning (2001); henceforth BDM. Download the grammar file, save it to disk, and then choose Read from File from the Praat objects menu. A bunch of buttons should then appear to the right. Choosing Edit will let you view most of the grammar contents. However, it might also be useful just to examine the file in a plain text editor (or Word); Praat files are plain text files. After some lines describing the type of object, you should find the definition of 8 constraints. They've all been given an initial ranking value of 100. After that, visible only in the text file, are 4 constraint subhierarchies, corresponding to the example ones shown in (4) and (5) in the paper. These say that those constraints have to maintain the given ranking order during grammar learning. After that are 4 tableaux with different inputs, a choice of active and passive output for each, and then information on which constraints each form violates. You should easily be able to verify the correctness of the 1s and 0s by looking at the formatted tableaux in the Praat Edit window.
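To make the evaluation mechanism concrete, here is a small Python sketch (not Praat's actual code) of how a Stochastic OT grammar picks a winner: each constraint's disharmony is its ranking value plus normally distributed noise, and candidates are then compared lexicographically down the resulting constraint order. The ranking values and violation vectors below are made up for illustration.

```python
import random

def evaluate(ranking_values, candidates, noise=2.0):
    """One stochastic evaluation: add Gaussian noise to each
    ranking value, then return the OT-optimal candidate.
    `candidates` maps a candidate name to its violation vector
    (one count per constraint, same order as ranking_values)."""
    # disharmony = ranking value + normally distributed noise
    disharmony = [r + random.gauss(0.0, noise) for r in ranking_values]
    # constraints in strict domination order for this evaluation
    order = sorted(range(len(ranking_values)),
                   key=lambda i: disharmony[i], reverse=True)
    # lexicographic comparison of violation profiles: fewer
    # violations of the highest-ranked distinguishing constraint wins
    return min(candidates,
               key=lambda c: [candidates[c][i] for i in order])

# With zero noise the evaluation is deterministic: the candidate
# violating only the lower-ranked constraint wins.
print(evaluate([110.0, 100.0],
               {"active": [0, 1], "passive": [1, 0]},
               noise=0.0))
```

With noise at its usual value of 2, two constraints whose ranking values are close will sometimes swap order between evaluations, which is what gives the grammar its variable outputs.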

We'll initially try relearning the grammar on the data as described in BDM. Here's the data file; load it too using the Read from File menu item. You can view a pair distribution from inside Praat by using the general object Inspect button, but it's tedious. It's easier to look at it in a text editor. The file shows counts of each of the outputs (active/passive) for each input.

The tutorial teaches you a slow way to learn OT grammars by sampling from distributions. Here's the quick way. Select both objects (either click and drag to select both, or click one, then hold down Shift or Ctrl and click the other). You'll see a short button menu with just two items. Click Learn. Doing this will learn a Stochastic OT model for the data in one step. Noise we can leave as 2. "Symmetric all" is the standard Gradual Learning Algorithm version. You can use the defaults for everything else as well. Click OK. Wait. (Depending on the speed of your computer, you may have to wait a bit. It's doing hundreds of thousands of simulations of ranking data.) Click on the grammar again. Click Edit. You will now see the learned grammar. The ranking values show you the learned average values for the constraints. The disharmony column shows you the effective ranking of each constraint on the last evaluation. I can't predict those, but you should notice that some differ a fair bit from the ranking value, reflecting the normally distributed noise. Hopefully, the ranking values should look roughly like Figure 3 in our paper, but they won't be exactly the same. There's randomness in the learning.
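For the curious, a "symmetric all" update can be sketched in a few lines of Python (a simplification of Boersma and Hayes's Gradual Learning Algorithm, not Praat's implementation): whenever the learner's own sampled output differs from the datum, every constraint that prefers the wrong output is promoted and every constraint that prefers it over the datum is demoted, each by the same small plasticity value. The numbers below are illustrative only.

```python
def gla_update(ranking, viol_correct, viol_learner, plasticity=0.1):
    """One 'symmetric all' error-driven update (sketch).
    `viol_correct` and `viol_learner` are the violation vectors of
    the datum and of the learner's (wrong) output. A constraint
    violated more by the datum prefers the learner's form, so it
    is demoted; one violated more by the learner's form prefers
    the datum, so it is promoted."""
    new = list(ranking)
    for i, (c, l) in enumerate(zip(viol_correct, viol_learner)):
        if c > l:            # prefers the learner's wrong form
            new[i] -= plasticity
        elif c < l:          # prefers the correct datum
            new[i] += plasticity
    return new

# Constraint 0 penalizes the datum, constraint 1 the learner's form:
# constraint 0 is demoted, constraint 1 promoted.
print(gla_update([100.0, 100.0], [1, 0], [0, 1]))
```

Iterating updates like this over many sampled data points is what moves the ranking values toward a grammar that reproduces the observed output frequencies.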

An annoying thing about Praat: the constraints are ordered by their effective ranking on the last evaluation rather than by their ranking value. In my experience, this is hardly ever what you want. With the Edit grammar window open, you can fix this by choosing the Evaluate (zero noise) option on the Edit menu. You will now have things sorted by the ranking values, since the effective values of the last evaluation will be the same as the ranking values, as no noise was used. Also try selecting Evaluate (noise 2.0) a few times. Unless you are very lucky, actives will win for all four inputs every time (since actives are so much more common than passives for this data), but you should see the order of the constraints move around a little, reflecting the different random evaluation points of the constraints.

Question 1: Report the ranking values of the constraints you found for this data.

We can examine what predictions are made by the grammar by producing optimal outputs for the inputs, where we add the noise randomly at each evaluation. (Statisticians colorfully refer to this as "Monte Carlo estimation".) Click on the initGrammar, and choose To output Distributions. Change the Trials per Input to 10000, and then click OK. A new output distribution object will be created and selected. You can see it by choosing Draw as Numbers, and clicking OK. Results will appear in the Praat picture window. The display might be imperfect (columns overlapping slightly). It was for me.
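In Python terms, what Praat's To output Distributions command does can be sketched as the following Monte Carlo loop (a simplification, with made-up ranking values and a single input's two candidates): run many noisy evaluations and tally how often each candidate wins.

```python
import random
from collections import Counter

def output_distribution(ranking, candidates, trials=10000, noise=2.0):
    """Monte Carlo estimate of a stochastic grammar's output
    distribution for one input: repeat the noisy evaluation
    `trials` times and count the winners."""
    counts = Counter()
    n = len(ranking)
    for _ in range(trials):
        # sample one evaluation: noisy disharmonies, domination order
        dis = [r + random.gauss(0.0, noise) for r in ranking]
        order = sorted(range(n), key=lambda i: dis[i], reverse=True)
        winner = min(candidates,
                     key=lambda c: [candidates[c][i] for i in order])
        counts[winner] += 1
    return counts

# With a 20-point ranking gap and noise 2, the lower-ranked
# constraint almost never outranks the higher one, so "active"
# should win on virtually every trial.
print(output_distribution([120.0, 100.0],
                          {"active": [0, 1], "passive": [1, 0]},
                          trials=2000))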

Question 2: How well does the learned grammar produce a distribution like the data? For each individual input, we can test this with a 2x2 table (observed vs. simulated counts of active vs. passive). Do this for each of the 4 inputs. Do the results seem reasonable? Are they sufficiently close that the null hypothesis cannot be rejected at, say, p = 0.01? (Aside: Here, there doesn't seem to be a good way to do things all in one table, since the real data is multinomial sampling over all cells, but the generated data is binomial sampling of 10000 observations for each input. But it seems kosher to evaluate both of them as if binomial.)
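If you'd rather compute the 2x2 test by hand than feed it to a stats package, here is a stdlib-only Python sketch of Pearson's chi-square for a 2x2 table, using the closed-form statistic and the 1-degree-of-freedom survival function erfc(sqrt(x/2)). The counts in the usage line are invented, not the Switchboard numbers.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table
        [[a, b],    e.g. [[observed active, observed passive],
         [c, d]]          [simulated active, simulated passive]]
    Returns (statistic, p) for 1 degree of freedom."""
    n = a + b + c + d
    stat = (n * (a * d - b * c) ** 2
            / ((a + b) * (c + d) * (a + c) * (b + d)))
    p = erfc(sqrt(stat / 2))   # chi-square(1 df) survival function
    return stat, p

# Identical row proportions give statistic 0 and p = 1;
# skewed rows give a large statistic and a small p.
print(chi2_2x2(10, 10, 10, 10))
print(chi2_2x2(20, 10, 10, 20))
```

Remember that with 10000 simulated trials per input, even tiny mismatches in proportion can reach significance, so think about effect size as well as the p-value.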

Question 3: Try learning the grammar on the same data a couple more times. Does the grammar change much? Has the learning algorithm converged on a solution?

Question 4: Are all the constraints doing necessary work here? Looking at the learned grammar, one might be suspicious that the Obj constraints (*Obj/1,2 and *Obj/3) aren't doing much that is useful. By editing initGram.OTGrammar, make a variant grammar that deletes them: change the file to say "6 constraints", delete the two Obj constraints, renumber the other constraints, make it "3 fixed rankings" and delete the one referring to constraints 7 and 8, and delete the 3rd and 4th constraints' violation entries (0 or 1) from all outputs for all inputs. Save the new grammar as sixGram.OTGrammar. Load this grammar. Train it on the same Switchboard data. Examine the grammar. Examine its outputs (before Draw-ing a new set of output numbers, you will find it useful to do Edit | Erase All in the Praat picture window). Does the new grammar predict the data as well as the old grammar?

Question 5: The universal claims of Optimality Theory revolve around a typological space of languages where constraints are generally assumed to come in pairs such as *S/3 and *S/1,2. In some cases these can be ordered differently by different grammars, whereas in other cases putatively universal prominence scales will give fixed orderings across languages. However, from the point of view of simply modeling the data, do we need all those constraints? Let us try keeping just the *Obl/1,2, *Su/Pt, and *Su/3 constraints. State why each of these constraints seems vital in terms of modeling the phenomena at hand. (I.e., what does each capture about the data?)

Question 6: Edit the grammar file again: keep only these three constraints, change it to say "0 fixed rankings", delete all the fixed rankings below that line, and edit the tableaux to delete the columns relating to the deleted constraints. It's useful to look at the grammar with Edit and to check the tableaux to make sure that all the constraint violations are indeed still in the right places. Train this grammar on the Switchboard data. Make sure you try training it multiple times. Does this grammar converge? Does it predict the observed data well? Okay, I'll give the game away. This grammar has problems. Look carefully at the grammar in the Edit window. Why does it have problems? Why can't this grammar model the data?

Question 7: This is an understanding-increasing bonus question. You don't have to do this. Is there a middle ground? Are 6 constraints the minimum number needed to get this grammar to work? Can one make do with fewer constraints? Did I just choose the wrong 3? Or can you do it with 4 or 5?


Christopher Manning

Last modified: Tue Feb 19 23:17:12 PST 2002