To get the corpus and replicate the results in the paper, run the following commands (note that this will take a very long time):

mkdir data
pipeline.sh 2BDUD2G1PqI female data/2BDUD2G1PqI 0 540
pipeline.sh 5FBrYC7bJjE female data/5FBrYC7bJjE 367 575
pipeline.sh 6DmJxgAxflE female data/6DmJxgAxflE 0 600
pipeline.sh 7BKlmHapfv0 female data/7BKlmHapfv0 0 220
pipeline.sh 9BnNorvNQP4 female data/9BnNorvNQP4 90 390
pipeline.sh CE67ZsjI99Y female data/CE67ZsjI99Y 110 650
pipeline.sh m_NaCgo2u2k female data/m_NaCgo2u2k 0 330
pipeline.sh pylrtjJL66g female data/pylrtjJL66g 345 680
pipeline.sh qy3DgjOFpqg female data/qy3DgjOFpqg 140 325
pipeline.sh rhCkfSCjEPk female data/rhCkfSCjEPk 0 600
pipeline.sh uJc2ckk4Jnw female data/uJc2ckk4Jnw 10 310
pipeline.sh y5g33lWsxys female data/y5g33lWsxys 0 600
pipeline.sh Yrw1hBy_fUs female data/Yrw1hBy_fUs 20 200

Then use the concatenate_dataframes.py script to combine the resulting data.out files. The combined data can be read into R as a data frame for whatever statistical analyses you'd like (we used the lme4 and lmerTest packages).