Gender Classification of Literary Works

By Zoe Abrams, Mark Chavira, Dik Kin Wong

CS224 Final Report

Abstract: Our project uses machine learning techniques to classify literary works according to the genders of their authors. The NLP techniques employ four methods of feature selection and three variants of Naïve Bayes. Although not our primary focus, we also applied the same techniques to classify the works according to the nationalities of their authors, either American or English.

1 Introduction and Related Work

The question of whether or not one can determine an author’s gender from his or her writing is a longstanding controversy. Virginia Woolf, one of the authors from our corpus, states that truly great writing is androgynous:

"The very first sentence that I would write here, I said, crossing over to the writing-table and taking up the page headed Women and Fiction, is that it is fatal for any one who writes to think of their sex. It is fatal to be a man or woman pure and simple; one must be woman-manly or man-womanly… Some collaboration has to take place in the mind between the woman and the man before the act of creation can be accomplished. Some marriage of opposites has to be consummated. The whole of the mind must lie wide open if we are to get the sense that the writer is communication his experience with perfect fullness.”

According to Woolf, great writers are able to transcend the boundaries of their gender and write works which are not characteristically male or female, but which unmistakably capture the human experience. Others think there are inherent gender differences which necessarily influence any writer’s work.

Most likely, there are some authors who write characteristically for their gender and others whose writing is genderless. But is there concrete evidence in either direction? And although there are some works that are steeped in a “gendered” perspective, are there less blatant distinctions that can be made? Consider the female writers in our corpus who published under male aliases, such as George Eliot, because women’s works were not considered for publication at the time they were writing. Is the femininity in their choice of words so subtle that it cannot be detected?

This is a heavily explored topic. In the Socrates library search engine at Stanford University there is an “Authorship Sex Differences” subject heading with over 100 entries. All of these entries are expository. Not a single entry is scientific or experimental. Scientific inquiry into this topic may provide evidence that the presence of personal identity in writing is inescapable. The successful application of NLP techniques would not only inform our understanding of how identity is expressed, but be an example of computation providing insight into our social reality.

2 Data

It was not easy to find a data set, and even harder to find one that was labeled. In the end, we found a website that contained the text of many books and downloaded them. We then hand-classified them and performed initial experiments with this data set. We called this set “MultAuth.” MultAuth consists of most of the literary works in The Gutenberg Project database. The Gutenberg Project aims to provide digitized versions of great works from various fields, in the same way that a public library provides hard copies of these works. See http://promo.net/pg for more details on The Gutenberg Project. The MultAuth data set divides as shown in Table 1.

File Category	Number of Documents
American Women	61
American Men	113
English Women	19
English Men	222
Total:	415

Table 1: The MultAuth Data Set

Appendix A shows the full list of titles and authors. The Gutenberg Project transcribed these works from original book editions. All of the documents included headers describing The Gutenberg Project, which we stripped. Book genres include poetry collections, short stories, and novels (the majority). The Gutenberg Project publishes books that are in the public domain, so most are classics written by authors from the 19th and early 20th centuries. In some cases, a single author corresponds to many works. For example, Charles Dickens wrote 18 out of the 222 works by English Men. Because large amounts of data from a single author can skew results, we created a second data set, which limited each author to one work. Table 2 shows the breakdown of this data set, which we called “SingAuth.”

File Category	Number of Documents
American Women	28
American Men	62
English Women	14
English Men	96
Total	200

Table 2: The SingAuth Data Set

3 Methods

Our program takes categorized files as input and does the following:

1. Generates 1000 features to use for training and classification.

2. Produces a feature vector for each document. Each vector contains 1000 entries, one for each feature. The value in entry e of vector v is the number of times feature e occurs in document v.

3. Sends the set of feature vectors to a classifier, which trains on a subset of the vectors and attempts to classify the remaining vectors. The classifier uses ten-fold cross validation. That is, it randomizes the vectors, partitions them into ten equal-sized subsets, and iterates ten times, each time using a different subset as the test data, and the remaining subsets as the training data.
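
Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our actual program: the function names (feature_vector, ten_fold_splits), the fixed random seed, and the interleaved way of forming the ten subsets are all hypothetical choices made for the example.

```python
import random
from collections import Counter

def feature_vector(tokens, features):
    """Count how often each selected feature occurs in one tokenized document."""
    counts = Counter(tokens)
    return [counts[f] for f in features]

def ten_fold_splits(n_docs, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_docs))
    random.Random(seed).shuffle(indices)                    # randomize the vectors
    folds = [indices[i::n_folds] for i in range(n_folds)]   # near-equal subsets
    for k in range(n_folds):
        test = folds[k]                                     # one subset is held out
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# Example: a toy document and a two-feature vocabulary.
vector = feature_vector(["she", "said", "she"], ["she", "he"])
splits = list(ten_fold_splits(200))
```

With 200 documents, as in SingAuth, each of the ten iterations tests on 20 documents and trains on the other 180.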

We ran many tests, each assigning a different set of values to various parameters. The following subsections describe the parameters. Refer to appendix B for an exhaustive list of our results.

3.1 Feature Selection

Our features are counts of words and of keyboard symbols such as “;”, “,”, “<”, and “#”. We used four different techniques to select one thousand features from the corpus. One thousand features seemed appropriate: the number is large enough to yield meaningful results and small enough not to overload our computational resources. We were able to use a large number of features, since Naïve Bayes does not suffer from the curse of dimensionality in the way that other machine learning algorithms do. In each of the techniques, we considered only features that occurred three or more times in the corpus. The four techniques follow:

Pointwise Mutual Information: For each (feature, class) pair, generate a point-wise mutual information value. Choose the 1000 features corresponding to the 1000 greatest values generated.
Average Mutual Information: For each feature, compute an average mutual information value. Choose the 1000 features yielding the 1000 greatest Average Mutual Information Values.
Chi Squared: For each (feature, class) pair, generate a Chi-Squared value. Choose the 1000 features corresponding to the 1000 greatest values generated.
Random: Choose 1000 features randomly.
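
As a rough illustration of the (feature, class) scoring used by the first and third techniques, the sketch below assumes the textbook definitions: PMI(f, c) = log[P(f, c) / (P(f) P(c))] estimated from pooled token counts, and the chi-squared statistic of the 2x2 table "token is f" by "class is c". The report does not spell out its exact formulas, so these definitions, and every function name here, are assumptions.

```python
import math
from collections import Counter

def class_counts(docs_by_class):
    """Pool each class's tokenized documents into one Counter of token counts."""
    return {c: Counter(tok for doc in docs for tok in doc)
            for c, docs in docs_by_class.items()}

def pmi_scores(counts, min_count=3):
    """Return {(feature, class): PMI}, skipping features seen fewer than min_count times."""
    class_totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
    n = sum(class_totals.values())
    feature_totals = Counter()
    for cnt in counts.values():
        feature_totals.update(cnt)
    scores = {}
    for c, cnt in counts.items():
        for f, n_fc in cnt.items():
            if feature_totals[f] >= min_count:
                # log[ P(f, c) / (P(f) P(c)) ] with probabilities as count ratios
                scores[(f, c)] = math.log(n_fc * n / (feature_totals[f] * class_totals[c]))
    return scores

def chi_squared(n_fc, n_f, n_c, n):
    """Chi-squared statistic for the 2x2 table (token is f) x (class is c)."""
    observed = [n_fc, n_f - n_fc, n_c - n_fc, n - n_f - n_c + n_fc]
    expected = [n_f * n_c / n, n_f * (n - n_c) / n,
                (n - n_f) * n_c / n, (n - n_f) * (n - n_c) / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

def top_features(scores, k=1000):
    """Pick the k distinct features with the greatest (feature, class) scores."""
    chosen = []
    for (f, _c), _score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if f not in chosen:
            chosen.append(f)
        if len(chosen) == k:
            break
    return chosen

# Toy corpus: one tiny "book" per class.
toy = {"F": [["she", "she", "she", "the"]],
       "M": [["he", "he", "he", "the", "the", "the"]]}
scores = pmi_scores(class_counts(toy))
selected = top_features(scores, k=2)
```

In the toy corpus, "she" and "he" score higher than "the", which occurs in both classes, so they are selected first.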

3.2 Second-Level Pointwise Mutual Information

When calculating mutual information, the files that make up a class are considered as one entire mass of text. If a word, say “Oz” from one of L. Frank Baum’s Oz books, is used extremely frequently within a single book, then the learner considers it representative of the class as a whole even though it is not really representative. Our objective is to choose features that distinguish among classes but not among books within the class. We therefore implemented a way of eliminating these book-specific features. Our approach uses second-level applications of Pointwise Mutual Information. Consider categorizing all American works by gender. First, we apply Pointwise Mutual Information three times, each time generating a list with considerably more than 1000 features sorted by their mutual information values, using the following parameters:

1. Use the American data set and use gender as the category.

2. Use the American men as the data set and define a category for each American book by a male author.

3. Use the American women as the data set and define a category for each American book by a female author.

We now have three lists of features. The first list contains features that distinguish between men and women in the entire data set. The second list allows us to distinguish among male books. The third list allows us to distinguish among female books. The idea is to remove features at the tops of lists 2 and 3 from list 1 and then take the top 1000 features remaining in list 1.

However, there is more we must consider. Consider the following scenario:

1. "apple" is at the top of list 1, the American list, because it identifies women well.

2. "apple" is at the top of list 2, the Men list, since it exists in only a single male book.

3. "apple" is nowhere near the top of list 3, the Women list, since lots of female books contain "apple".

Even though "apple" is at the top of list 2 (the male list), we should not remove it from list 1 (the American list), since the feature identifies female books well. Our program handles this case by keeping track of why each feature in list 1 is in list 1 and removing it only when appropriate. For example, the program removes "apple" if and only if:

("apple" is in list 1, the American list, because it identifies men well) and

("apple" is at the top of list 2, the Men list)

or

("apple" is in list 1, the American list, because it identifies women well) and

("apple" is at the top of list 3, the Women list)

When producing list 1, the American list, Pointwise Mutual Information tells us which category each feature identifies. Therefore, this technique applies to Pointwise Mutual Information, but not as easily to Average Mutual Information. It is also possible to apply the technique to Chi-Squared, since Chi-Squared also identifies the class of each feature, but we only ran tests with Pointwise Mutual Information.

In short, our algorithm prefers a feature with the following characteristics:

has high mutual information across overall categories, which indicates that the feature can be used to tell the differences between the two categories
has low mutual information within the category, which indicates that the feature is common within the category

The first level of our implementation picks out those features with the first characteristic and the second level of our implementation removes those features without the second characteristic. The relative importance of these two characteristics is an open question which deserves exploration.
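
The removal rule of this section can be sketched as follows. The data layout is an assumption made for illustration: list 1 is taken to be a sequence of (feature, class it points to) pairs in descending score order, and "the top of" each within-class list is represented as a set whose cutoff is left to the caller, since the report does not fix one. The function name second_level_filter is hypothetical.

```python
def second_level_filter(list1, top_within, k=1000):
    """
    list1: [(feature, class_it_points_to), ...] sorted by descending first-level score.
    top_within: {class: set of features at the top of that class's per-book list}.
    A feature is dropped only when it is book-specific within the class it points to.
    """
    kept = []
    for feature, cls in list1:
        if feature in top_within.get(cls, set()):
            continue                      # book-specific in its own class: remove
        kept.append((feature, cls))
        if len(kept) == k:
            break
    return kept

# The "apple" scenario: it points to women, and is book-specific only among men.
list1 = [("apple", "women"), ("oz", "men"), ("war", "men")]
top_within = {"men": {"apple", "oz"}, "women": set()}
kept = second_level_filter(list1, top_within, k=2)
```

Here "apple" survives because it is at the top of the Men list but points to women, while "oz" is removed because it points to men and is at the top of the Men list.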

3.3 Balancing the Number of Features from Each Class

When using Pointwise Mutual Information, our program was performing extremely poorly on certain experiments. Specifically, in one of our experiments, we attempted to classify English texts according to the gender of the author. Our English texts were overwhelmingly male. Nevertheless, our classifier was classifying most of them as female. Upon examining the features more closely, we observed that many of the features pointed towards the female category. Because there were fewer women writers, the features from women’s literature were obtaining higher mutual information rankings since the probability space was more confined and therefore less distributed across many words. To improve performance results, we modified the program to allow "balancing" of the feature set. We could then specify how many male features to select and how many female features to select. On the problematic data sets, this change helped performance.
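
A minimal sketch of such balancing, assuming the ranked feature list records which class each feature points to; the function name balanced_selection and the quota dictionary are hypothetical, not our program's interface.

```python
def balanced_selection(ranked, quotas):
    """
    ranked: [(feature, class_it_points_to), ...] in descending score order.
    quotas: {class: number of features to take for that class}.
    Walks the ranking and takes the best features per class until every quota is met.
    """
    remaining = dict(quotas)
    selected = []
    for feature, cls in ranked:
        if remaining.get(cls, 0) > 0:
            selected.append((feature, cls))
            remaining[cls] -= 1
        if not any(remaining.values()):
            break                          # all quotas filled
    return selected

# Female features dominate the top of the ranking, but the quotas force male ones in.
ranked = [("she", "F"), ("her", "F"), ("dear", "F"), ("war", "M"), ("ship", "M")]
balanced = balanced_selection(ranked, {"F": 1, "M": 2})
```

Without the quotas, taking the top three entries would yield only female features; with them, two male features lower in the ranking are selected instead.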

3.4 Naïve Bayes

We ran our tests using three variants of Naïve Bayes: Unomial, Multi-Variate Bernoulli, and Multinomial. Suppose we wish to calculate the probability that a document d belongs to a class c. Let F be the set of features. Naïve Bayes computes the probability according to the following formula:

P(c|d) = P(c) * product_f_in_F [P(f|c)]

The three variants differ only in how they compute P(f|c). Assuming n is the number of times f occurs in d, that #(f) is the number of times f occurs in the training data in class c and that #(F) is sum_f_in_F (#(f)), the three variants compute P(f|c) as follows:

Variant	P(f|c) if n = 0	P(f|c) if n > 0
Unomial	1	#(f)/#(F)
Bernoulli	(#(F) - #(f))/#(F)	#(f)/#(F)
Multinomial	1	(#(f)/#(F))^n

The classifiers use Laplace smoothing to eliminate zero probabilities.
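
The three variants can be sketched compactly in log space. The sketch rests on two assumptions: the two cases in the table are read as "f absent from d" (n = 0) and "f present in d" (n > 0), and Laplace smoothing is taken to be add-one smoothing, (#(f) + 1) / (#(F) + |F|). The function names are hypothetical.

```python
import math

def train(vectors, labels):
    """Return {class: (prior, smoothed P(f|c) per feature)} from count vectors."""
    n_features = len(vectors[0])
    model = {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        counts = [sum(col) for col in zip(*rows)]    # #(f) for class c
        total = sum(counts)                          # #(F) for class c
        probs = [(k + 1) / (total + n_features) for k in counts]  # add-one smoothing
        model[c] = (len(rows) / len(vectors), probs)
    return model

def log_factor(variant, p, n):
    """Log of the per-feature factor, given p = P(f|c) and n occurrences in d."""
    if n > 0:
        # Multinomial raises p to the n-th power; the other two use p once.
        return n * math.log(p) if variant == "multinomial" else math.log(p)
    # Absent feature: Bernoulli contributes (1 - p); the other two contribute 1.
    return math.log(1 - p) if variant == "bernoulli" else 0.0

def classify(model, vector, variant="unomial"):
    """Return the class maximizing log P(c) + sum of per-feature log factors."""
    scores = {c: math.log(prior) + sum(log_factor(variant, p, n)
                                       for p, n in zip(probs, vector))
              for c, (prior, probs) in model.items()}
    return max(scores, key=scores.get)

# Toy data: feature 0 is typical of class "F", feature 1 of class "M".
model = train([[3, 0], [2, 1], [0, 4], [1, 3]], ["F", "F", "M", "M"])
prediction = classify(model, [2, 0], variant="multinomial")
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow when 1000 small factors are multiplied together.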

4 Testing

We divided the data set into seven subsets and performed experiments with each, using ten-fold cross-validation:

1. AllGen – all files categorized according to gender.

2. AmerGen – files written by American authors categorized according to gender.

3. EngGen – files written by English authors categorized according to gender.

4. AllGenNat – all files categorized as either female American, female English, male American, or male English author.

5. AllNat – all files categorized according to nationality.

6. MaleNat – files written by male authors categorized according to nationality.

7. FemaleNat – files written by female authors categorized according to nationality.

In addition, for each of the seven subsets, we used various feature selection methods and various variants of the Naive Bayes learner/classifier discussed above.

5 Results

We calculated baseline scores for each experiment by always choosing the category corresponding to the highest prior probability. For instance, in the “SingAuth” data, there are 158 male authors and 42 female authors. An algorithm that always chooses men would be correct 79% of the time, so the baseline is considered to be 79%. We were able to outperform baseline results. We first list results for the original MultAuth data set. We then list results for the SingAuth data set.
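
The baseline arithmetic can be reproduced from the counts in Table 2. The helper name majority_baseline is hypothetical; the numbers are the SingAuth counts.

```python
def majority_baseline(class_sizes):
    """Accuracy of always predicting the largest class."""
    return max(class_sizes.values()) / sum(class_sizes.values())

# SingAuth counts from Table 2.
sing_auth = {"American Women": 28, "American Men": 62,
             "English Women": 14, "English Men": 96}

# Collapse the four file categories into the two gender classes.
by_gender = {
    "men": sing_auth["American Men"] + sing_auth["English Men"],
    "women": sing_auth["American Women"] + sing_auth["English Women"],
}

print(round(100 * majority_baseline(by_gender)))   # 79
```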

5.1 MultAuth Data Set

Subset / Category Set	Percentage Categorized Correctly	Baseline
AllGen	48.21	79
AmerGen	74.1379	69
EngGen	27.3859	87
AllGenNat	53.494	48
AllNat	61.2048	55
MaleNat	46.5672	61
FemaleNat	82.5	67

Table 3: MultAuth Results

For the MultAuth data set, we used only Pointwise Mutual Information and the Unomial variant of Naive Bayes. We used neither second-level feature selection nor balancing. As Table 3 demonstrates, the results were poor. Three Subsets/Category Sets performed below baseline. Our extremely low performance was due in part to the existence of multiple books by a single author. This condition led to the use of features that relied heavily on specific authors and were not good indicators for works written by other authors in the same category.

5.2 Sampling of SingAuth Data Set With Neither 2nd Level Feature Selection Nor Balancing

Figure 1: Multinomial Naive Bayes Results

Figure 2: Unomial Naive Bayes Results

Figure 1 and Figure 2 show some of our results for the SingAuth data set. These results use neither second level feature selection, nor balancing. We omit the graph for the Bernoulli variant of Naive Bayes, because it performs almost identically to the Unomial variant. In general, our best results outperformed the baseline by roughly 20%.

The AllGenNat experiment performed worst because it is the most difficult problem. Since it is not a binary categorization, one would expect lower performance. However, even in this data/category set, we improved on the baseline significantly.

Most of the time, Multinomial performs worse than Unomial. This result is surprising, since Multinomial is generally considered to give slightly better results than Unomial. One reason that Unomial may have outperformed Multinomial is that our counts are not normalized for document length. Because our data is skewed, the presence of a word is perhaps more meaningful than the number of occurrences of that word. The Multinomial method is sensitive to counts, while the Unomial method is sensitive only to the presence or absence of a feature. (We would have liked to try normalizing counts, but we ran out of time.)

As expected, Random feature selection performed approximately 10% below the average of other methods. It performed above baseline because it still used priors and utilized additional information from 1000 features, just not the most informative ones.

Average mutual information performed better than pointwise. This result owes to the difficulties with pointwise mutual information ‘pointing’ to the women category. Average Mutual Information helps ensure that there are features that point to each class in the set of classes. Maximizing the average finds features that tend to be high in all categories rather than just one, so strong performance in the women’s category does not dominate as strongly.

5.3 Sampling of SingAuth Results with 2nd Level Feature Selection

Figure 3: Second Level Feature Selection Results

Figure 3 shows some of our results using the SingAuth data set and 2nd Level Pointwise Mutual Information feature selection. The figure shows how second-level Mutual Information helped in one problematic case. For the case of classifying the gender of English authors with the Unomial classifier, the second-level technique improved the results from 48% to 81%. There are two reasons. First, because there are few English women, a single-level feature selection technique is more likely to select a feature that is used often in a single English/Women work but which is not representative of the class as a whole. By using our second-level feature selection scheme, we reduce the chance of using such a feature. Second, since the Unomial classifier considers only the presence of a word but not the number of occurrences, it amplifies the detrimental effect of the non-representative feature. Multinomial compensates for the use of the non-representative feature by amplifying the effects of the good features. It is noteworthy that the second-level technique successfully removes many of the proper names from the English/Women category, including Marianne, Rachel, Derrick, Julius, Walter, Helen, Elizabeth, and Ellen.

5.4 Sampling of SingAuth Results with Both 2nd Level Feature Selection and Balancing

Figure 4: Second Level and Balancing Feature Selection Results

Figure 4 shows some of our results using the SingAuth data set, 2nd Level Pointwise Mutual Information feature selection, and balancing. Balancing the number of "men" and "women" features greatly improved the performance of classifying English authors according to gender using the Unomial classifier, but for a different reason. Without balancing, most of the features selected were female features, because the small number of English women writers tended to make the mutual information scores of the women features large. By balancing the features in our feature selection scheme, our program picked more male features than it would have without balancing. The Unomial classifier was more sensitive to the "woman bias" problem, so its performance improved the most.

Appendix B lists many of our results in more detail, including results achieved using different combinations of the feature selection techniques. In these results, we also forced the use of different proportions (other than 1/2, 1/2) of male and female features, hoping to find a good distribution.

5.5 Observations

To give a sampling of the types of features our feature selection methods chose, we list below some of the features that Pointwise Mutual Information chose for the American Gender tests.

List of Features Pointing to Women: she, her, "‘", ".", "“", t, I, you, Duane, s (possessive), Linda, Jo, it, misses, Ruth, little, had, has, have, was, when, ?, if, bud, David, an, Meg, with, Don, Amy, dear, sort, think, beauty, beautiful, lovely, wonderful, handsome, competition, music, dances, work, mind, chess, girlish, married, marriage, child, children, gun, friends, morbid, jealous, supercilious, savory, charming, bitter, pleasant, anger, forgive, tomorrow, yesterday, time, sometime, teatime, re, relationship, maybe, perhaps.

List of Features Pointing to Men: the, of, "-", ";", his, "--", in, de, we, "!", ",", by, und, he, <, >, man, upon, believe, kill, killed, skill, competitor, warriors, warrior, war, mystery, police, science, musical, art, dance, money, toil, god, rational, sex, us, spirit, spiritual, religion, theology, baseball, football, grave, unsavory, sagacious, rage, furious, organized, systematic, prayer, now, immediately, memory, times, sport, mayhap, verily, anon, peradventure, methinks, ocean, boat, ship, oar, mast, sail.

These lists reinforce many gender stereotypes. There are topical distinctions. Women write about topics that use words such as marriage, children, cooking, and kitchen. Men write about topics that use words such as war, science, sport, and money. Women use words that reflect on time passing while men are in the ‘here’ and ‘now’. The presence of “he” and “his” in the men's list and “she” and “her” in the women’s list suggests authors write more about characters of their own gender.

There are some trends that reflect less overt differences. The presence of “m”, “d”, single quote, and “re” in the women’s list suggests they use contractions more often. The solitary “s” likely reflects frequent use of the possessive. The presence of “.” may also indicate that women use shorter sentences, since a larger percentage of tokens are periods in the female sets. This conclusion is reinforced in the men's list by the presence of many punctuation symbols which tend to elongate sentences, such as “;” and “,” (it might not be a bad idea to add average sentence length and average word length as features in future work). There are more proper names in the women’s list, perhaps reflecting that women initially tended to use the novel format, or that women focus more on the main character of their novels by directly referring to them in the third person. Women use the present participle often, whereas there is not a single present participle in the first 1000 features on the men’s list.

Our data set is certainly not perfect. The list of Men’s features has more antiquated words (e.g. “anon” and “methinks”). This is likely the outcome of a data set in which the earlier works are especially dominated by men. The few women writers are from after the 18th century because women authors were considered more acceptable, and publishable, as time progressed.

6 Ideas For Future Research

There were several possible extensions that we did not have time to implement. We considered varying the number of features used and the types of features used. We could have eliminated all capitalized words to purge the feature list of proper names that do not reflect the class. Different features might have been included, such as average word length, average sentence length, and grammatical sentence structures.

We could have normalized the counts according to document length. Alternatively, we might have taken the first N words from every document or a random sampling of N words. Another consideration concerning the data is the presence of different format genres. For example, we might have improved performance if we eliminated poetry from our data set.

Our data set included older works. It would be interesting to see if distinctions based on gender have blurred more recently.

7 Conclusion

There are many further ideas for exploration in this area. Essentially, in our project, we demonstrated that, for our data set, there are differences between male and female writing and between English and American writing. Although our data set is not ideal, our results provide evidence that there are inherent differences in general. However, our data set works with older literature. Experiments with more recent literature might provide insight into how gender differences have changed over the years.

Appendix A: Data Set

A not-so-well-formatted list of our MultAuth data set follows. The SingAuth data set simply chooses one book per author from the MultAuth data set.

American Literature

the second book of modern verse ed rittenhouse

the little book of modern verse ed rittenhouse

little women by louisa may alcott

flower fables by louisa may alcott

the story of a bad boy by thomas bailey aldrich

cast upon the breakers by horatio alger

frank s campaign/farm & camp horatio alger jr

the scouts of the valley by joseph a altsheler

fantastic fables by ambrose bierce

the secret garden by frances hodgson burnett

extracts from adam s diary by mark twain

life on the mississippi by mark twain

tom sawyer detective mark twain

a horse s tale by mark twain

man that corrupted hadleyburg by mark twain

the pathfinder by james fenimore cooper

life in the iron mills by rebecca harding davis

miss civilization by richard harding davis

vera the medium by richard harding davis

the reporter who made himself king by davis

culprit fay and other poems joseph rodman drake

the damnation of theron ware by harold frederic

the market place by harold frederic

copy cat & other stories by mary wilkins freeman

the yates pride by mary e wilkins freeman

the yellow wallpaper by charlotte perkins gilman

herland

the ways of men by eliot gregory

worldly ways and byways by eliot gregory

selected stories by bret harte

chita a memory of last island by lafcadio hearn

the altar of the dead by henry james

the figure in the carpet by henry james

an international episode by henry james

the lesson of the master by henry james

roderick hudson by henry james

the death of the lion by henry james

the country of the pointed firs sarah orne jewett

select poems of sidney lanier ed callaway

the breitmann ballads by charles g leland

blix by frank norris

moran of the lady letty by frank norris

the burial of the guns by thomas nelson page

the gentle grafter by o henry

heart of the west by o henry

roads of destiny by o henry

howard pyle s book of pirates

twilight land by howard pyle

initials only by anna katharine green

the woman in the alcove by anna katharine green

charlotte temple by susanna rowson

poems patriotic religious etc by father ryan

the lady or the tiger? by frank r stockton

rudder grange by frank r stockton

monsieur beaucaire by booth tarkington

penrod and sam by booth tarkington

the turmoil a novel by booth tarkington

beauty and the beast etc by bayard taylor

fisherman s luck by henry van dyke

the ruling passion by henry van dyke

ben hur a tale of the christ by lew wallace

the birds christmas carol kate douglas wiggin

a cathedral courtship by kate douglas wiggin

the diary of a goose girl by wiggin

new chronicles of rebecca by kate douglas wiggin

the old peabody pew by kate douglas wiggin

penelope s experiences in scotland by wiggin

penelope s postscripts by kate douglas wiggin

penelope s irish experiences by kate d wiggin

rose o the river by kate douglas wiggin

story of waitstill baxter by kate d wiggin

the village watch tower by kate douglas wiggin

the jimmyjohn boss and other stories by wister

lady baltimore by owen wister

lin mclean by owen wister

padre ignacio by owen wister

the outlet by andy adams

winesburg ohio by sherwood anderson

dorothy and the wizard in oz by l frank baum

the enchanted island of yew by l frank baum

the emerald city of oz l frank baum

glinda of oz by l frank baum

the lost princess of oz by baum

rinkitink in oz by l frank baum

the magic of oz by l frank baum

ozma of oz by l frank baum

the patchwork girl of oz by l frank baum

the road to oz by l frank baum

the scarecrow of oz by l frank baum

tik tok of oz by l frank baum

the tin woodman of oz by baum

the agony column by earl derr biggers

the land that time forgot by burroughs

the outlaw of torn by edgar rice burroughs

tarzan the untamed by edgar rice burroughs

out of time s abyss edgar rice burroughs

pigs is pigs by ellis parker butler

alexander s bridge by willa cather

my antonia by willa cather

song of the lark willa cather

cobb s anatomy by irvin s cobb

a plea for old cap collier by irvin s cobb

speaking of operations by irvin s cobb

the financier by theodore dreiser

lahoma by john breckinridge ellis

songs for parents by john farrar

emma mcchesney & co by edna ferber

betty zane by zane grey

the call of the canyon by zane grey

the last of the plainsmen by zane grey

the lone star ranger by zane grey

the spirit of the border by zane grey

to the last man by zane grey

wildfire by zane grey

the young forester by zane grey

a heap o livin by edgar a guest

just folks by edgar a guest

trees and other poems joyce kilmer

keziah coffin by joseph c lincoln

on the makaloa mat/island tales jack london

smoke bellew by jack london

men women and ghosts by amy lowell

sword blades and poppy seed by amy lowell

the haunted bookshop by christopher morley

where the blue begins by christopher morley

a mountain woman by elia w peattie

painted windows by elia w peattie

just david by eleanor h porter

freckles by gene stratton porter

her father s daughter by gene stratton porter

the vision splendid by william macleod raine

lavender and old lace by myrtle reed

the poisoned pen by arthur b reeve

the amazing interlude by mary roberts rinehart

bab a sub deb by mary roberts rinehart

dangerous days by mary roberts rinehart

the man in lower ten by mary roberts rinehart

a poor wise man by mary roberts rinehart

sight unseen by mary roberts rinehart

the street of seven stars by mary roberts rinehart

when a man marries by mary roberts rinehart

children of the night by edwin arlington robinson

the man against the sky by edwin a robinson

cabin fever by b m bower

cow country by b m bower

the flying u ranch by b m bower

the flying u s last stand by b m bower

the lure of the dim trails by b m bower

the trail of the white mule by b m bower

damaged goods by upton sinclair from les avaries

the darrow enigma by melvin l severy

love songs by sara teasdale

daddy long legs by jean webster

dear enemy by jean webster

the glimpses of the moon by edith wharton

house of mirth by edith wharton

the reef by edith wharton

the touchstone by edith wharton

arizona nights by stewart edward white

the riverman by stewart edward white

other things being equal by emma wolf

wild justice by ruth m sprague

erewhon revisited by samuel butler

mudfog and other sketches by charles dickens

mugby junction by charles dickens

liber amoris or the new pygmalion by wm hazlitt

de profundis by oscar wilde

gathering of brother hilarius by michael fairless

English Literature

adventures among books by andrew lang

flower of the mind by alice meynell

cavalier songs & ballads of england mackay/editor

ancient poems ballads and songs of england

old christmas by washington irving

the coxon fund by henry james

glasses by henry james

the jew of malta by christopher marlowe

venus and adonis by william shakespeare

sir thomas more shakespeare apocrypha

the two noble kinsmen shakespeare apocrypha

beautiful stories from shakespeare by e nesbit

the life of john bunyan by edmund venables

the double dealer by william congreve

love for love by william congreve

the old bachelor by william congreve

dickory cronke by daniel defoe

journal of a voyage to lisbon by henry fielding

she stoops to conquer by oliver goldsmith

the lucasta poems by richard lovelace

poetical works by john milton

school for scandal by richard brinsley sheridan

battle of the books et al by jonathan swift

a modest proposal by jonathan swift

the castle of otranto by horace walpole

love and friendship et al by jane austen

sense and sensibility by jane austen

peter pan in kensington gardens by j m barrie

the provost by john galt

derrick vaughan novelist by edna lyall

we two by edna lyall

many voices by e nesbit

the romany rye by george borrow

letters of george borrow

life of robert browning by william sharp

poems and songs of robert burns

sartor resartus by thomas carlyle

the ballad of the white horse by gk chesterton

a miscellany of men by g k chesterton

the frozen deep by wilkie collins

the haunted hotel by wilkie collins

the law and the lady by wilkie collins

miss or mrs? by wilkie collins

the little lame prince by miss mulock

the puzzle of dickens s last plot by andrew lang

the battle of life by charles dickens

the cricket on the hearth by charles dickens

doctor marigold by charles dickens

the holly tree by charles dickens

hunted down by charles dickens

the lamplighter by charles dickens

lazy tour of two idle apprentices by dickens

master humphrey s clock by charles dickens

perils of certain english prisoners by dickens

reprinted pieces by charles dickens

the seven poor travellers by charles dickens

somebody s luggage by charles dickens

speeches literary & social by charles dickens

sunday under three heads by charles dickens

tom tiddler s ground by charles dickens

to be read at dusk by charles dickens

the uncommercial traveller by charles dickens

wreck of the golden mary by charles dickens

phantasmagoria and other poems by lewis carroll

tales of terror & mystery arthur conan doyle

the lost world/arthur conan doyle

memoirs of sherlock holmes arthur conan doyle

the sign of the four by arthur conan doyle

the white company by arthur conan doyle

the absentee by maria edgeworth

the daughter of an empress by louise muhlbach

murad the unlucky etc by maria edgeworth

the annals of the parish john galt

the ayrshire legatees by john galt

more bab ballads by w s gilbert

songs of a savoyard by w s gilbert

diary of a nobody by george and weedon grossmith

allan quatermain by h rider haggard

child of storm by h rider haggard

finished by h rider haggard

montezuma s daughter by h rider haggard

nada the lily by h rider haggard

return of the native by thomas hardy

a pair of blue eyes by thomas hardy

the woodlanders by thomas hardy

dolly dialogues by anthony hope

tom brown s school days by thomas hughes

idle thoughts of an idle fellow jerome k jerome

diary of a pilgrimage by jerome k jerome

novel notes by jerome k jerome

paul kelver by jerome k jerome

second thoughts of an idle fellow by jerome

three men in a boat by jerome k jerome

told after supper by jerome k jerome

the water babies by charles kingsley

verses 1889 1896 by rudyard kipling

rewards and fairies by rudyard kipling

the second jungle book by rudyard kipling

ban and arriere ban by andrew lang

grass of parnassus by andrew lang

a monk of fife by andrew lang

new collected rhymes by andrew lang

travels in england by paul hentzner

rhymes a la mode by andrew lang

last days of pompeii edward george bulwer lytton

rienzi last of the roman tribunes by e b lytton

lays of ancient rome by thomas babington macaulay

the princess and curdie by george macdonald

the princess and the goblin by george macdonald

masterman ready by captain marryat

a reading of life other poems by george meredith

the colour of life by alice meynell

later poems by alice meynell

penelope s english experiences by kate d wiggin

the spirit of place et al by alice meynell

child christopher by william morris

the grey room by eden phillpotts

the king of the golden river by john ruskin

sesame and lilies by john ruskin

bride of lammermoor by sir walter scott

chronicles of the canongate by walter scott

kenilworth by walter scott

misalliance by george bernard shaw

an unsocial socialist by george bernard shaw

life of john sterling by thomas carlyle

the black arrow by robert louis stevenson

catriona (kidnapped2) by robt l stevenson

the ebb tide by r l stevenson and l osbourne

island nights entertainments by stevenson

master of ballantrae by robert louis stevenson

merry men by robert louis stevenson

new arabian nights by robert louis stevenson

weir of hermiston by r l stevenson

st ives by robert louis stevenson

the wrecker by stevenson and osbourne

the wrong box by stevenson & osbourne

edinburgh picturesque notes by stevenson

the art of writing robert louis stevenson

essays of travel by robert louis stevenson

familiar studies of men & books by stevenson

memories and portraits by r l stevenson

travels with a donkey in the cevennes rls

virginibus puerisque by robert l stevenson

moral emblems by robert louis stevenson

a child s garden of verses/robert louis stevenson

underwoods by robert louis stevenson

vailima letters by robert l stevenson

deirdre of the sorrows by j m synge

riders to the sea j m synge

the tinker s wedding by j m synge

the well of the saints by j m synge

idylls of the king by alfred lord tennyson

catherine a story by william thackeray

the great hoggarty diamond by thackeray

men s wives by william makepeace thackeray

the rose and the ring by thackeray

sister songs by francis thompson

the city of dreadful night by james thomson

the prime minister by anthony trollope

the warden by anthony trollope

the door in the wall et al by h g wells

the first men in the moon by h g wells

the research magnificent by h g wells

the new machiavelli by h g wells

secret places of the heart by h g wells

soul of a bishop by h g wells

tono bungay by h g wells

twelve stories and a dream by h g wells

wheels of chance/bicycling idyll by h g wells

the world set free by h g wells

the memoirs of a minister of france by stanley weyman

a gentleman of france by stanley weyman

the house of the wolf by stanley weyman

selected poems of oscar wilde

selected prose of oscar wilde

charmides and other poems by oscar wilde

the duchess of padua by oscar wilde

a house of pomegranates by oscar wilde

an ideal husband by oscar wilde

intentions by oscar wilde

lady windermere s fan by oscar wilde

lord arthur savile s crime etc by oscar wilde

essays and lectures by oscar wilde

a woman of no importance by oscar wilde

the roadmender by margt

the ninth vibration et al by adams beck

seven men by max beerbohm

the works of max beerbohm by max beerbohm

greenmantle by john buchan

the path of the king by john buchan

prester john by john buchan

the thirty nine steps by john buchan

the cruise of the cachalot by frank t bullen

song book of quong lee of limehouse thomas burke

messer marco polo by brian oswald donn byrne

secret adversary by agatha christie

almayer s folly by joseph conrad

falk by joseph conrad

notes on life and letters by joseph conrad

within the tides by joseph conrad

some reminiscences by joseph conrad

amy foster by joseph conrad

my lady caprice by jeffrey farnol

country sentiment by robert graves

dead men tell no tales by e w hornung

raffles further adventures by e w hornung

a thief in the night by e w hornung

crome yellow by aldous huxley

a voyage to arcturus by david lindsay

over the sliprails by henry lawson

the lodger by marie belloc lowndes

in flanders fields by john mccrae

the great god pan by arthur machen

moon and sixpence by somerset maugham

the brother of daphne by dornford yates

the red house mystery by a a milne

martin hyde the duke s messenger by john masefield

the unbearable bassington h h munro (saki)

the illustrious prince by e phillips oppenheim

kingdom of the blind by e phillips oppenheim

peter ruff and the double four by oppenheim

the vanished messenger by e phillips oppenheim

the voice of the city by o henry

whirligigs by o henry

the yellow crayon by e phillips oppenheim

the zeppelin s passenger by e phillips oppenheim

saltbush bill j p by a b banjo paterson

elizabeth and her german garden by elizabeth

captain blood by rafael sabatini

mistress wilding by rafael sabatini

scaramouche by rafael sabatini

ballads of a bohemian by robert w service

rhymes of a red cross man by robert w service

rhymes of a rolling stone by robert w service

kai lung s golden hours by ernest bramah

the mirror of kong ho by ernest bramah

the wallet of kai lung by ernest bramah

the crock of gold by james stephens

lair of the white worm by bram stoker

fire tongue by sax rohmer

the insidious dr fu manchu by sax rohmer

the yellow claw by sax rohmer

under the red robe by stanley weyman

piccadilly jim by pelham grenville wodehouse

something new by p g wodehouse

the voyage out by virginia woolf

gambara by honore de balzac

the pool in the desert sara jeannette duncan

the white moll by frank l packard

dream life and real life by olive schreiner

the story of an african farm by olive schreiner

an anthology of australian verse bertram stevens

Appendix B: Results Listing

Results for Multiple Authors Dataset

Data Set	Cat Set	Training Correct %	Test Correct %
All	GenNat	53.494	53.494
All	Gender	47.494	48.21
All	Nationality	63.6145	61.2048
American	Gender	82.7586	74.1379
Boy	Nationality	47.7612	46.5672
English	Gender	25.7261	27.3859
Girl	Nationality	85	82.5

Results for Single Authors Dataset

Data Set	Cat Set	Feature Extraction	Classifier	Training Correct %	Test Correct %
All	Gen	AverageMutualInformation	Multinomial	94	81
All	Gen	AverageMutualInformation	Unomial	91	84
All	Gen	AverageMutualInformation	Bernoulli	91	84
All	Gen	ChiSquared	Multinomial	94.5	83.5
All	Gen	ChiSquared	Unomial	70.5	67.5
All	Gen	ChiSquared	Bernoulli	70.5	67.5
All	Gen	PointwiseMutualInformation	Multinomial	92.5	81
All	Gen	PointwiseMutualInformation	Unomial	69.5	66.5
All	Gen	PointwiseMutualInformation	Bernoulli	69.5	66.5
All	Gen	Random	Multinomial	84	63
All	Gen	Random	Unomial	92.5	77.5
All	Gen	Random	Bernoulli	92.5	77.5
All	GenNat	AverageMutualInformation	Multinomial	85.5	61.5
All	GenNat	AverageMutualInformation	Unomial	83.5	67
All	GenNat	AverageMutualInformation	Bernoulli	83.5	67
All	GenNat	ChiSquared	Multinomial	86.5	63.5
All	GenNat	ChiSquared	Unomial	84	69
All	GenNat	ChiSquared	Bernoulli	84	69
All	GenNat	PointwiseMutualInformation	Multinomial	85	60
All	GenNat	PointwiseMutualInformation	Unomial	78	66
All	GenNat	PointwiseMutualInformation	Bernoulli	78	66
All	GenNat	Random	Multinomial	77	44
All	GenNat	Random	Unomial	87.5	54.5
All	GenNat	Random	Bernoulli	88	54.5
All	Nat	AverageMutualInformation	Multinomial	88	73.5
All	Nat	AverageMutualInformation	Unomial	88.5	85.5
All	Nat	AverageMutualInformation	Bernoulli	88.5	85.5
All	Nat	ChiSquared	Multinomial	87.5	73.5
All	Nat	ChiSquared	Unomial	89	85.5
All	Nat	ChiSquared	Bernoulli	89	85.5
All	Nat	PointwiseMutualInformation	Multinomial	84.5	73
All	Nat	PointwiseMutualInformation	Unomial	81	77
All	Nat	PointwiseMutualInformation	Bernoulli	81	77
All	Nat	Random	Multinomial	86	75
All	Nat	Random	Unomial	90.5	76.5
All	Nat	Random	Bernoulli	90.5	76.5
Amer	Gen	AverageMutualInformation	Multinomial	93.3333	82.2222
Amer	Gen	AverageMutualInformation	Unomial	95.5556	86.6667
Amer	Gen	AverageMutualInformation	Bernoulli	95.5556	86.6667
Amer	Gen	ChiSquared	Multinomial	93.3333	82.2222
Amer	Gen	ChiSquared	Unomial	94.4444	84.4444
Amer	Gen	ChiSquared	Bernoulli	94.4444	84.4444
Amer	Gen	PointwiseMutualInformation	Multinomial	93.3333	80
Amer	Gen	PointwiseMutualInformation	Unomial	88.8889	80
Amer	Gen	PointwiseMutualInformation	Bernoulli	88.8889	80
Amer	Gen	Random	Multinomial	85.5556	65.5556
Amer	Gen	Random	Unomial	96.6667	78.8889
Amer	Gen	Random	Bernoulli	96.6667	78.8889
Boy	Nat	AverageMutualInformation	Multinomial	89.8734	75.9494
Boy	Nat	AverageMutualInformation	Unomial	86.7089	81.6456
Boy	Nat	AverageMutualInformation	Bernoulli	86.7089	81.6456
Boy	Nat	ChiSquared	Multinomial	89.8734	75.9494
Boy	Nat	ChiSquared	Unomial	82.2785	74.6835
Boy	Nat	ChiSquared	Bernoulli	82.2785	74.6835
Boy	Nat	PointwiseMutualInformation	Multinomial	89.2405	75.9494
Boy	Nat	PointwiseMutualInformation	Unomial	74.0506	70.8861
Boy	Nat	PointwiseMutualInformation	Bernoulli	74.0506	70.8861
Boy	Nat	Random	Multinomial	87.9747	71.519
Boy	Nat	Random	Unomial	93.038	76.5823
Boy	Nat	Random	Bernoulli	93.038	76.5823
Eng	Gen	AverageMutualInformation	Multinomial	97.2727	81.8182
Eng	Gen	AverageMutualInformation	Unomial	92.7273	88.1818
Eng	Gen	AverageMutualInformation	Bernoulli	92.7273	88.1818
Eng	Gen	ChiSquared	Multinomial	97.2727	88.1818
Eng	Gen	ChiSquared	Unomial	47.2727	45.4545
Eng	Gen	ChiSquared	Bernoulli	47.2727	45.4545
Eng	Gen	PointwiseMutualInformation	Multinomial	96.3636	86.3636
Eng	Gen	PointwiseMutualInformation	Unomial	47.2727	47.2727
Eng	Gen	PointwiseMutualInformation	Bernoulli	47.2727	47.2727
Eng	Gen	Random	Multinomial	96.3636	72.7273
Eng	Gen	Random	Unomial	93.6364	73.6364
Eng	Gen	Random	Bernoulli	93.6364	73.6364
Girl	Nat	AverageMutualInformation	Multinomial	97.619	85.7143
Girl	Nat	AverageMutualInformation	Unomial	97.619	92.8571
Girl	Nat	AverageMutualInformation	Bernoulli	97.619	92.8571
Girl	Nat	ChiSquared	Multinomial	97.619	85.7143
Girl	Nat	ChiSquared	Unomial	90.4762	85.7143
Girl	Nat	ChiSquared	Bernoulli	90.4762	85.7143
Girl	Nat	PointwiseMutualInformation	Multinomial	95.2381	83.3333
Girl	Nat	PointwiseMutualInformation	Unomial	92.8571	90.4762
Girl	Nat	PointwiseMutualInformation	Bernoulli	92.8571	90.4762
Girl	Nat	Random	Multinomial	97.619	78.5714
Girl	Nat	Random	Unomial	97.619	83.3333
Girl	Nat	Random	Bernoulli	97.619	83.3333
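
The listing above is long, so a short script can help when comparing configurations. The following is a minimal sketch, assuming the table has been saved as tab-separated text with one header line. The function name best_by_task, the SAMPLE string, and the derived "gap" field (training minus test accuracy) are illustrative assumptions and were not part of the experiments reported here. The five data rows are copied from the table above.

```python
import csv
import io

# Hypothetical sample: one header line plus five data rows copied from the
# single-author results table (tab-separated).
SAMPLE = (
    "Data Set\tCat Set\tFeature Extraction\tClassifier\tTraining Correct %\tTest Correct %\n"
    "Eng\tGen\tAverageMutualInformation\tMultinomial\t97.2727\t81.8182\n"
    "Eng\tGen\tChiSquared\tMultinomial\t97.2727\t88.1818\n"
    "Eng\tGen\tChiSquared\tUnomial\t47.2727\t45.4545\n"
    "Girl\tNat\tAverageMutualInformation\tUnomial\t97.619\t92.8571\n"
    "Girl\tNat\tRandom\tMultinomial\t97.619\t78.5714\n"
)


def best_by_task(tsv_text):
    """Map each (data set, category set) pair to its row with the highest
    test accuracy. Ties keep the row that appears first in the input."""
    best = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header line
    for data_set, cat_set, feature, classifier, train, test in reader:
        row = {
            "feature": feature,
            "classifier": classifier,
            "train": float(train),
            "test": float(test),
            # Illustrative field: a large gap suggests overfitting.
            "gap": float(train) - float(test),
        }
        key = (data_set, cat_set)
        if key not in best or row["test"] > best[key]["test"]:
            best[key] = row
    return best


for task, row in sorted(best_by_task(SAMPLE).items()):
    print(task, row["feature"], row["classifier"], row["test"])
```

Running the same function on the full table would report, for each data set and category set, the feature selection method and classifier with the best test accuracy.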

Results for Single Authors Dataset comparing the second level and balancing effects

Data Set	Cat Set	Feature Extraction	Classifier	Remove	Balancing	Training % Correct	Testing % Correct
Amer	Gen	PointwiseMutualInformation	Multinomial	0	balance	92.2222	80
Amer	Gen	PointwiseMutualInformation	Multinomial	100	balance	91.1111	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	100	nbalance	91.1111	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	150	balance	88.8889	78.8889
Amer	Gen	PointwiseMutualInformation	Multinomial	150	nbalance	90	78.8889
Amer	Gen	PointwiseMutualInformation	Multinomial	200	balance	90	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	200	nbalance	90	82.2222
Amer	Gen	PointwiseMutualInformation	Multinomial	20	balance	92.2222	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	20	nbalance	91.1111	82.2222
Amer	Gen	PointwiseMutualInformation	Multinomial	250	balance	90	84.4444
Amer	Gen	PointwiseMutualInformation	Multinomial	250	m200	90	82.2222
Amer	Gen	PointwiseMutualInformation	Multinomial	250	m800	87.7778	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	250	nbalance	91.1111	82.2222
Amer	Gen	PointwiseMutualInformation	Multinomial	300	balance	83.3333	78.8889
Amer	Gen	PointwiseMutualInformation	Multinomial	300	m200	86.6667	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	300	m800	86.6667	77.7778
Amer	Gen	PointwiseMutualInformation	Multinomial	300	nbalance	83.3333	78.8889
Amer	Gen	PointwiseMutualInformation	Multinomial	400	balance	81.1111	77.7778
Amer	Gen	PointwiseMutualInformation	Multinomial	400	m200	88.8889	78.8889
Amer	Gen	PointwiseMutualInformation	Multinomial	400	m800	84.4444	76.6667
Amer	Gen	PointwiseMutualInformation	Multinomial	400	nbalance	82.2222	78.8889
Amer	Gen	PointwiseMutualInformation	Multinomial	500	balance	83.3333	77.7778
Amer	Gen	PointwiseMutualInformation	Multinomial	500	m200	90	82.2222
Amer	Gen	PointwiseMutualInformation	Multinomial	500	m800	81.1111	78.8889
Amer	Gen	PointwiseMutualInformation	Multinomial	500	nbalance	88.8889	77.7778
Amer	Gen	PointwiseMutualInformation	Multinomial	50	balance	90	80
Amer	Gen	PointwiseMutualInformation	Multinomial	50	nbalance	91.1111	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	600	balance	87.7778	80
Amer	Gen	PointwiseMutualInformation	Multinomial	600	m200	86.6667	82.2222
Amer	Gen	PointwiseMutualInformation	Multinomial	600	m800	84.4444	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	600	nbalance	87.7778	82.2222
Amer	Gen	PointwiseMutualInformation	Multinomial	700	balance	87.7778	80
Amer	Gen	PointwiseMutualInformation	Multinomial	700	m200	90	85.5556
Amer	Gen	PointwiseMutualInformation	Multinomial	700	m800	84.4444	80
Amer	Gen	PointwiseMutualInformation	Multinomial	700	nbalance	87.7778	80
Amer	Gen	PointwiseMutualInformation	Multinomial	800	balance	86.6667	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	800	m200	88.8889	85.5556
Amer	Gen	PointwiseMutualInformation	Multinomial	800	m800	83.3333	80
Amer	Gen	PointwiseMutualInformation	Multinomial	800	nbalance	87.7778	81.1111
Amer	Gen	PointwiseMutualInformation	Multinomial	900	balance	85.5556	80
Amer	Gen	PointwiseMutualInformation	Multinomial	900	m200	87.7778	84.4444
Amer	Gen	PointwiseMutualInformation	Multinomial	900	m800	83.3333	77.7778
Amer	Gen	PointwiseMutualInformation	Multinomial	900	nbalance	84.4444	80
Amer	Gen	PointwiseMutualInformation	Unomial	0	balance	84.4444	76.6667
Amer	Gen	PointwiseMutualInformation	Unomial	100	balance	86.6667	76.6667
Amer	Gen	PointwiseMutualInformation	Unomial	100	nbalance	91.1111	84.4444
Amer	Gen	PointwiseMutualInformation	Unomial	150	balance	84.4444	76.6667
Amer	Gen	PointwiseMutualInformation	Unomial	150	nbalance	92.2222	84.4444
Amer	Gen	PointwiseMutualInformation	Unomial	200	balance	85.5556	78.8889
Amer	Gen	PointwiseMutualInformation	Unomial	200	nbalance	90	86.6667
Amer	Gen	PointwiseMutualInformation	Unomial	20	balance	86.6667	74.4444
Amer	Gen	PointwiseMutualInformation	Unomial	20	nbalance	88.8889	83.3333
Amer	Gen	PointwiseMutualInformation	Unomial	250	balance	83.3333	77.7778
Amer	Gen	PointwiseMutualInformation	Unomial	250	m200	35.5556	35.5556
Amer	Gen	PointwiseMutualInformation	Unomial	250	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Unomial	250	nbalance	88.8889	87.7778
Amer	Gen	PointwiseMutualInformation	Unomial	300	balance	84.4444	78.8889
Amer	Gen	PointwiseMutualInformation	Unomial	300	m200	35.5556	35.5556
Amer	Gen	PointwiseMutualInformation	Unomial	300	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Unomial	300	nbalance	87.7778	85.5556
Amer	Gen	PointwiseMutualInformation	Unomial	400	balance	83.3333	80
Amer	Gen	PointwiseMutualInformation	Unomial	400	m200	42.2222	41.1111
Amer	Gen	PointwiseMutualInformation	Unomial	400	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Unomial	400	nbalance	86.6667	82.2222
Amer	Gen	PointwiseMutualInformation	Unomial	500	balance	82.2222	78.8889
Amer	Gen	PointwiseMutualInformation	Unomial	500	m200	41.1111	40
Amer	Gen	PointwiseMutualInformation	Unomial	500	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Unomial	500	nbalance	90	85.5556
Amer	Gen	PointwiseMutualInformation	Unomial	50	balance	84.4444	76.6667
Amer	Gen	PointwiseMutualInformation	Unomial	50	nbalance	88.8889	83.3333
Amer	Gen	PointwiseMutualInformation	Unomial	600	balance	86.6667	84.4444
Amer	Gen	PointwiseMutualInformation	Unomial	600	m200	40	42.2222
Amer	Gen	PointwiseMutualInformation	Unomial	600	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Unomial	600	nbalance	90	86.6667
Amer	Gen	PointwiseMutualInformation	Unomial	700	balance	86.6667	84.4444
Amer	Gen	PointwiseMutualInformation	Unomial	700	m200	42.2222	41.1111
Amer	Gen	PointwiseMutualInformation	Unomial	700	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Unomial	700	nbalance	86.6667	85.5556
Amer	Gen	PointwiseMutualInformation	Unomial	800	balance	90	85.5556
Amer	Gen	PointwiseMutualInformation	Unomial	800	m200	43.3333	42.2222
Amer	Gen	PointwiseMutualInformation	Unomial	800	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Unomial	800	nbalance	90	85.5556
Amer	Gen	PointwiseMutualInformation	Unomial	900	balance	90	87.7778
Amer	Gen	PointwiseMutualInformation	Unomial	900	m200	43.3333	41.1111
Amer	Gen	PointwiseMutualInformation	Unomial	900	m800	73.3333	71.1111
Amer	Gen	PointwiseMutualInformation	Unomial	900	nbalance	92.2222	87.7778
Amer	Gen	PointwiseMutualInformation	Bernoulli	0	balance	84.4444	76.6667
Amer	Gen	PointwiseMutualInformation	Bernoulli	100	balance	86.6667	76.6667
Amer	Gen	PointwiseMutualInformation	Bernoulli	100	nbalance	91.1111	84.4444
Amer	Gen	PointwiseMutualInformation	Bernoulli	150	balance	84.4444	76.6667
Amer	Gen	PointwiseMutualInformation	Bernoulli	150	nbalance	92.2222	84.4444
Amer	Gen	PointwiseMutualInformation	Bernoulli	200	balance	85.5556	78.8889
Amer	Gen	PointwiseMutualInformation	Bernoulli	200	nbalance	90	86.6667
Amer	Gen	PointwiseMutualInformation	Bernoulli	20	balance	86.6667	74.4444
Amer	Gen	PointwiseMutualInformation	Bernoulli	20	nbalance	88.8889	83.3333
Amer	Gen	PointwiseMutualInformation	Bernoulli	250	balance	83.3333	77.7778
Amer	Gen	PointwiseMutualInformation	Bernoulli	250	m200	35.5556	35.5556
Amer	Gen	PointwiseMutualInformation	Bernoulli	250	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	250	nbalance	88.8889	87.7778
Amer	Gen	PointwiseMutualInformation	Bernoulli	300	balance	84.4444	78.8889
Amer	Gen	PointwiseMutualInformation	Bernoulli	300	m200	35.5556	35.5556
Amer	Gen	PointwiseMutualInformation	Bernoulli	300	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	300	nbalance	87.7778	85.5556
Amer	Gen	PointwiseMutualInformation	Bernoulli	400	balance	83.3333	80
Amer	Gen	PointwiseMutualInformation	Bernoulli	400	m200	42.2222	41.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	400	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	400	nbalance	86.6667	82.2222
Amer	Gen	PointwiseMutualInformation	Bernoulli	500	balance	82.2222	77.7778
Amer	Gen	PointwiseMutualInformation	Bernoulli	500	m200	41.1111	40
Amer	Gen	PointwiseMutualInformation	Bernoulli	500	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	500	nbalance	90	85.5556
Amer	Gen	PointwiseMutualInformation	Bernoulli	50	balance	84.4444	76.6667
Amer	Gen	PointwiseMutualInformation	Bernoulli	50	nbalance	88.8889	83.3333
Amer	Gen	PointwiseMutualInformation	Bernoulli	600	balance	86.6667	84.4444
Amer	Gen	PointwiseMutualInformation	Bernoulli	600	m200	40	42.2222
Amer	Gen	PointwiseMutualInformation	Bernoulli	600	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	600	nbalance	90	86.6667
Amer	Gen	PointwiseMutualInformation	Bernoulli	700	balance	86.6667	84.4444
Amer	Gen	PointwiseMutualInformation	Bernoulli	700	m200	42.2222	41.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	700	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	700	nbalance	86.6667	85.5556
Amer	Gen	PointwiseMutualInformation	Bernoulli	800	balance	90	85.5556
Amer	Gen	PointwiseMutualInformation	Bernoulli	800	m200	43.3333	42.2222
Amer	Gen	PointwiseMutualInformation	Bernoulli	800	m800	71.1111	71.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	800	nbalance	90	85.5556
Amer	Gen	PointwiseMutualInformation	Bernoulli	900	balance	90	87.7778
Amer	Gen	PointwiseMutualInformation	Bernoulli	900	m200	43.3333	41.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	900	m800	73.3333	71.1111
Amer	Gen	PointwiseMutualInformation	Bernoulli	900	nbalance	92.2222	87.7778
F_Eng	Gen	PointwiseMutualInformation	Multinomial	0	balance	97.2727	84.5455
F_Eng	Gen	PointwiseMutualInformation	Multinomial	100	balance	95.4545	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	100	nbalance	96.3636	90.9091
F_Eng	Gen	PointwiseMutualInformation	Multinomial	150	balance	92.7273	88.1818
F_Eng	Gen	PointwiseMutualInformation	Multinomial	150	nbalance	91.8182	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	200	balance	92.7273	90.9091
F_Eng	Gen	PointwiseMutualInformation	Multinomial	200	nbalance	92.7273	91.8182
F_Eng	Gen	PointwiseMutualInformation	Multinomial	20	balance	96.3636	83.6364
F_Eng	Gen	PointwiseMutualInformation	Multinomial	20	nbalance	96.3636	87.2727
F_Eng	Gen	PointwiseMutualInformation	Multinomial	250	balance	92.7273	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	250	m200	90.9091	88.1818
F_Eng	Gen	PointwiseMutualInformation	Multinomial	250	m800	90	82.7273
F_Eng	Gen	PointwiseMutualInformation	Multinomial	250	nbalance	90.9091	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	300	balance	92.7273	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	300	m200	91.8182	88.1818
F_Eng	Gen	PointwiseMutualInformation	Multinomial	300	m800	91.8182	81.8182
F_Eng	Gen	PointwiseMutualInformation	Multinomial	300	nbalance	90	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	400	balance	90.9091	90.9091
F_Eng	Gen	PointwiseMutualInformation	Multinomial	400	m200	90.9091	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	400	m800	91.8182	87.2727
F_Eng	Gen	PointwiseMutualInformation	Multinomial	400	nbalance	90	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	500	balance	90.9091	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	500	m200	90	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	500	m800	89.0909	88.1818
F_Eng	Gen	PointwiseMutualInformation	Multinomial	500	nbalance	90	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	50	balance	97.2727	84.5455
F_Eng	Gen	PointwiseMutualInformation	Multinomial	50	nbalance	97.2727	88.1818
F_Eng	Gen	PointwiseMutualInformation	Multinomial	600	balance	90	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	600	m200	90	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	600	m800	89.0909	87.2727
F_Eng	Gen	PointwiseMutualInformation	Multinomial	600	nbalance	90.9091	90.9091
F_Eng	Gen	PointwiseMutualInformation	Multinomial	700	balance	89.0909	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	700	m200	89.0909	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	700	m800	88.1818	86.3636
F_Eng	Gen	PointwiseMutualInformation	Multinomial	700	nbalance	90	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	800	balance	90	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	800	m200	89.0909	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	800	m800	88.1818	86.3636
F_Eng	Gen	PointwiseMutualInformation	Multinomial	800	nbalance	90	90
F_Eng	Gen	PointwiseMutualInformation	Multinomial	900	balance	90	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	900	m200	89.0909	89.0909
F_Eng	Gen	PointwiseMutualInformation	Multinomial	900	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Multinomial	900	nbalance	90	89.0909
F_Eng	Gen	PointwiseMutualInformation	Unomial	0	balance	91.8182	86.3636
F_Eng	Gen	PointwiseMutualInformation	Unomial	100	balance	91.8182	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	100	nbalance	60	55.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	150	balance	90.9091	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	150	nbalance	61.8182	60.9091
F_Eng	Gen	PointwiseMutualInformation	Unomial	200	balance	90	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	200	nbalance	70	65.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	20	balance	91.8182	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	20	nbalance	48.1818	48.1818
F_Eng	Gen	PointwiseMutualInformation	Unomial	250	balance	90.9091	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	250	m200	28.1818	30
F_Eng	Gen	PointwiseMutualInformation	Unomial	250	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	250	nbalance	66.3636	66.3636
F_Eng	Gen	PointwiseMutualInformation	Unomial	300	balance	89.0909	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	300	m200	30	30.9091
F_Eng	Gen	PointwiseMutualInformation	Unomial	300	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	300	nbalance	68.1818	70.9091
F_Eng	Gen	PointwiseMutualInformation	Unomial	400	balance	90	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	400	m200	30	30.9091
F_Eng	Gen	PointwiseMutualInformation	Unomial	400	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	400	nbalance	76.3636	77.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	500	balance	92.7273	86.3636
F_Eng	Gen	PointwiseMutualInformation	Unomial	500	m200	29.0909	30.9091
F_Eng	Gen	PointwiseMutualInformation	Unomial	500	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	500	nbalance	81.8182	80.9091
F_Eng	Gen	PointwiseMutualInformation	Unomial	50	balance	91.8182	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	50	nbalance	51.8182	51.8182
F_Eng	Gen	PointwiseMutualInformation	Unomial	600	balance	92.7273	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	600	m200	30	31.8182
F_Eng	Gen	PointwiseMutualInformation	Unomial	600	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	600	nbalance	83.6364	84.5455
F_Eng	Gen	PointwiseMutualInformation	Unomial	700	balance	90.9091	90
F_Eng	Gen	PointwiseMutualInformation	Unomial	700	m200	30	31.8182
F_Eng	Gen	PointwiseMutualInformation	Unomial	700	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	700	nbalance	84.5455	84.5455
F_Eng	Gen	PointwiseMutualInformation	Unomial	800	balance	91.8182	90
F_Eng	Gen	PointwiseMutualInformation	Unomial	800	m200	30.9091	31.8182
F_Eng	Gen	PointwiseMutualInformation	Unomial	800	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	800	nbalance	84.5455	85.4545
F_Eng	Gen	PointwiseMutualInformation	Unomial	900	balance	90.9091	88.1818
F_Eng	Gen	PointwiseMutualInformation	Unomial	900	m200	30.9091	31.8182
F_Eng	Gen	PointwiseMutualInformation	Unomial	900	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Unomial	900	nbalance	84.5455	84.5455
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	0	balance	91.8182	86.3636
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	100	balance	91.8182	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	100	nbalance	60	55.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	150	balance	90.9091	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	150	nbalance	61.8182	60.9091
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	200	balance	90	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	200	nbalance	70	65.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	20	balance	91.8182	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	20	nbalance	48.1818	48.1818
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	250	balance	90.9091	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	250	m200	28.1818	30
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	250	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	250	nbalance	66.3636	66.3636
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	300	balance	89.0909	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	300	m200	30	30.9091
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	300	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	300	nbalance	68.1818	70.9091
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	400	balance	90	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	400	m200	30	30.9091
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	400	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	400	nbalance	76.3636	77.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	500	balance	92.7273	86.3636
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	500	m200	29.0909	30.9091
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	500	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	500	nbalance	81.8182	80.9091
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	50	balance	91.8182	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	50	nbalance	51.8182	51.8182
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	600	balance	92.7273	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	600	m200	30	31.8182
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	600	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	600	nbalance	83.6364	84.5455
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	700	balance	90.9091	90
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	700	m200	30	31.8182
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	700	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	700	nbalance	84.5455	84.5455
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	800	balance	91.8182	90
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	800	m200	30.9091	31.8182
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	800	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	800	nbalance	84.5455	85.4545
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	900	balance	90.9091	88.1818
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	900	m200	30.9091	31.8182
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	900	m800	87.2727	87.2727
F_Eng	Gen	PointwiseMutualInformation	Bernoulli	900	nbalance	84.5455	84.5455