Gender Classification of Literary Works

 

By Zoe Abrams, Mark Chavira, Dik Kin Wong

CS224 Final Report

 

Abstract:  Our project uses machine learning techniques to classify literary works according to the genders of their authors.  The NLP techniques employ four methods of feature selection and three variants of Naïve Bayes.  Although not our primary focus, we also applied the same techniques to classify the works according to the nationalities of their authors, either American or English.

1 Introduction and Related Work

The question of whether or not one can determine an author’s gender from his or her writing is a longstanding controversy.  Virginia Woolf, one of the authors from our corpus, states that truly great writing is androgynous:

 

 

“The very first sentence that I would write here, I said, crossing over to the writing-table and taking up the page headed Women and Fiction, is that it is fatal for any one who writes to think of their sex.  It is fatal to be a man or woman pure and simple; one must be woman-manly or man-womanly… Some collaboration has to take place in the mind between the woman and the man before the act of creation can be accomplished.  Some marriage of opposites has to be consummated.  The whole of the mind must lie wide open if we are to get the sense that the writer is communicating his experience with perfect fullness.”

 

According to Woolf, great writers are able to transcend the boundaries of their gender and write works which are not characteristically male or female, but which unmistakably capture the human experience.  Others think there are inherent gender differences which necessarily influence any writer’s work.

 

Most likely, there are some authors who write characteristically for their gender and others whose writing is genderless.  But is there concrete evidence in either direction?  And although some works are steeped in a “gendered” perspective, are there less blatant distinctions that can be made?  Consider the many female writers in our corpus who published under male aliases, such as George Eliot, because women’s works were not considered for publication during the time they were writing.  Is the femininity in their choice of words so subtle that it cannot be detected?

 

This is a heavily explored topic.  In the Socrates library search engine at Stanford University there is an “Authorship Sex Differences” subject heading with over 100 entries.  All of these entries are expository; not a single entry is scientific or experimental.  Scientific inquiry into this topic may provide evidence that the presence of personal identity in writing is inescapable.  The successful application of NLP techniques would not only inform our understanding of how identity is expressed, but also be an example of computation providing insight into our social reality.

 

2 Data

It was not easy to find a data set and even harder to find one that was labeled.  In the end, we found a website that contained the text of many books and downloaded them.  We then hand-classified them and performed initial experiments with this data set.  We called this set “MultAuth.”  MultAuth consists of most of the literary works in the Project Gutenberg database.  Project Gutenberg aims to provide digitized versions of great works from various fields, in the same way that a public library provides hard copies of these works.  See http://promo.net/pg for more details on Project Gutenberg.  The MultAuth data set divides as shown in Table 1.

 

File Category      Number of Documents
American Women     61
American Men       113
English Women      19
English Men        222
Total              415

Table 1: The MultAuth Data Set

Appendix A shows the full list of titles and authors.  Project Gutenberg transcribed these works from original book editions.  All of the documents included headers describing Project Gutenberg, which we stripped.  Book genres include poetry collections, short stories, and novels (the majority).  Project Gutenberg publishes books that are in the public domain, so most are classics written by authors from the 19th and early 20th centuries.  In some cases, a single author corresponds to many works.  For example, Charles Dickens wrote 18 out of the 222 works by English Men.  Because large amounts of data from a single author can skew results, we created a second data set, which limited each author to one work.  Table 2 shows the breakdown of this data set, which we called “SingAuth.”

 

File Category      Number of Documents
American Women     28
American Men       62
English Women      14
English Men        96
Total              200

Table 2: The SingAuth Data Set

3 Methods

Our program takes categorized files as input and does the following:

 

1.       Generates 1000 features to use for training and classification.

2.       Produces a feature vector for each document.  Each vector contains 1000 entries, one for each feature.  The value in entry e of vector v is the number of times feature e occurs in document v.

3.       Sends the set of feature vectors to a classifier, which trains on a subset of the vectors and attempts to classify the remaining vectors.  The classifier uses ten-fold cross validation.  That is, it randomizes the vectors, partitions them into ten equal-sized subsets, and iterates ten times, each time using a different subset as the test data, and the remaining subsets as the training data.
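Steps 2 and 3 above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not our original program: the names feature_vector and ten_fold_splits, the whitespace tokenization, and the fixed random seed are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def feature_vector(tokens, features):
    """Count how often each selected feature occurs in one document."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


def ten_fold_splits(n_docs, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_docs))
    random.Random(seed).shuffle(indices)          # randomize the vectors
    folds = [indices[i::n_folds] for i in range(n_folds)]  # near-equal subsets
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Toy usage: three features, one tiny "document", and 200 document indices.
features = ["she", "he", ";"]
vec = feature_vector("she said ; and she left".split(), features)
splits = list(ten_fold_splits(200))
```

Each document index appears in exactly one test fold, so every vector is classified once by a model that never saw it during training.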

 

We ran many tests, each assigning a different set of values to various parameters.  The following subsections describe the parameters.  Refer to appendix B for an exhaustive list of our results.

3.1 Feature Selection

Our features are counts of words and symbols on the keyboard such as “;”,  “,”,  “<”, and “#”.   We used four different techniques to select one-thousand features from the corpus.  One-thousand dimensions seemed appropriate, because it is large enough to yield meaningful results and small enough not to overload our computational resources.  We were able to use a large number of features, since Naïve Bayes does not suffer from the curse of dimensionality in the way that other machine learning algorithms do.  In each of the techniques, we considered only features that occurred three or more times in the corpus.  The four techniques follow:

 

  1. Pointwise Mutual Information: For each (feature, class) pair, generate a point-wise mutual information value.  Choose the 1000 features corresponding to the 1000 greatest values generated.
  2. Average Mutual Information: For each feature, compute an average mutual information value.  Choose the 1000 features yielding the 1000 greatest Average Mutual Information Values.
  3. Chi Squared: For each (feature, class) pair, generate a Chi-Squared value.  Choose the 1000 features corresponding to the 1000 greatest values generated.
  4. Random: Choose 1000 features randomly.
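As a rough illustration of the first and third techniques, the sketch below scores (feature, class) pairs at the token level.  It assumes that each class is treated as one mass of text, that pointwise mutual information is computed as log(P(f|c) / P(f)), and that Chi-Squared uses the usual 2x2 contingency table of feature versus class; the function names and the toy counts are hypothetical and are not taken from our program.

```python
import math
from collections import Counter


def pmi_scores(class_counts, min_count=3):
    """class_counts maps class -> Counter of feature counts in that class.

    Returns {(feature, cls): log(P(f | c) / P(f))} for features seen
    at least min_count times in the whole corpus.
    """
    totals = {c: sum(cnt.values()) for c, cnt in class_counts.items()}
    grand_total = sum(totals.values())
    overall = Counter()
    for cnt in class_counts.values():
        overall.update(cnt)
    scores = {}
    for c, cnt in class_counts.items():
        for f, n in cnt.items():
            if overall[f] < min_count:
                continue
            scores[(f, c)] = math.log((n / totals[c]) / (overall[f] / grand_total))
    return scores


def chi_squared(class_counts, feature, cls):
    """Chi-squared statistic of the 2x2 table (feature or not) x (cls or not)."""
    a = class_counts[cls][feature]                       # feature inside cls
    b = sum(cnt[feature] for c, cnt in class_counts.items() if c != cls)
    in_cls = sum(class_counts[cls].values())
    total = sum(sum(cnt.values()) for cnt in class_counts.values())
    c_ = in_cls - a                                      # other tokens inside cls
    d = (total - in_cls) - b                             # other tokens elsewhere
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    return total * (a * d - b * c_) ** 2 / denom if denom else 0.0


def top_features(scores, k):
    """Return the k distinct features with the greatest (feature, class) scores."""
    chosen = []
    for (f, _cls), _score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if f not in chosen:
            chosen.append(f)
        if len(chosen) == k:
            break
    return chosen


counts = {
    "women": Counter({"she": 6, "the": 4}),
    "men": Counter({"he": 6, "the": 4}),
}
scores = pmi_scores(counts)
selected = top_features(scores, 2)
```

In this toy corpus "the" is equally frequent in both classes, so its pointwise mutual information is zero and it is not selected.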

3.2 Second-Level Pointwise Mutual Information

When calculating mutual information, the files that make up a class are considered as one entire mass of text.  If a word, say “Oz” from L. Frank Baum’s The Wizard of Oz, is used extremely frequently within a single book, then the learner considers it representative of the class as a whole even though it is not really representative.  Our objective is to choose features that distinguish among classes but not among books within the class.  We therefore implemented a way of eliminating these book-specific features.  Our approach uses second-level applications of Pointwise Mutual Information.  Consider categorizing all American works by gender.  First, we apply Pointwise Mutual Information three times, each time generating a list with considerably more than 1000 features sorted by their mutual information values, using the following parameters:

 

  1. Use the American data set and use gender as the category.
  2. Use the American men as the data set and define a category for each American book by a male author.
  3. Use the American women as the data set and define a category for each American book by a female author.

 

We now have three lists of features.  The first list contains features that distinguish between men and women in the entire data set.  The second list contains features that distinguish among male books.  The third list contains features that distinguish among female books.  The idea is to remove the features at the tops of lists 2 and 3 from list 1 and then take the top 1000 features remaining in list 1.

 

However, there is more we must consider.  Consider the following scenario:

 

1.       "apple" is at the top of list 1, the American list, because it identifies women well.

2.       "apple" is at the top of list 2, the Men list, since it exists in only a single male book.

3.       "apple" is nowhere near the top of list 3, the Women list, since lots of female books contain "apple".

 

Even though "apple" is at the top of list 2 (the male list), we should not remove it from list 1 (the American list), since the feature identifies female books well.  Our program handles this case by keeping track of why each feature in list 1 is in list 1 and removing it only when appropriate.  For example, the program removes "apple" if and only if:

 

("apple" is in list 1, the American list, because it identifies men well) and

("apple" is at the top of list 2, the Men list)

 

or

 

("apple" is in list 1, the American list, because it identifies women well) and

("apple" is at the top of list 3, the Women list)

 

When producing list 1, the American list, Pointwise Mutual Information tells us which category each feature identifies.  Therefore, this technique applies to Pointwise Mutual Information, but not as easily to Average Mutual Information.  It is also possible to apply the technique to Chi-Squared, since Chi-Squared also identifies the class of each feature, but we only ran tests with Pointwise Mutual Information.
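The removal rule can be written compactly.  The following is a minimal sketch, assuming list 1 is available as (feature, class) pairs sorted best first and that "the top" of lists 2 and 3 means their first top_n entries; the function name, the parameter names, and the toy lists are hypothetical.

```python
def second_level_filter(class_list, within_class_lists, top_n, k):
    """Drop book-specific features from list 1 and keep the best k that remain.

    class_list:         [(feature, class_it_points_to), ...], best first (list 1)
    within_class_lists: {class: [feature, ...]}, best first (lists 2 and 3),
                        ranking features that distinguish books inside a class
    top_n:              how many entries count as "the top" of lists 2 and 3
    k:                  number of features to keep
    """
    book_specific = {c: set(feats[:top_n]) for c, feats in within_class_lists.items()}
    kept = []
    for feature, cls in class_list:
        # Remove only if the feature points to cls AND is book-specific within cls.
        if feature in book_specific.get(cls, set()):
            continue
        kept.append(feature)
        if len(kept) == k:
            break
    return kept


list1 = [("apple", "women"), ("oz", "men"), ("she", "women"), ("war", "men")]
lists23 = {"men": ["apple", "oz"], "women": ["meg"]}
kept = second_level_filter(list1, lists23, top_n=2, k=3)
```

Here "apple" survives even though it tops the men's list, because it is in list 1 for identifying women, while "oz" is removed because it points to men and is specific to particular male books.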

 

In short, our algorithm prefers a feature with the following characteristics:

1.       It distinguishes among classes.

2.       It does not distinguish among books within a class.

 

 

The first level of our implementation picks out those features with the first characteristic and the second level of our implementation removes those features without the second characteristic. The relative importance of these two characteristics is an open question which deserves exploration.

3.3 Balancing the Number of Features from Each Class

When using Pointwise Mutual Information, our program was performing extremely poorly on certain experiments.  Specifically, in one of our experiments, we attempted to classify English texts according to the gender of the author.  Our English texts were overwhelmingly male.  Nevertheless, our classifier was classifying most of them as female.  Upon examining the features more closely, we observed that many of the features pointed towards the female category.  Because there were fewer women writers, the features from women’s literature were obtaining higher mutual information rankings since the probability space was more confined and therefore less distributed across many words.  To improve performance results, we modified the program to allow "balancing" of the feature set.  We could then specify how many male features to select and how many female features to select.  On the problematic data sets, this change helped performance.
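One way to picture balancing is as a per-class quota applied while walking down the ranked feature list.  The sketch below is only an illustration: the function name, the quota dictionary, and the toy ranking are hypothetical, and our program may have applied the quotas differently.

```python
def balanced_selection(ranked, quota):
    """Pick features best first, but no more than quota[cls] per class.

    ranked: [(feature, class_it_points_to), ...] sorted best first
    quota:  {class: number of features to take for that class}
    """
    taken = {c: 0 for c in quota}
    chosen = []
    for feature, cls in ranked:
        if cls in taken and taken[cls] < quota[cls] and feature not in chosen:
            chosen.append(feature)
            taken[cls] += 1
    return chosen


# The top of this toy ranking is dominated by features pointing to women.
ranked = [("she", "women"), ("her", "women"), ("dear", "women"),
          ("his", "men"), ("war", "men")]
unbalanced = [f for f, _ in ranked[:4]]
balanced = balanced_selection(ranked, {"women": 2, "men": 2})
```

Without the quota, three of the top four features point to women; with it, the selection is split evenly between the two classes.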

3.4 Naïve Bayes

We ran our tests using three variants of Naïve Bayes: Unomial, Multi-Variate Bernoulli, and Multinomial.  Suppose we wish to calculate the probability that a document d belongs to a class c.  Let F be the set of features.  Naïve Bayes scores each class, up to a normalizing constant that is the same for every class, according to the following formula:

P(c|d) ∝ P(c) * product_f_in_F [P(f|c)]

 

The three variants differ only in how they compute P(f|c).  Assuming n is the number of times f occurs in d, that #(f) is the number of times f occurs in the training data in class c and that #(F) is sum_f_in_F (#(f)), the three variants compute P(f|c) as follows:

 

Variant        P(f|c) if n = 0          P(f|c) if n > 0
Unomial        1                        #(f)/#(F)
Bernoulli      (#(F) - #(f))/#(F)       #(f)/#(F)
Multinomial    1                        (#(f)/#(F))^n

 

The classifiers use Laplace smoothing to eliminate zero probabilities.
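The sketch below shows how the three variants might be scored in log space.  It reads the two cases as "the feature is absent from the document (n = 0)" and "the feature is present (n > 0)", and it assumes add-one Laplace smoothing of #(f)/#(F); both are interpretations for illustration, and the function names and toy counts are hypothetical.  It also assumes at least two features per class, so that the smoothed probability stays below 1.

```python
import math


def log_score(doc_counts, class_counts, prior, variant):
    """Return log P(c) + sum over features of log P(f|c) for one class.

    doc_counts:   {feature: n}, occurrences in the document being classified
    class_counts: {feature: #(f)}, occurrences in the training data of class c,
                  with an entry (possibly 0) for every feature in F
    variant:      "unomial", "bernoulli", or "multinomial"
    """
    total = sum(class_counts.values())        # #(F)
    vocab = len(class_counts)                 # |F|, used by Laplace smoothing
    score = math.log(prior)
    for f, count in class_counts.items():
        p = (count + 1) / (total + vocab)     # smoothed #(f)/#(F)
        n = doc_counts.get(f, 0)
        if n > 0:
            score += n * math.log(p) if variant == "multinomial" else math.log(p)
        elif variant == "bernoulli":
            score += math.log(1 - p)          # feature absent from the document
        # Unomial and Multinomial contribute a factor of 1 when n == 0.
    return score


def classify(doc_counts, training, priors, variant):
    """training: {class: {feature: #(f)}}; return the highest-scoring class."""
    return max(training,
               key=lambda c: log_score(doc_counts, training[c], priors[c], variant))


training = {"women": {"she": 8, "he": 2}, "men": {"she": 2, "he": 8}}
priors = {"women": 0.5, "men": 0.5}
doc = {"she": 3}
labels = {v: classify(doc, training, priors, v)
          for v in ("unomial", "bernoulli", "multinomial")}
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow when there are 1000 features.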

4 Testing

We divided the data set into seven subsets and performed experiments with each, using ten-fold cross-validation:

 

1.       AllGen – all files categorized according to gender.

2.       AmerGen – files written by American authors categorized according to gender.

3.       EngGen - files written by English authors categorized according to gender.

4.       AllGenNat – all files categorized as either female American, female English, male American, or male English author.

5.       AllNat – all files categorized according to nationality.

6.       MaleNat – files written by male authors categorized according to nationality.

7.       FemaleNat – files written by female authors categorized according to nationality.

 

In addition, for each of the seven subsets, we tried several feature selection methods and several of the Naïve Bayes variants discussed above.

5 Results

We calculated baseline scores for each experiment by always choosing the category corresponding to the highest prior probability.  For instance, in the “SingAuth” data, there are 158 male authors and 42 female authors.  An algorithm that always chooses men would be correct 79% of the time, so the baseline is considered to be 79%.  We were able to outperform baseline results.  We first list results for the original MultAuth data set.  We then list results for the SingAuth data set.
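The baseline arithmetic can be reproduced directly from the class sizes in Table 2.  The helper name below is hypothetical; the counts are the SingAuth figures.

```python
def majority_baseline(class_sizes):
    """Accuracy of always guessing the largest class."""
    return max(class_sizes.values()) / sum(class_sizes.values())


# SingAuth, categorized by gender: men = 62 + 96, women = 28 + 14.
singauth_gender = {"men": 62 + 96, "women": 28 + 14}
baseline = majority_baseline(singauth_gender)   # 158 / 200
```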

5.1 MultAuth Data Set

Subset / Category Set    Percentage Categorized Correctly    Baseline
AllGen                   48.21                               79
AmerGen                  74.1379                             69
EngGen                   27.3859                             87
AllGenNat                53.494                              48
AllNat                   61.2048                             55
MaleNat                  46.5672                             61
FemaleNat                82.5                                67

Table 3: MultAuth Results

 

For the MultAuth data set, we used only Pointwise Mutual Information and the Unomial variant of Naive Bayes.  We used neither second level feature selection nor balancing.  As Table 3 demonstrates, the results were poor.  Three Subsets/Category Sets performed below baseline.  Our extremely low performance was due in part to the existence of multiple books by a single author.  This condition led to the use of features that relied heavily on specific authors and were not good indicators for works written by other authors in the same category.

5.2 Sampling of SingAuth Data Set With Neither 2nd Level Feature Selection Nor Balancing

Figure 1: Multinomial Naive Bayes Results

 

Figure 2: Unomial Naive Bayes Results

 

Figure 1 and Figure 2 show some of our results for the SingAuth data set.  These results use neither second level feature selection, nor balancing.  We omit the graph for the Bernoulli variant of Naive Bayes, because it performs almost identically to the Unomial variant.  In general, our best results outperformed the baseline by roughly 20%.

 

The AllGenNat experiment performed worst because it is the most difficult problem.  Since it is not a binary categorization, one would expect lower performance.  However, even in this data/category set, we improved on the baseline significantly.

 

Most of the time, Multinomial performs worse than Unomial.  This result is surprising, since Multinomial is generally considered to give slightly better results than Unomial.  One reason that Unomial may have outperformed Multinomial is that our counts are not normalized for document length.  Because our data is skewed, the presence of a word is perhaps more meaningful than the number of occurrences of that word.  The Multinomial method is sensitive to counts, while the Unomial method is sensitive only to the presence or absence of a feature.  (We would have liked to try normalizing counts, but we ran out of time.)

 

As expected, Random feature selection performed approximately 10% below the average of other methods.  It performed above baseline because it still used priors and utilized additional information from 1000 features, just not the most informative ones.

 

Average mutual information performed better than pointwise.  This result owes to the difficulties with pointwise mutual information ‘pointing’ to the women category.  Average Mutual Information helps ensure that there are features that point to each class in the set of classes.  Maximizing the average finds features that tend to be high in all categories rather than just one, so strong performance in the women’s category does not dominate as strongly.

 

5.3 Sampling of SingAuth Results with 2nd Level Feature Selection

Figure 3: Second Level Feature Selection Results

Figure 3 shows some of our results using the SingAuth data set and 2nd Level Pointwise Mutual Information feature selection.  From this figure, we can clearly see how the second-level Mutual-Information helped in one problematic case.  For the case of classifying the gender of English authors with the Unomial classifier, the second level technique improved the results from 48% to 81%.  There are two reasons.  First, because there are few English women, a single-level feature selection technique is more likely to select a feature that is used often in a single English/Woman work but which is not representative of the class as a whole.  By using our second level feature selection scheme, we reduce the chance of using such a feature.  Second, since the Unomial classifier considers only the presence of a word but not the number of occurrences, it amplifies the detrimental effect of the non-representative feature.  Multinomial compensates for the use of the non-representative feature by amplifying the effects of the good features.  It is noteworthy that the second level technique successfully removes a lot of the proper names from the English/Women category, including Marianne, Rachel, Derrick, Julius, Walter, Helen, Elizabeth, and Ellen.

 

5.4 Sampling of SingAuth Results with Both 2nd Level Feature Selection and Balancing

 

 

Figure 4: Second Level and Balancing Feature Selection Results

 

Figure 4 shows some of our results using the SingAuth data set, 2nd Level Pointwise Mutual Information feature selection, and balancing.  Balancing the number of "men" and "women" features greatly improved the performance of classifying English authors according to gender using the Unomial classifier, but for a different reason.  Without balancing, most of the features selected were female features, because the small number of English women writers tended to make the mutual information scores of the women features large.  By balancing the features in our feature selection scheme, our program picked more male features than it would have without balancing.  The Unomial classifier was more sensitive to the "woman bias" problem, so its performance improved the most.

 

Appendix B lists many of our results in more detail, including results achieved using different combinations of the feature selection techniques.  In these results, we also forced the use of different proportions (other than 1/2, 1/2) of male and female features, hoping to find a good distribution.

5.5 Observations

To give a sampling of the types of features our feature selection methods chose, we list below some of the features that Pointwise Mutual Information chose for the American Gender tests.

 

List of Features Pointing to Women: she, her, "‘", ".", "“", t, I , you, Duane, s (possessive), Linda, Jo, it, misses, Ruth, little, had, has, have, was, when , ? , if, bud, David , an, Meg, with, Don, Amy, dear, sort, think, beauty, beautiful, lovely, wonderful, handsome, competition, music, dances, work, mind, chess, girlish, married, marriage, child, children, gun, friends, morbid, jealous, supercilious, savory, charming, bitter, pleasant, anger, forgive, tomorrow, yesterday, time, sometime, teatime, re, relationship, maybe, perhaps.

 

List of Features Pointing to Men: the, of, "-", ";", his, "--", in, de, we, "!", ",", by, und, he, <, >, man, upon, believe, kill, killed, skill, competitor, warriors, warrior, war, mystery, police, science, musical, art, dance, money, toil, god, rational, sex, us, spirit, spiritual, religion, theology, baseball, football, grave, unsavory, sagacious, rage, furious, organized, systematic, prayer, now, immediately, memory, times, sport , mayhap, verily, anon, peradventure, methinks, ocean, boat, ship, oar, mast, sail.

 

These lists reinforce many gender stereotypes.  There are topical distinctions.  Women write about topics that use words such as marriage, children, cooking, and kitchen.  Men write about topics that use words such as war, science, sport, and money.   Women use words that reflect on time passing while men are in the ‘here’ and ‘now’.  The presence of “he” and “his” in the men's list and “she” and “her” in the women’s list suggests authors write more about characters of their own gender. 

 

There are some trends that reflect less overt differences.  The presence of “m”, “d”, single quote, and “re” in the women’s list suggests they use contractions more often.  The solitary “s” likely reflects frequent use of the possessive.  The presence of “.” may also indicate that women use shorter sentences, since a larger percentage of tokens are periods in the female sets.  This conclusion is reinforced in the men's list by the presence of many punctuation symbols which tend to elongate sentences, such as “;” and “,” (it might not be a bad idea to add average sentence length and average word length as features in future work).  There are more proper names in the women’s list, perhaps reflecting that women initially tended to use the novel format, or that women focus more on the main character of their novels by directly referring to them in the third person.  Women use the present participle often, whereas there is not a single present participle in the first 1000 features on the men’s list.

 

Our data set is certainly not perfect.  The list of Men’s features has more antiquated words (e.g. “anon” and “methinks”).  This is likely the outcome of a data set that is especially dominated by men in the earlier works.  The few women writers are from after the 18th century because women authors were considered more acceptable, and publishable, as time progressed.

6 Ideas For Future Research

There were several possible extensions that we did not have time to implement.  We considered varying the number of features used and the types of features used.  We could have eliminated all capitalized words to purge the feature list of proper names that do not reflect the class.  Different features might have been included, such as average word length, average sentence length, and grammatical sentence structures.

 

We could have normalized the counts according to document length.  Alternatively, we might have taken the first N words from every document or a random sampling of N words.  Another consideration concerning the data is the presence of different format genres.  For example, we might have improved performance if we eliminated poetry from our data set.

 

Our data set included older works.  It would be interesting to see if distinctions based on gender have blurred more recently.

7  Conclusion   

There are many further ideas for exploration in this area.  Essentially, in our project, we demonstrated that, for our data set, there are differences between male and female writing and between English and American writing.  Although our data set is not ideal, our results provide evidence that there are inherent differences in general.  However, our data set works with older literature.  Experiments with more recent literature might provide insight into how gender differences have changed over the years.


Appendix A: Data Set

 

A not-so-well-formatted list of our MultAuth data set follows.  The SingAuth data set simply chooses one book per author from the MultAuth data set.

 

American Literature

 

the second book of modern verse ed rittenhouse

the little book of modern verse ed rittenhouse

little women by louisa may alcott

flower fables by louisa may alcott

the story of a bad boy by thomas bailey aldrich

cast upon the breakers by horatio alger

frank s campaign/farm & camp horatio alger jr

the scouts of the valley by joseph a altsheler

fantastic fables by ambrose bierce

the secret garden by frances hodgson burnett

extracts from adam s diary by mark twain

life on the mississippi by mark twain

tom sawyer detective mark twain

a horse s tale by mark twain

man that corrupted hadleyburg by mark twain

the pathfinder by james fenimore cooper

life in the iron mills by rebecca harding davis

miss civilization by richard harding davis

vera the medium by richard harding davis

the reporter who made himself king by davis

culprit fay and other poems joseph rodman drake

the damnation of theron ware by harold frederic

the market place by harold frederic

copy cat & other stories by mary wilkins freeman

the yates pride by mary e wilkins freeman

the yellow wallpaper by charlotte perkins gilman

herland

the ways of men by eliot gregory

worldly ways and byways by eliot gregory

selected stories by bret harte

chita a memory of last island by lafcadio hearn

the altar of the dead by henry james

the figure in the carpet by henry james

an international episode by henry james

the lesson of the master by henry james

roderick hudson by henry james

the death of the lion by henry james

the country of the pointed firs sarah orne jewett

select poems of sidney lanier ed callaway

the breitmann ballads by charles g leland

blix by frank norris

moran of the lady letty by frank norris

the burial of the guns by thomas nelson page

the gentle grafter by o henry

heart of the west by o henry

roads of destiny by o henry

howard pyle s book of pirates

twilight land by howard pyle

initials only by anna katharine green

the woman in the alcove by anna katharine green

charlotte temple by susanna rowson

poems patriotic religious etc by father ryan

the lady or the tiger? by frank r stockton

rudder grange by frank r stockton

monsieur beaucaire by booth tarkington

penrod and sam by booth tarkington

the turmoil a novel by booth tarkington

beauty and the beast etc by bayard taylor

fisherman s luck by henry van dyke

the ruling passion by henry van dyke

ben hur a tale of the christ by lew wallace

the birds christmas carol kate douglas wiggin

a cathedral courtship by kate douglas wiggin

the diary of a goose girl by wiggin

new chronicles of rebecca by kate douglas wiggin

the old peabody pew by kate douglas wiggin

penelope s experiences in scotland by wiggin

penelope s postscripts by kate douglas wiggin

penelope s irish experiences by kate d wiggin

rose o the river by kate douglas wiggin

story of waitstill baxter by kate d wiggin

the village watch tower by kate douglas wiggin

the jimmyjohn boss and other stories by wister

lady baltimore by owen wister

lin mclean by owen wister

padre ignacio by owen wister

the outlet by andy adams

winesburg ohio by sherwood anderson

dorothy and the wizard in oz by l frank baum

the enchanted island of yew by l frank baum

the emerald city of oz l frank baum

glinda of oz by l frank baum

the lost princess of oz by baum

rinkitink in oz by l frank baum

the magic of oz by l frank baum

ozma of oz by l frank baum

the patchwork girl of oz by l frank baum

the road to oz by l frank baum

the scarecrow of oz by l frank baum

tik tok of oz by l frank baum

the tin woodman of oz by baum

the agony column by earl derr biggers

the land that time forgot by burroughs

the outlaw of torn by edgar rice burroughs

tarzan the untamed by edgar rice burroughs

out of time s abyss edgar rice burroughs

pigs is pigs by ellis parker butler

alexander s bridge by willa cather

my antonia by willa cather

song of the lark willa cather

cobb s anatomy by irvin s cobb

a plea for old cap collier by irvin s cobb

speaking of operations by irvin s cobb

the financier by theodore dreiser

lahoma by john breckinridge ellis

songs for parents by john farrar

emma mcchesney & co by edna ferber

betty zane by zane grey

the call of the canyon by zane grey

the last of the plainsmen by zane grey

the lone star ranger by zane grey

the spirit of the border by zane grey

to the last man by zane grey

wildfire by zane grey

the young forester by zane grey

a heap o livin by edgar a guest

just folks by edgar a guest

trees and other poems joyce kilmer

keziah coffin by joseph c lincoln

on the makaloa mat/island tales jack london

smoke bellew by jack london

men women and ghosts by amy lowell

sword blades and poppy seed by amy lowell

the haunted bookshop by christopher morley

where the blue begins by christopher morley

a mountain woman by elia w peattie

painted windows by elia w peattie

just david by eleanor h porter

freckles by gene stratton porter

her father s daughter by gene stratton porter

the vision splendid by william macleod raine

lavender and old lace by myrtle reed

the poisoned pen by arthur b reeve

the amazing interlude by mary roberts rinehart

bab a sub deb by mary roberts rinehart

dangerous days by mary roberts rinehart

the man in lower ten by mary roberts rinehart

a poor wise man by mary roberts rinehart

sight unseen by mary roberts rinehart

the street of seven stars by mary roberts rinehart

when a man marries by mary roberts rinehart

children of the night by edwin arlington robinson

the man against the sky by edwin a robinson

cabin fever by b m bower

cow country by b m bower

the flying u ranch by b m bower

the flying u s last stand by b m bower

the lure of the dim trails by b m bower

the trail of the white mule by b m bower

damaged goods by upton sinclair from les avaries

the darrow enigma by melvin l severy

love songs by sara teasdale

daddy long legs by jean webster

dear enemy by jean webster

the glimpses of the moon by edith wharton

house of mirth by edith wharton

the reef by edith wharton

the touchstone by edith wharton

arizona nights by stewart edward white

the riverman by stewart edward white

other things being equal by emma wolf

wild justice by ruth m sprague

erewhon revisited by samuel butler

mudfog and other sketches by charles dickens

mugby junction by charles dickens

liber amoris or the new pygmalion by wm hazlitt

de profundis by oscar wilde

gathering of brother hilarius by michael fairless

 

 

English Literature

 

adventures among books by andrew lang

flower of the mind by alice meynell

cavalier songs & ballads of england mackay/editor

ancient poems ballads and songs of england

old christmas by washington irving

the coxon fund by henry james

glasses by henry james

the jew of malta by christopher marlowe

venus and adonis by william shakespeare

sir thomas more shakespeare apocrypha

the two noble kinsmen shakespeare apocrypha

beautiful stories from shakespeare by e nesbit

the life of john bunyan by edmund venables

the double dealer by william congreve

love for love by william congreve

the old bachelor by william congreve

dickory cronke by daniel defoe

journal of a voyage to lisbon by henry fielding

she stoops to conquer by oliver goldsmith

the lucasta poems by richard lovelace

poetical works by john milton

school for scandal by richard brinsley sheridan

battle of the books et al by jonathan swift

a modest proposal by jonathan swift

the castle of otranto by horace walpole

love and friendship et al by jane austen

sense and sensibility by jane austen

peter pan in kensington gardens by j m barrie

the provost by john galt

derrick vaughan novelist by edna lyall

we two by edna lyall

many voices by e nesbit

the romany rye by george borrow

letters of george borrow

life of robert browning by william sharp

poems and songs of robert burns

sartor resartus by thomas carlyle

the ballad of the white horse by gk chesterton

a miscellany of men by g k chesterton

the frozen deep by wilkie collins

the haunted hotel by wilkie collins

the law and the lady by wilkie collins

miss or mrs? by wilkie collins

the little lame prince by miss mulock

the puzzle of dickens s last plot by andrew lang

the battle of life by charles dickens

the cricket on the hearth by charles dickens

doctor marigold by charles dickens

the holly tree by charles dickens

hunted down by charles dickens

the lamplighter by charles dickens

lazy tour of two idle apprentices by dickens

master humphrey s clock by charles dickens

perils of certain english prisoners by dickens

reprinted pieces by charles dickens

the seven poor travellers by charles dickens

somebody s luggage by charles dickens

speeches literary & social by charles dickens

sunday under three heads by charles dickens

tom tiddler s ground by charles dickens

to be read at dusk by charles dickens

the uncommercial traveller by charles dickens

wreck of the golden mary by charles dickens

phantasmagoria and other poems by lewis carroll

tales of terror & mystery arthur conan doyle

the lost world/arthur conan doyle

memoirs of sherlock holmes arthur conan doyle

the sign of the four by arthur conan doyle

the white company by arthur conan doyle

the absentee by maria edgeworth

the daughter of an empress by louise muhlbach

murad the unlucky etc by maria edgeworth

the annals of the parish john galt

the ayrshire legatees by john galt

more bab ballads by w s gilbert

songs of a savoyard by w s gilbert

diary of a nobody by george and weedon grossmith

allan quatermain by h rider haggard

child of storm by h rider haggard

finished by h rider haggard

montezuma s daughter by h rider haggard

nada the lily by h rider haggard

return of the native by thomas hardy

a pair of blue eyes by thomas hardy

the woodlanders by thomas hardy

dolly dialogues by anthony hope

tom brown s school days by thomas hughes

idle thoughts of an idle fellow jerome k jerome

diary of a pilgrimage by jerome k jerome

novel notes by jerome k jerome

paul kelver by jerome k jerome

second thoughts of an idle fellow by jerome

three men in a boat by jerome k jerome

told after supper by jerome k jerome

the water babies by charles kingsley

verses 1889 1896 by rudyard kipling

rewards and fairies by rudyard kipling

the second jungle book by rudyard kipling

ban and arriere ban by andrew lang

grass of parnassus by andrew lang

a monk of fife by andrew lang

new collected rhymes by andrew lang

travels in england by paul hentzner

rhymes a la mode by andrew lang

last days of pompeii edward george bulwer lytton

rienzi last of the roman tribunes by e b lytton

lays of ancient rome by thomas babington macaulay

the princess and curdie by george macdonald

the princess and the goblin by george macdonald

masterman ready by captain marryat

a reading of life other poems by george meredith

the colour of life by alice meynell

later poems by alice meynell

penelope s english experiences by kate d wiggin

the spirit of place et al by alice meynell

child christopher by william morris

the grey room by eden phillpotts

the king of the golden river by john ruskin

sesame and lilies by john ruskin

bride of lammermoor by sir walter scott

chronicles of the canongate by walter scott

kenilworth by walter scott

misalliance by george bernard shaw

an unsocial socialist by george bernard shaw

life of john sterling by thomas carlyle

the black arrow by robert louis stevenson

catriona (kidnapped2) by robt l stevenson

the ebb tide by r l stevenson and l osbourne

island nights entertainments by stevenson

master of ballantrae by robert louis stevenson

merry men by robert louis stevenson

new arabian nights by robert louis stevenson

weir of hermiston by r l stevenson

st ives by robert louis stevenson

the wrecker by stevenson and osbourne

the wrong box by stevenson & osbourne

edinburgh picturesque notes by stevenson

the art of writing robert louis stevenson

essays of travel by robert louis stevenson

familiar studies of men & books by stevenson

memories and portraits by r l stevenson

travels with a donkey in the cevennes rls

virginibus puerisque by robert l stevenson

moral emblems by robert louis stevenson

a child s garden of verses/robert louis stevenson

underwoods by robert louis stevenson

vailima letters by robert l stevenson

deirdre of the sorrows by j m synge

riders to the sea j m synge

the tinker s wedding by j m synge

the well of the saints by j m synge

idylls of the king by alfred lord tennyson

catherine a story by william thackeray

the great hoggarty diamond by thackeray

men s wives by william makepeace thackeray

the rose and the ring by thackeray

sister songs by francis thompson

the city of dreadful night by james thomson

the prime minister by anthony trollope

the warden by anthony trollope

the door in the wall et al by h g wells

the first men in the moon by h g wells

the research magnificent by h g wells

the new machiavelli by h g wells

secret places of the heart by h g wells

soul of a bishop by h g wells

tono bungay by h g wells

twelve stories and a dream by h g wells

wheels of chance/bicycling idyll by h g wells

the world set free by h g wells

the memoirs of a minister of france by stanley weyman

a gentleman of france by stanley weyman

the house of the wolf by stanley weyman

selected poems of oscar wilde

selected prose of oscar wilde

charmides and other poems by oscar wilde

the duchess of padua by oscar wilde

a house of pomegranates by oscar wilde

an ideal husband by oscar wilde

intentions by oscar wilde

lady windermere s fan by oscar wilde

lord arthur savile s crime etc by oscar wilde

essays and lectures by oscar wilde

a woman of no importance by oscar wilde

the roadmender by margt

the ninth vibration et al by adams beck

seven men by max beerbohm

the works of max beerbohm by max beerbohm

greenmantle by john buchan

the path of the king by john buchan

prester john by john buchan

the thirty nine steps by john buchan

the cruise of the cachalot by frank t bullen

song book of quong lee of limehouse thomas burke

messer marco polo by brian oswald donn byrne

secret adversary by agatha christie

almayer s folly by joseph conrad

falk by joseph conrad

notes on life and letters by joseph conrad

within the tides by joseph conrad

some reminiscences by joseph conrad

amy foster by joseph conrad

my lady caprice by jeffrey farnol

country sentiment by robert graves

dead men tell no tales by e w hornung

raffles further adventures by e w hornung

a thief in the night by e w hornung

crome yellow by aldous huxley

a voyage to arcturus by david lindsay

over the sliprails by henry lawson

the lodger by marie belloc lowndes

in flanders fields by john mccrae

the great god pan by arthur machen

moon and sixpence by somerset maugham

the brother of daphne by dornford yates

the red house mystery by a a milne

martin hyde the duke s messenger by john masefield

the unbearable bassington h h munro (saki)

the illustrious prince by e phillips oppenheim

kingdom of the blind by e phillips oppenheim

peter ruff and the double four by oppenheim

the vanished messenger by e phillips oppenheim

the voice of the city by o henry

whirligigs by o henry

the yellow crayon by e phillips oppenheim

the zeppelin s passenger by e phillips oppenheim

saltbush bill j p by a b banjo paterson

elizabeth and her german garden by elizabeth

captain blood by rafael sabatini

mistress wilding by rafael sabatini

scaramouche by rafael sabatini

ballads of a bohemian by robert w service

rhymes of a red cross man by robert w service

rhymes of a rolling stone by robert w service

kai lung s golden hours by ernest bramah

the mirror of kong ho by ernest bramah

the wallet of kai lung by ernest bramah

the crock of gold by james stephens

lair of the white worm by bram stoker

fire tongue by sax rohmer

the insidious dr fu manchu by sax rohmer

the yellow claw by sax rohmer

under the red robe by stanley weyman

piccadilly jim by pelham grenville wodehouse

something new by p g wodehouse

the voyage out by virginia woolf

gambara by honore de balzac

the pool in the desert sara jeannette duncan

the white moll by frank l packard

dream life and real life by olive schreiner

the story of an african farm by olive schreiner

an anthology of australian verse bertram stevens


Appendix B: Results Listing

Results for Multiple Authors Dataset

 

Data Set  Cat Set      Training Correct %  Test Correct %
All       GenNat       53.494              53.494
All       Gender       47.494              48.21
All       Nationality  63.6145             61.2048
American  Gender       82.7586             74.1379
Boy       Nationality  47.7612             46.5672
English   Gender       25.7261             27.3859
Girl      Nationality  85                  82.5

 

Results for Single Authors Dataset

 

Data Set  Cat Set  Feature Extraction          Classifier   Training Correct %  Test Correct %
All       Gen      AverageMutualInformation    Multinomial  94                  81
All       Gen      AverageMutualInformation    Unomial      91                  84
All       Gen      AverageMutualInformation    Bernoulli    91                  84
All       Gen      ChiSquared                  Multinomial  94.5                83.5
All       Gen      ChiSquared                  Unomial      70.5                67.5
All       Gen      ChiSquared                  Bernoulli    70.5                67.5
All       Gen      PointwiseMutualInformation  Multinomial  92.5                81
All       Gen      PointwiseMutualInformation  Unomial      69.5                66.5
All       Gen      PointwiseMutualInformation  Bernoulli    69.5                66.5
All       Gen      Random                      Multinomial  84                  63
All       Gen      Random                      Unomial      92.5                77.5
All       Gen      Random                      Bernoulli    92.5                77.5
All       GenNat   AverageMutualInformation    Multinomial  85.5                61.5
All       GenNat   AverageMutualInformation    Unomial      83.5                67
All       GenNat   AverageMutualInformation    Bernoulli    83.5                67
All       GenNat   ChiSquared                  Multinomial  86.5                63.5
All       GenNat   ChiSquared                  Unomial      84                  69
All       GenNat   ChiSquared                  Bernoulli    84                  69
All       GenNat   PointwiseMutualInformation  Multinomial  85                  60
All       GenNat   PointwiseMutualInformation  Unomial      78                  66
All       GenNat   PointwiseMutualInformation  Bernoulli    78                  66
All       GenNat   Random                      Multinomial  77                  44
All       GenNat   Random                      Unomial      87.5                54.5
All       GenNat   Random                      Bernoulli    88                  54.5
All       Nat      AverageMutualInformation    Multinomial  88                  73.5
All       Nat      AverageMutualInformation    Unomial      88.5                85.5
All       Nat      AverageMutualInformation    Bernoulli    88.5                85.5
All       Nat      ChiSquared                  Multinomial  87.5                73.5
All       Nat      ChiSquared                  Unomial      89                  85.5
All       Nat      ChiSquared                  Bernoulli    89                  85.5
All       Nat      PointwiseMutualInformation  Multinomial  84.5                73
All       Nat      PointwiseMutualInformation  Unomial      81                  77
All       Nat      PointwiseMutualInformation  Bernoulli    81                  77
All       Nat      Random                      Multinomial  86                  75
All       Nat      Random                      Unomial      90.5                76.5
All       Nat      Random                      Bernoulli    90.5                76.5
Amer      Gen      AverageMutualInformation    Multinomial  93.3333             82.2222
Amer      Gen      AverageMutualInformation    Unomial      95.5556             86.6667
Amer      Gen      AverageMutualInformation    Bernoulli    95.5556             86.6667
Amer      Gen      ChiSquared                  Multinomial  93.3333             82.2222
Amer      Gen      ChiSquared                  Unomial      94.4444             84.4444
Amer      Gen      ChiSquared                  Bernoulli    94.4444             84.4444
Amer      Gen      PointwiseMutualInformation  Multinomial  93.3333             80
Amer      Gen      PointwiseMutualInformation  Unomial      88.8889             80
Amer      Gen      PointwiseMutualInformation  Bernoulli    88.8889             80
Amer      Gen      Random                      Multinomial  85.5556             65.5556
Amer      Gen      Random                      Unomial      96.6667             78.8889
Amer      Gen      Random                      Bernoulli    96.6667             78.8889
Boy       Nat      AverageMutualInformation    Multinomial  89.8734             75.9494
Boy       Nat      AverageMutualInformation    Unomial      86.7089             81.6456
Boy       Nat      AverageMutualInformation    Bernoulli    86.7089             81.6456
Boy       Nat      ChiSquared                  Multinomial  89.8734             75.9494
Boy       Nat      ChiSquared                  Unomial      82.2785             74.6835
Boy       Nat      ChiSquared                  Bernoulli    82.2785             74.6835
Boy       Nat      PointwiseMutualInformation  Multinomial  89.2405             75.9494
Boy       Nat      PointwiseMutualInformation  Unomial      74.0506             70.8861
Boy       Nat      PointwiseMutualInformation  Bernoulli    74.0506             70.8861
Boy       Nat      Random                      Multinomial  87.9747             71.519
Boy       Nat      Random                      Unomial      93.038              76.5823
Boy       Nat      Random                      Bernoulli    93.038              76.5823
Eng       Gen      AverageMutualInformation    Multinomial  97.2727             81.8182
Eng       Gen      AverageMutualInformation    Unomial      92.7273             88.1818
Eng       Gen      AverageMutualInformation    Bernoulli    92.7273             88.1818
Eng       Gen      ChiSquared                  Multinomial  97.2727             88.1818
Eng       Gen      ChiSquared                  Unomial      47.2727             45.4545
Eng       Gen      ChiSquared                  Bernoulli    47.2727             45.4545
Eng       Gen      PointwiseMutualInformation  Multinomial  96.3636             86.3636
Eng       Gen      PointwiseMutualInformation  Unomial      47.2727             47.2727
Eng       Gen      PointwiseMutualInformation  Bernoulli    47.2727             47.2727
Eng       Gen      Random                      Multinomial  96.3636             72.7273
Eng       Gen      Random                      Unomial      93.6364             73.6364
Eng       Gen      Random                      Bernoulli    93.6364             73.6364
Girl      Nat      AverageMutualInformation    Multinomial  97.619              85.7143
Girl      Nat      AverageMutualInformation    Unomial      97.619              92.8571
Girl      Nat      AverageMutualInformation    Bernoulli    97.619              92.8571
Girl      Nat      ChiSquared                  Multinomial  97.619              85.7143
Girl      Nat      ChiSquared                  Unomial      90.4762             85.7143
Girl      Nat      ChiSquared                  Bernoulli    90.4762             85.7143
Girl      Nat      PointwiseMutualInformation  Multinomial  95.2381             83.3333
Girl      Nat      PointwiseMutualInformation  Unomial      92.8571             90.4762
Girl      Nat      PointwiseMutualInformation  Bernoulli    92.8571             90.4762
Girl      Nat      Random                      Multinomial  97.619              78.5714
Girl      Nat      Random                      Unomial      97.619              83.3333
Girl      Nat      Random                      Bernoulli    97.619              83.3333

 

 

Results for Single Authors Dataset comparing the second level and balancing effects

 

Data Set

Cat Set

Feature Extraction

Classifier

Remove

Balancing

Traning % Correct

Testing % Correct

Amer

Gen

PointwiseMutualInformation

Multinomial

0

balance

92.2222

80

Amer

Gen

PointwiseMutualInformation

Multinomial

100

balance

91.1111

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

100

nbalance

91.1111

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

150

balance

88.8889

78.8889

Amer

Gen

PointwiseMutualInformation

Multinomial

150

nbalance

90

78.8889

Amer

Gen

PointwiseMutualInformation

Multinomial

200

balance

90

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

200

nbalance

90

82.2222

Amer

Gen

PointwiseMutualInformation

Multinomial

20

balance

92.2222

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

20

nbalance

91.1111

82.2222

Amer

Gen

PointwiseMutualInformation

Multinomial

250

balance

90

84.4444

Amer

Gen

PointwiseMutualInformation

Multinomial

250

m200

90

82.2222

Amer

Gen

PointwiseMutualInformation

Multinomial

250

m800

87.7778

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

250

nbalance

91.1111

82.2222

Amer

Gen

PointwiseMutualInformation

Multinomial

300

balance

83.3333

78.8889

Amer

Gen

PointwiseMutualInformation

Multinomial

300

m200

86.6667

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

300

m800

86.6667

77.7778

Amer

Gen

PointwiseMutualInformation

Multinomial

300

nbalance

83.3333

78.8889

Amer

Gen

PointwiseMutualInformation

Multinomial

400

balance

81.1111

77.7778

Amer

Gen

PointwiseMutualInformation

Multinomial

400

m200

88.8889

78.8889

Amer

Gen

PointwiseMutualInformation

Multinomial

400

m800

84.4444

76.6667

Amer

Gen

PointwiseMutualInformation

Multinomial

400

nbalance

82.2222

78.8889

Amer

Gen

PointwiseMutualInformation

Multinomial

500

balance

83.3333

77.7778

Amer

Gen

PointwiseMutualInformation

Multinomial

500

m200

90

82.2222

Amer

Gen

PointwiseMutualInformation

Multinomial

500

m800

81.1111

78.8889

Amer

Gen

PointwiseMutualInformation

Multinomial

500

nbalance

88.8889

77.7778

Amer

Gen

PointwiseMutualInformation

Multinomial

50

balance

90

80

Amer

Gen

PointwiseMutualInformation

Multinomial

50

nbalance

91.1111

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

600

balance

87.7778

80

Amer

Gen

PointwiseMutualInformation

Multinomial

600

m200

86.6667

82.2222

Amer

Gen

PointwiseMutualInformation

Multinomial

600

m800

84.4444

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

600

nbalance

87.7778

82.2222

Amer

Gen

PointwiseMutualInformation

Multinomial

700

balance

87.7778

80

Amer

Gen

PointwiseMutualInformation

Multinomial

700

m200

90

85.5556

Amer

Gen

PointwiseMutualInformation

Multinomial

700

m800

84.4444

80

Amer

Gen

PointwiseMutualInformation

Multinomial

700

nbalance

87.7778

80

Amer

Gen

PointwiseMutualInformation

Multinomial

800

balance

86.6667

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

800

m200

88.8889

85.5556

Amer

Gen

PointwiseMutualInformation

Multinomial

800

m800

83.3333

80

Amer

Gen

PointwiseMutualInformation

Multinomial

800

nbalance

87.7778

81.1111

Amer

Gen

PointwiseMutualInformation

Multinomial

900

balance

85.5556

80

Amer

Gen

PointwiseMutualInformation

Multinomial

900

m200

87.7778

84.4444

Amer

Gen

PointwiseMutualInformation

Multinomial

900

m800

83.3333

77.7778

Amer

Gen

PointwiseMutualInformation

Multinomial

900

nbalance

84.4444

80

Amer

Gen

PointwiseMutualInformation

Unomial

0

balance

84.4444

76.6667

Amer

Gen

PointwiseMutualInformation

Unomial

100

balance

86.6667

76.6667

Amer

Gen

PointwiseMutualInformation

Unomial

100

nbalance

91.1111

84.4444

Amer

Gen

PointwiseMutualInformation

Unomial

150

balance

84.4444

76.6667

Amer

Gen

PointwiseMutualInformation

Unomial

150

nbalance

92.2222

84.4444

Amer

Gen

PointwiseMutualInformation

Unomial

200

balance

85.5556

78.8889

Amer

Gen

PointwiseMutualInformation

Unomial

200

nbalance

90

86.6667

Amer

Gen

PointwiseMutualInformation

Unomial

20

balance

86.6667

74.4444

Amer

Gen

PointwiseMutualInformation

Unomial

20

nbalance

88.8889

83.3333

Amer

Gen

PointwiseMutualInformation

Unomial

250

balance

83.3333

77.7778

Amer

Gen

PointwiseMutualInformation

Unomial

250

m200

35.5556

35.5556

Amer

Gen

PointwiseMutualInformation

Unomial

250

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Unomial

250

nbalance

88.8889

87.7778

Amer

Gen

PointwiseMutualInformation

Unomial

300

balance

84.4444

78.8889

Amer

Gen

PointwiseMutualInformation

Unomial

300

m200

35.5556

35.5556

Amer

Gen

PointwiseMutualInformation

Unomial

300

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Unomial

300

nbalance

87.7778

85.5556

Amer

Gen

PointwiseMutualInformation

Unomial

400

balance

83.3333

80

Amer

Gen

PointwiseMutualInformation

Unomial

400

m200

42.2222

41.1111

Amer

Gen

PointwiseMutualInformation

Unomial

400

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Unomial

400

nbalance

86.6667

82.2222

Amer

Gen

PointwiseMutualInformation

Unomial

500

balance

82.2222

78.8889

Amer

Gen

PointwiseMutualInformation

Unomial

500

m200

41.1111

40

Amer

Gen

PointwiseMutualInformation

Unomial

500

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Unomial

500

nbalance

90

85.5556

Amer

Gen

PointwiseMutualInformation

Unomial

50

balance

84.4444

76.6667

Amer

Gen

PointwiseMutualInformation

Unomial

50

nbalance

88.8889

83.3333

Amer

Gen

PointwiseMutualInformation

Unomial

600

balance

86.6667

84.4444

Amer

Gen

PointwiseMutualInformation

Unomial

600

m200

40

42.2222

Amer

Gen

PointwiseMutualInformation

Unomial

600

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Unomial

600

nbalance

90

86.6667

Amer

Gen

PointwiseMutualInformation

Unomial

700

balance

86.6667

84.4444

Amer

Gen

PointwiseMutualInformation

Unomial

700

m200

42.2222

41.1111

Amer

Gen

PointwiseMutualInformation

Unomial

700

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Unomial

700

nbalance

86.6667

85.5556

Amer

Gen

PointwiseMutualInformation

Unomial

800

balance

90

85.5556

Amer

Gen

PointwiseMutualInformation

Unomial

800

m200

43.3333

42.2222

Amer

Gen

PointwiseMutualInformation

Unomial

800

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Unomial

800

nbalance

90

85.5556

Amer

Gen

PointwiseMutualInformation

Unomial

900

balance

90

87.7778

Amer

Gen

PointwiseMutualInformation

Unomial

900

m200

43.3333

41.1111

Amer

Gen

PointwiseMutualInformation

Unomial

900

m800

73.3333

71.1111

Amer

Gen

PointwiseMutualInformation

Unomial

900

nbalance

92.2222

87.7778

Amer

Gen

PointwiseMutualInformation

Bernoulli

0

balance

84.4444

76.6667

Amer

Gen

PointwiseMutualInformation

Bernoulli

100

balance

86.6667

76.6667

Amer

Gen

PointwiseMutualInformation

Bernoulli

100

nbalance

91.1111

84.4444

Amer

Gen

PointwiseMutualInformation

Bernoulli

150

balance

84.4444

76.6667

Amer

Gen

PointwiseMutualInformation

Bernoulli

150

nbalance

92.2222

84.4444

Amer

Gen

PointwiseMutualInformation

Bernoulli

200

balance

85.5556

78.8889

Amer

Gen

PointwiseMutualInformation

Bernoulli

200

nbalance

90

86.6667

Amer

Gen

PointwiseMutualInformation

Bernoulli

20

balance

86.6667

74.4444

Amer

Gen

PointwiseMutualInformation

Bernoulli

20

nbalance

88.8889

83.3333

Amer

Gen

PointwiseMutualInformation

Bernoulli

250

balance

83.3333

77.7778

Amer

Gen

PointwiseMutualInformation

Bernoulli

250

m200

35.5556

35.5556

Amer

Gen

PointwiseMutualInformation

Bernoulli

250

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

250

nbalance

88.8889

87.7778

Amer

Gen

PointwiseMutualInformation

Bernoulli

300

balance

84.4444

78.8889

Amer

Gen

PointwiseMutualInformation

Bernoulli

300

m200

35.5556

35.5556

Amer

Gen

PointwiseMutualInformation

Bernoulli

300

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

300

nbalance

87.7778

85.5556

Amer

Gen

PointwiseMutualInformation

Bernoulli

400

balance

83.3333

80

Amer

Gen

PointwiseMutualInformation

Bernoulli

400

m200

42.2222

41.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

400

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

400

nbalance

86.6667

82.2222

Amer

Gen

PointwiseMutualInformation

Bernoulli

500

balance

82.2222

77.7778

Amer

Gen

PointwiseMutualInformation

Bernoulli

500

m200

41.1111

40

Amer

Gen

PointwiseMutualInformation

Bernoulli

500

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

500

nbalance

90

85.5556

Amer

Gen

PointwiseMutualInformation

Bernoulli

50

balance

84.4444

76.6667

Amer

Gen

PointwiseMutualInformation

Bernoulli

50

nbalance

88.8889

83.3333

Amer

Gen

PointwiseMutualInformation

Bernoulli

600

balance

86.6667

84.4444

Amer

Gen

PointwiseMutualInformation

Bernoulli

600

m200

40

42.2222

Amer

Gen

PointwiseMutualInformation

Bernoulli

600

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

600

nbalance

90

86.6667

Amer

Gen

PointwiseMutualInformation

Bernoulli

700

balance

86.6667

84.4444

Amer

Gen

PointwiseMutualInformation

Bernoulli

700

m200

42.2222

41.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

700

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

700

nbalance

86.6667

85.5556

Amer

Gen

PointwiseMutualInformation

Bernoulli

800

balance

90

85.5556

Amer

Gen

PointwiseMutualInformation

Bernoulli

800

m200

43.3333

42.2222

Amer

Gen

PointwiseMutualInformation

Bernoulli

800

m800

71.1111

71.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

800

nbalance

90

85.5556

Amer

Gen

PointwiseMutualInformation

Bernoulli

900

balance

90

87.7778

Amer

Gen

PointwiseMutualInformation

Bernoulli

900

m200

43.3333

41.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

900

m800

73.3333

71.1111

Amer

Gen

PointwiseMutualInformation

Bernoulli

900

nbalance

92.2222

87.7778

F_Eng

Gen

PointwiseMutualInformation

Multinomial

0

balance

97.2727

84.5455

F_Eng

Gen

PointwiseMutualInformation

Multinomial

100

balance

95.4545

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

100

nbalance

96.3636

90.9091

F_Eng

Gen

PointwiseMutualInformation

Multinomial

150

balance

92.7273

88.1818

F_Eng

Gen

PointwiseMutualInformation

Multinomial

150

nbalance

91.8182

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

200

balance

92.7273

90.9091

F_Eng

Gen

PointwiseMutualInformation

Multinomial

200

nbalance

92.7273

91.8182

F_Eng

Gen

PointwiseMutualInformation

Multinomial

20

balance

96.3636

83.6364

F_Eng

Gen

PointwiseMutualInformation

Multinomial

20

nbalance

96.3636

87.2727

F_Eng

Gen

PointwiseMutualInformation

Multinomial

250

balance

92.7273

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

250

m200

90.9091

88.1818

F_Eng

Gen

PointwiseMutualInformation

Multinomial

250

m800

90

82.7273

F_Eng

Gen

PointwiseMutualInformation

Multinomial

250

nbalance

90.9091

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

300

balance

92.7273

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

300

m200

91.8182

88.1818

F_Eng

Gen

PointwiseMutualInformation

Multinomial

300

m800

91.8182

81.8182

F_Eng

Gen

PointwiseMutualInformation

Multinomial

300

nbalance

90

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

400

balance

90.9091

90.9091

F_Eng

Gen

PointwiseMutualInformation

Multinomial

400

m200

90.9091

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

400

m800

91.8182

87.2727

F_Eng

Gen

PointwiseMutualInformation

Multinomial

400

nbalance

90

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

500

balance

90.9091

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

500

m200

90

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

500

m800

89.0909

88.1818

F_Eng

Gen

PointwiseMutualInformation

Multinomial

500

nbalance

90

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

50

balance

97.2727

84.5455

F_Eng

Gen

PointwiseMutualInformation

Multinomial

50

nbalance

97.2727

88.1818

F_Eng

Gen

PointwiseMutualInformation

Multinomial

600

balance

90

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

600

m200

90

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

600

m800

89.0909

87.2727

F_Eng

Gen

PointwiseMutualInformation

Multinomial

600

nbalance

90.9091

90.9091

F_Eng

Gen

PointwiseMutualInformation

Multinomial

700

balance

89.0909

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

700

m200

89.0909

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

700

m800

88.1818

86.3636

F_Eng

Gen

PointwiseMutualInformation

Multinomial

700

nbalance

90

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

800

balance

90

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

800

m200

89.0909

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

800

m800

88.1818

86.3636

F_Eng

Gen

PointwiseMutualInformation

Multinomial

800

nbalance

90

90

F_Eng

Gen

PointwiseMutualInformation

Multinomial

900

balance

90

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

900

m200

89.0909

89.0909

F_Eng

Gen

PointwiseMutualInformation

Multinomial

900

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Multinomial

900

nbalance

90

89.0909

F_Eng

Gen

PointwiseMutualInformation

Unomial

0

balance

91.8182

86.3636

F_Eng

Gen

PointwiseMutualInformation

Unomial

100

balance

91.8182

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

100

nbalance

60

55.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

150

balance

90.9091

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

150

nbalance

61.8182

60.9091

F_Eng

Gen

PointwiseMutualInformation

Unomial

200

balance

90

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

200

nbalance

70

65.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

20

balance

91.8182

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

20

nbalance

48.1818

48.1818

F_Eng

Gen

PointwiseMutualInformation

Unomial

250

balance

90.9091

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

250

m200

28.1818

30

F_Eng

Gen

PointwiseMutualInformation

Unomial

250

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

250

nbalance

66.3636

66.3636

F_Eng

Gen

PointwiseMutualInformation

Unomial

300

balance

89.0909

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

300

m200

30

30.9091

F_Eng

Gen

PointwiseMutualInformation

Unomial

300

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

300

nbalance

68.1818

70.9091

F_Eng

Gen

PointwiseMutualInformation

Unomial

400

balance

90

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

400

m200

30

30.9091

F_Eng

Gen

PointwiseMutualInformation

Unomial

400

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

400

nbalance

76.3636

77.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

500

balance

92.7273

86.3636

F_Eng

Gen

PointwiseMutualInformation

Unomial

500

m200

29.0909

30.9091

F_Eng

Gen

PointwiseMutualInformation

Unomial

500

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

500

nbalance

81.8182

80.9091

F_Eng

Gen

PointwiseMutualInformation

Unomial

50

balance

91.8182

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

50

nbalance

51.8182

51.8182

F_Eng

Gen

PointwiseMutualInformation

Unomial

600

balance

92.7273

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

600

m200

30

31.8182

F_Eng

Gen

PointwiseMutualInformation

Unomial

600

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

600

nbalance

83.6364

84.5455

F_Eng

Gen

PointwiseMutualInformation

Unomial

700

balance

90.9091

90

F_Eng

Gen

PointwiseMutualInformation

Unomial

700

m200

30

31.8182

F_Eng

Gen

PointwiseMutualInformation

Unomial

700

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

700

nbalance

84.5455

84.5455

F_Eng

Gen

PointwiseMutualInformation

Unomial

800

balance

91.8182

90

F_Eng

Gen

PointwiseMutualInformation

Unomial

800

m200

30.9091

31.8182

F_Eng

Gen

PointwiseMutualInformation

Unomial

800

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

800

nbalance

84.5455

85.4545

F_Eng

Gen

PointwiseMutualInformation

Unomial

900

balance

90.9091

88.1818

F_Eng

Gen

PointwiseMutualInformation

Unomial

900

m200

30.9091

31.8182

F_Eng

Gen

PointwiseMutualInformation

Unomial

900

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Unomial

900

nbalance

84.5455

84.5455

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

0

balance

91.8182

86.3636

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

100

balance

91.8182

85.4545

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

100

nbalance

60

55.4545

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

150

balance

90.9091

85.4545

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

150

nbalance

61.8182

60.9091

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

200

balance

90

85.4545

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

200

nbalance

70

65.4545

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

20

balance

91.8182

85.4545

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

20

nbalance

48.1818

48.1818

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

250

balance

90.9091

85.4545

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

250

m200

28.1818

30

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

250

m800

87.2727

87.2727

F_Eng

Gen

PointwiseMutualInformation

Bernoulli

250

nbalance

66.3636

66.3636

F_Eng  Gen  PointwiseMutualInformation  Bernoulli  300  balance   89.0909  85.4545
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  300  m200      30       30.9091
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  300  m800      87.2727  87.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  300  nbalance  68.1818  70.9091
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  400  balance   90       85.4545
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  400  m200      30       30.9091
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  400  m800      87.2727  87.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  400  nbalance  76.3636  77.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  500  balance   92.7273  86.3636
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  500  m200      29.0909  30.9091
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  500  m800      87.2727  87.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  500  nbalance  81.8182  80.9091
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  50   balance   91.8182  85.4545
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  50   nbalance  51.8182  51.8182

F_Eng  Gen  PointwiseMutualInformation  Bernoulli  600  balance   92.7273  87.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  600  m200      30       31.8182
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  600  m800      87.2727  87.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  600  nbalance  83.6364  84.5455
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  700  balance   90.9091  90
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  700  m200      30       31.8182
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  700  m800      87.2727  87.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  700  nbalance  84.5455  84.5455
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  800  balance   91.8182  90
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  800  m200      30.9091  31.8182
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  800  m800      87.2727  87.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  800  nbalance  84.5455  85.4545
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  900  balance   90.9091  88.1818
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  900  m200      30.9091  31.8182
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  900  m800      87.2727  87.2727
F_Eng  Gen  PointwiseMutualInformation  Bernoulli  900  nbalance  84.5455  84.5455