|
|
Many people see the hierarchical agglomerative methods as a "hack,"
but we have shown that they are in fact model-based methods after all.
Constrained
Clustering [Demo]
Sometimes you have an idea what kinds of
clusters you want to get out of a clustering system. For example you
may know that certain points belong together while others don't. See
Kiri
Wagstaff's constrained clustering page for other work in this area.
Let's say we have two points that the clustering proceedure put into different clusters, but we know they actually belong together. It's possible that the problem is that one or both points are outliers -- that we don't want them to behave like the points nearby. However, it's more likely that the points near them are wrong, too. To fix multiple points at once, we can interpret the constraint as a space-level constraint, and use it to warp the space so those two points become closer together. Check out the demo!
Doing this gives much better performance that simply using instance
level constraints. The graphs below show accuracy as number of
constraints increases, comparing instance-level constraints (with
K-means) to space-level constraints (with complete-link clustering).
The space-level constraints give better final accuracy, despite being
used with an inferior clustering algorithm.
Most clustering algorithms work best on separated spherical data. Our
clustering method takes non-spherical data and uses constraints to
contract the similarity space in a way that makes the clusters tighter
and more spherical.
Publications
Comments about the project page? Feel free to email Dan or Sep.
|
Local links: NLP lunch · PAIL lunch · NLP Reading Group · JavaNLP (javadocs) · ScalaNLP · machines · Wiki |
Site design by Bill MacCartney |