Tag Clustering


A visualization of tag relationships, made in preparation for an experimental navigation system. Tag similarity is shown as spatial distance and color similarity, and tag frequency is mapped to font size.

Hit any key to reload. Select a beginning and end tag (which become highlighted red and green, respectively) to see the shortest path between them. Deselect by clicking again, or reselect the end by clicking a new tag.

First, the tags for 18 of my projects are loaded, 104 tag occurrences in total. Each new tag is added as a vertex to a graph, giving 30 unique tags, and each pair of tags occurring in the same project is added as a (directional) edge, producing 592 edges. If a vertex or edge is already present, its count is increased. An edge from a to b has length (1 - (b.count / a.count)), i.e., the complement of the probability that a implies b. If a always implies b, the edge length is 0.
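The counting and edge-length steps above can be sketched roughly as follows. This is a minimal illustration, not the sketch's actual source: the class TagGraphSketch and its methods are hypothetical names, and it assumes that "b.count" in the edge-length formula means the number of projects in which a and b occur together, since that reading is what yields a length of 0 when a always implies b.

```java
import java.util.*;

// Hypothetical sketch of the tag graph; not the original Graph/Vertex/Edge classes.
class TagGraphSketch {
    // Number of projects each tag appears in (vertex count).
    final Map<String, Integer> vertexCount = new HashMap<>();
    // Number of projects in which tag a and tag b appear together (directed edge count).
    final Map<String, Map<String, Integer>> edgeCount = new HashMap<>();

    // Add one project's tags: bump each vertex and each ordered pair of distinct tags.
    void addProject(Collection<String> tags) {
        Set<String> unique = new LinkedHashSet<>(tags); // ignore repeats within a project
        for (String a : unique) {
            vertexCount.merge(a, 1, Integer::sum);
            for (String b : unique) {
                if (a.equals(b)) continue;
                edgeCount.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
            }
        }
    }

    // Assumed reading of the formula: 1 - P(b | a), with P(b | a) estimated as
    // (projects containing both a and b) / (projects containing a).
    // The tag a must already have been added.
    double edgeLength(String a, String b) {
        int together = edgeCount.getOrDefault(a, Collections.emptyMap()).getOrDefault(b, 0);
        return 1.0 - (double) together / vertexCount.get(a);
    }
}
```

Under this reading, lengths are asymmetric: a rare tag that only ever appears alongside a common tag has length 0 toward it, while the common tag has a longer edge back.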

Edge lengths are used to initialize a force-directed graph built on Traer's physics and animation libraries.
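To give a feel for how edge lengths can drive a layout, here is a small self-contained spring simulation. It does not use Traer's library and is not the sketch's ForceDirectedGraph class; SpringLayoutSketch is a hypothetical name, and the mapping from edge length to spring rest length (including how the two directions of an edge are combined and scaled) is left to the caller as an assumption.

```java
// Hypothetical stand-in for a force-directed layout; plain Java, no physics library.
class SpringLayoutSketch {
    final double[] x, y, vx, vy;
    // rest[i][j] >= 0 is the rest length of the spring between i and j (for i < j);
    // a negative value means "no spring".
    final double[][] rest;

    SpringLayoutSketch(double[] x0, double[] y0, double[][] rest) {
        x = x0.clone();
        y = y0.clone();
        vx = new double[x.length];
        vy = new double[x.length];
        this.rest = rest;
    }

    // One time step: Hooke's-law spring forces, then damped velocity and position updates.
    void tick(double stiffness, double damping, double dt) {
        int n = x.length;
        double[] fx = new double[n], fy = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (rest[i][j] < 0) continue;
                double dx = x[j] - x[i], dy = y[j] - y[i];
                double d = Math.max(Math.hypot(dx, dy), 1e-9); // avoid division by zero
                double f = stiffness * (d - rest[i][j]);       // positive pulls the pair together
                fx[i] += f * dx / d;
                fy[i] += f * dy / d;
                fx[j] -= f * dx / d;
                fy[j] -= f * dy / d;
            }
        }
        for (int i = 0; i < n; i++) {
            vx[i] = (vx[i] + fx[i] * dt) * damping;
            vy[i] = (vy[i] + fy[i] * dt) * damping;
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
}
```

Calling tick repeatedly lets connected tags settle at roughly their rest lengths, so tags with short edges end up close together on screen.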

Distance from each tag to the three most significant and polarized tags determines RGB color. These three tags are found by taking the mode of multiple K-medoids runs (like K-means, but each centroid is a data point) for K = 3, using shortest-path distance on the graph (computed with Dijkstra's algorithm) as the distance metric.
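The medoid search might look something like the sketch below. It is an illustration under several assumptions rather than the original code: KMedoidsSketch and its methods are hypothetical names, the graph is passed as a matrix of edge lengths (with infinity for a missing edge), each run uses the simple alternating assign/update form of K-medoids, and "mode" is read as the medoid set that comes up most often across runs.

```java
import java.util.*;

// Hypothetical sketch: shortest-path distances plus repeated K-medoids runs.
class KMedoidsSketch {
    // len[i][j] is the edge length from i to j, or Double.POSITIVE_INFINITY if no edge.
    // Lengths must be non-negative for Dijkstra's algorithm to be valid.
    static double[] dijkstra(double[][] len, int source) {
        int n = len.length;
        double[] dist = new double[n];
        boolean[] done = new boolean[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0;
        for (int step = 0; step < n; step++) {
            int u = -1; // closest vertex not yet finalized
            for (int v = 0; v < n; v++)
                if (!done[v] && (u == -1 || dist[v] < dist[u])) u = v;
            if (dist[u] == Double.POSITIVE_INFINITY) break; // the rest are unreachable
            done[u] = true;
            for (int v = 0; v < n; v++)
                if (dist[u] + len[u][v] < dist[v]) dist[v] = dist[u] + len[u][v];
        }
        return dist;
    }

    // All-pairs distances: one Dijkstra pass per source vertex.
    static double[][] allPairs(double[][] len) {
        double[][] dist = new double[len.length][];
        for (int s = 0; s < len.length; s++) dist[s] = dijkstra(len, s);
        return dist;
    }

    // One K-medoids run from k random distinct starting points; returns sorted medoid indices.
    static int[] kMedoids(double[][] dist, int k, Random rng) {
        int n = dist.length;
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) ids.add(i);
        Collections.shuffle(ids, rng);
        int[] medoids = new int[k];
        for (int c = 0; c < k; c++) medoids[c] = ids.get(c);
        for (int iter = 0; iter < 100; iter++) {
            // Assignment step: each point joins the medoid it is closest to.
            int[] cluster = new int[n];
            for (int i = 0; i < n; i++)
                for (int c = 1; c < k; c++)
                    if (dist[i][medoids[c]] < dist[i][medoids[cluster[i]]]) cluster[i] = c;
            // Update step: the member with the smallest total distance from its cluster wins.
            int[] next = medoids.clone();
            for (int c = 0; c < k; c++) {
                double best = Double.POSITIVE_INFINITY;
                for (int cand = 0; cand < n; cand++) {
                    if (cluster[cand] != c) continue;
                    double total = 0;
                    for (int i = 0; i < n; i++)
                        if (cluster[i] == c) total += dist[i][cand];
                    if (total < best) { best = total; next[c] = cand; }
                }
            }
            if (Arrays.equals(next, medoids)) break; // converged
            medoids = next;
        }
        Arrays.sort(medoids);
        return medoids;
    }

    // Run several times with different seeds and keep the most frequent medoid set.
    static int[] modeOfRuns(double[][] dist, int k, int runs) {
        Map<String, Integer> votes = new HashMap<>();
        Map<String, int[]> sets = new HashMap<>();
        for (int r = 0; r < runs; r++) {
            int[] m = kMedoids(dist, k, new Random(r));
            String key = Arrays.toString(m);
            votes.merge(key, 1, Integer::sum);
            sets.put(key, m);
        }
        String best = Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
        return sets.get(best);
    }
}
```

With K = 3, the three returned indices would play the role of the red, green, and blue anchor tags, and each tag's distance to them could then be scaled into a color channel. Repeating the run matters because a single K-medoids run depends on its random starting points.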


Source code: TagClustering Edge ForceDirectedGraph Graph Vertex

Built with Processing