
Title
Recent Advances in Clustering with Applications to Image Annotation
Speaker
Jia Li, Penn State University
Abstract
Recent advances in clustering from two directions will be presented. First, a new clustering approach based on mode identification and kernel density estimation will be introduced. A recently developed optimization algorithm, Modal EM (MEM), finds an ascending path from an arbitrary point to a local maximum (mode) of a density that has the form of a mixture distribution. A cluster is formed by the sample points that ascend to the same mode of the density function. In mode-based clustering, the mixture model is used only for density estimation rather than for capturing the clusters at the same time, and hence the result is more robust when clusters deviate substantially from Gaussian distributions. A second algorithm, Ridgeline EM (REM), is developed to solve efficiently for the ridgeline between the density bumps of two clusters. Theoretical properties of the ridgeline make it valuable for diagnosing clustering results and quantifying the separability between clusters.
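As an illustration of the mode-ascent idea, the following is a minimal sketch, not the speaker's implementation. It assumes the density is a Gaussian kernel density estimate with equal weights and one shared spherical bandwidth, in which case each MEM iteration reduces to a weighted average of the sample points. The names mem_ascend and mode_cluster, the bandwidth parameter, and the tolerances are hypothetical choices made for this example.

```python
import math

def mem_ascend(start, data, bandwidth, tol=1e-9, max_iter=1000):
    """Climb from `start` to a nearby mode of a Gaussian kernel density.

    Assumption: equal-weight spherical Gaussian kernels centered at `data`.
    """
    x = list(start)
    for _ in range(max_iter):
        # E-step: posterior probability of each kernel component given x,
        # computed in log space for numerical stability.
        log_w = [
            -0.5 * sum((xi - pi) ** 2 for xi, pi in zip(x, p)) / bandwidth ** 2
            for p in data
        ]
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        total = sum(w)
        # M-step: under the assumption above, the maximizer is the
        # posterior-weighted mean of the kernel centers.
        x_new = [
            sum(wi * p[d] for wi, p in zip(w, data)) / total
            for d in range(len(x))
        ]
        shift = math.dist(x, x_new)
        x = x_new
        if shift < tol:
            break
    return x

def mode_cluster(data, bandwidth, merge_tol=1e-3):
    """Group points that ascend to the same mode (within `merge_tol`)."""
    modes, labels = [], []
    for point in data:
        m = mem_ascend(point, data, bandwidth)
        for k, known in enumerate(modes):
            if math.dist(m, known) < merge_tol:
                labels.append(k)
                break
        else:
            # No existing mode is close enough, so record a new one.
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels, modes
```

Calling mode_cluster(points, bandwidth=1.0) on a list of coordinate tuples returns one label per point and the list of modes found. The number of clusters is not specified in advance; it follows from the bandwidth, which controls how smooth the estimated density is.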
In the second part of the talk, we consider clustering objects that are represented by sets of weighted vectors rather than by single vectors. Weighted vector sets are formulated as discrete distributions with finite but arbitrary support. A new clustering algorithm, D2-clustering (D2 stands for discrete distribution), is developed using linear programming to minimize the sum of Mallows distances between sample points and their corresponding cluster centroids. Combined with a generalized mixture modeling method based on the concept of hypothetical local mapping, D2-clustering is applied to real-time image annotation and is the core of ALIPR, an online automatic image tagging system.
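For reference, the Mallows distance between two discrete distributions can be written as a small linear program. The formulation below is the standard one and is given only as a sketch; the symbols (support vectors v_i and u_j, weights p_i and q_j, matching weights w_ij) are notation assumed for this illustration and do not appear in the abstract.

```latex
% Assumed notation: P = {(v_i, p_i)}_{i=1}^{m} and Q = {(u_j, q_j)}_{j=1}^{n},
% with p_i, q_j >= 0 and \sum_i p_i = \sum_j q_j = 1.
\[
D^2(P, Q) \;=\; \min_{w_{ij} \ge 0} \;
  \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} \, \lVert v_i - u_j \rVert^2
\quad \text{subject to} \quad
  \sum_{j=1}^{n} w_{ij} = p_i \;\; (i = 1, \dots, m), \qquad
  \sum_{i=1}^{m} w_{ij} = q_j \;\; (j = 1, \dots, n).
\]
```

The objective and constraints are linear in the matching weights w_ij, which is why a linear programming solver applies. In a clustering setting of this kind, the centroid of a cluster is itself a discrete distribution chosen to make the sum of such distances to the cluster members small.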
Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.