Seminar: Tim Liao

Statistics Seminar
October 23, 2008
All Day
209 W. Eighteenth Ave. (EA), Room 170

Title

A Rank-Based Clustering Method for the Analysis of Social Inequality Data

Speaker

Tim Liao, University of Illinois

Abstract

When studying social, economic or health inequality, the analyst must estimate the clusters or classes contained in the data. Commonly used methods, such as latent class/cluster models or the k-means method, assume a multivariate normal distribution. Most inequality data, however, are non-normal in distribution. This paper proposes a rank-based cluster analysis, which can take the form of a latent class/finite mixture model or a basic cluster method such as the k-means algorithm; in either case, the multivariate normal distributional assumption is no longer crucial. There are two theoretical foundations for the proposed method: relative deprivation theory in sociology and the relative income concept in economics on the one hand, and topological distance in mathematical thinking on the other. This method offers an alternative view on inequality, and is nonparametric in essence. A simulation analysis of three-cluster mixtures indicated by two or three variables, using three different data-generating mechanisms, shows that when data are normal, the (real) value-based and rank-based methods produce similar results. When data depart from normality, the results are more mixed: finite mixture models do somewhat better for data of real values, while the k-means method performs much better for ranked data. Three empirical data applications further demonstrate the usefulness of the rank-based method: an analysis of the 1991 British Household Panel Survey data with three variables for socioeconomic classification, a re-analysis of the classic diabetes data, and an exploration of fertility inequality using the 2006 U.S. General Social Survey data. All three examples suggest some new substantive insights unobtainable from the parametric analysis of the original data, and require much less computation time for estimation.
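
As a rough illustration of the general idea of clustering on ranks rather than raw values, here is a minimal sketch in Python. It is not the speaker's implementation: the function names (rank_transform, kmeans, rank_kmeans), the use of average ranks for ties, and the plain Lloyd-style k-means loop are all assumptions made for this example.

```python
import random


def rank_transform(column):
    """Replace each value by its 1-based rank; tied values share the mean rank."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0.0] * len(column)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and column[order[j + 1]] == column[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means with squared Euclidean distance (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct observations as starting centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rank_kmeans(rows, k, seed=0):
    """Rank-transform every variable separately, then cluster the ranks."""
    ranked_columns = [rank_transform(list(col)) for col in zip(*rows)]
    ranked_rows = list(zip(*ranked_columns))
    return kmeans(ranked_rows, k, seed=seed)
```

In this sketch, rank_transform([10, 30, 20, 20]) returns [1.0, 4.0, 2.5, 2.5]. Because only the ordering of each variable enters the clustering step, any strictly increasing transformation of a variable (a log or a cube, say) leaves the cluster labels unchanged for a given seed, which is one way to see why the normality assumption matters less for ranked data.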

Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.