
Title
Introduction to Statistical Learning With Application to Drug Discovery
Speaker
Joseph Verducci, The Ohio State University
Abstract
Statistical learning comes in two forms: supervised and unsupervised. The supervised form starts with a training set that contains both explanatory variables and a response variable; the goal is to learn enough about their association to predict the response for a new "test case" of the explanatory variables. Perhaps the simplest example is linear regression, in which the form of the association and the distribution of the response are specified in advance, so the only things to be learned are the coefficients relating the response to the explanatory variables. Dozens of more general techniques have since been developed, and a general theory has emerged, based on the principle that the complexity of the explanation should be limited by the amount of training data.
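The linear-regression case can be made concrete with a short sketch. It assumes a single explanatory variable and ordinary least squares; the training data and the function names `fit_simple_linear_regression` and `predict` are hypothetical illustrations, not material from the talk. Because the form of the model is fixed in advance, "learning" reduces to computing two coefficients from the training set and then applying them to a new test case.

```python
# Minimal sketch (assumption: one explanatory variable, ordinary least squares).
# The model form y = b0 + b1 * x is fixed; only b0 and b1 are learned.

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing squared error on the training set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums used by the closed-form solution.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(coefs, x):
    """Predict the response for a new test case x using learned coefficients."""
    intercept, slope = coefs
    return intercept + slope * x


# Hypothetical training set: responses follow y = 1 + 2x exactly.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
coefs = fit_simple_linear_regression(train_x, train_y)
print(coefs)                # (1.0, 2.0)
print(predict(coefs, 5.0))  # 11.0
```

More general supervised methods relax the fixed model form, which is where the trade-off between model complexity and the amount of training data becomes central.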
The unsupervised form is best known to statisticians through clustering. Here all variables are explanatory, and the goal is to infer natural groupings. For example, one may learn about "functional" groupings of genes from their patterns of expression in a target class of cells.
This talk introduces some modern learning methods, with an emphasis on SVM (Support Vector Machines) as a supervised method and COSA (Clustering of Objects using Subsets of Attributes) as an unsupervised one. Both methods will be discussed in the context of constructing new drugs that may operate through unknown pathways.