Co-clustering can be viewed as a generalization of clustering to a wider range of data. While clustering methods work on affinity data (data describing similarity between objects), co-clustering methods can also work on relational data (data describing relationships between objects). An example of affinity data is customers in market analysis, where each customer is described by a set of features (attributes), such as age, gender and income. A similarity measure between pairs of customers, for example Euclidean distance, can be computed from their features. An example of relational data is persons in a social network, where a link between two persons indicates that they are friends. Here persons are compared on their connections to other persons and not on their intrinsic features.
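The distinction between the two kinds of data can be made concrete with a toy sketch. The example below is only an illustration and is not taken from the dissertation: the customer names, feature values, and friendship links are made up, and the Jaccard overlap of friend sets is an assumed stand-in for "comparing persons on their connections", not one of the criterion functions the dissertation defines.

```python
import math

# Affinity data: hypothetical customers described by features
# (age in years, income in thousands). Values are invented.
customers = {
    "ana": (34, 72.0),
    "ben": (36, 70.0),
    "cho": (61, 30.0),
}

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Relational data: hypothetical symmetric friendship links.
# No intrinsic features are stored, only who is linked to whom.
friends = {
    "ana": {"ben", "cho", "dev"},
    "ben": {"ana", "cho", "dev"},
    "cho": {"ana", "ben"},
    "dev": {"ana", "ben"},
}

def shared_friend_overlap(a, b):
    """Jaccard overlap of the friend sets of a and b, ignoring a and b themselves.

    This is an assumed, simple way to compare persons by their connections.
    """
    na = friends[a] - {b}
    nb = friends[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

# Customers are compared on their features...
d_ana_ben = euclidean(customers["ana"], customers["ben"])
d_ana_cho = euclidean(customers["ana"], customers["cho"])

# ...whereas persons in the network are compared on their links.
s_cho_dev = shared_friend_overlap("cho", "dev")
s_ana_cho = shared_friend_overlap("ana", "cho")
```

In this toy network "cho" and "dev" are not linked to each other, yet they have identical friend sets, so a connection-based comparison treats them as alike even though nothing is known about their intrinsic features.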
In this dissertation we study the application of co-clustering to social network data and to medical data. In particular, we present a general formulation of co-clustering that fits most methods in the literature and provide solutions to three main problems: (1) clustering relational data under regular equivalence in social network analysis, (2) finding a symmetric clustering of asymmetric data and (3) clustering patients based on high-dimensional, time-varying, sparse physiologic data.
We define implicit similarity measures, through criterion functions for co-clustering, that solve the problems we target. We demonstrate and compare our co-clustering methods on real-world data sets.
Juan Ignacio Casse (2014). Automatic Co-Clustering for Social Network and Medical Data. Doctoral dissertation, University of California at Riverside.
@phdthesis{Cas14,
  author = "Juan Ignacio Casse",
  title = "Automatic Co-Clustering for Social Network and Medical Data",
  school = "University of California at Riverside",
  schoolabbr = "UC Riverside",
  year = 2014,
  month = Dec,
}