Hum Genet. 2012 Sep;131(9):1395-401
Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci, and with multiple rare and common variants at some of these loci contributing to disease risk. Data from genome-wide association studies (GWAS), together with metadata such as known gene functions and pathways, make it possible to identify genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants explain only a small proportion of the heritability of these phenotypes. To account for locus and allelic heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of GWAS data analysis. U-statistics, which summarize the genomic similarity between pairs of individuals and relate genomic similarity to phenotypic similarity, have proved very useful for testing the association between a set of single nucleotide polymorphisms (SNPs) and a phenotype. Compared with single-marker analysis, the advantages of U-statistic-based methods are large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out their links with other similarity-based tests of genetic association. Finally, we discuss potential applications of U-statistics to the analysis of next-generation sequencing data and to rare-variant association studies.
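As an illustration only, the Python sketch below shows one simple similarity-based U-statistic of the general kind described above. It averages, over all pairs of individuals, the product of a genomic similarity and a phenotypic similarity, and it assesses significance by permuting phenotypes. The specific choices are assumptions made for simplicity and are not necessarily the formulations reviewed in the paper: identity-by-state (IBS) similarity on minor-allele counts, the product of mean-centered quantitative phenotypes as phenotypic similarity, and a permutation p-value. The function names `ibs_similarity`, `u_statistic` and `permutation_test` are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def ibs_similarity(g1: Sequence[int], g2: Sequence[int]) -> float:
    """IBS similarity of two genotype vectors coded as minor-allele counts
    (0, 1, 2) over the same SNP set; returns a value in [0, 1]."""
    m = len(g1)
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))  # alleles shared per SNP
    return shared / (2 * m)


def u_statistic(sim: List[List[float]], y: Sequence[float]) -> float:
    """Average over pairs i < j of genomic similarity times phenotypic
    similarity. Phenotypic similarity is assumed here to be the product of
    mean-centered phenotypes (one choice among several)."""
    n = len(y)
    mean_y = sum(y) / n
    c = [v - mean_y for v in y]
    total = sum(sim[i][j] * c[i] * c[j] for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)


def permutation_test(genotypes: List[List[int]], y: Sequence[float],
                     n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return the observed statistic and a one-sided permutation p-value."""
    n = len(genotypes)
    # Genomic similarity depends only on genotypes, so compute it once.
    sim = [[ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
           for i in range(n)]
    observed = u_statistic(sim, y)
    rng = random.Random(seed)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any genotype-phenotype link
        # Small tolerance so that exact ties are counted despite rounding.
        if u_statistic(sim, y_perm) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 10 individuals homozygous for the major allele at
    # 5 SNPs with phenotype 0, and 10 homozygous for the minor allele with phenotype 1.
    geno = [[0] * 5] * 10 + [[2] * 5] * 10
    pheno = [0.0] * 10 + [1.0] * 10
    u, p = permutation_test(geno, pheno)
    print(f"U = {u:.4f}, permutation p = {p:.4f}")
```

Because all SNPs in the set enter a single pairwise similarity, the test uses one statistic for the whole gene or pathway rather than one test per marker. This is the property that makes such set-based tests attractive when many markers are involved. The mean-centering step means this is a U-statistic-style quantity rather than a textbook U-statistic with a fixed kernel.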