I'm reading papers that develop approaches for detecting rare variants, especially since GWAS has failed to explain the "missing heritability". Surprisingly, even though whole-genome sequencing of large numbers of samples is still far from affordable, numerous statistics/approaches have already been developed for rare variants using sequencing data.
Some of my readings are as follows:
http://www.ncbi.nlm.nih.gov/pubmed/18691683 (quite early and influential: combines multivariate and collapsing approaches; that is, first collapse the rare variants after binning them by allele frequency, then apply multivariate testing, to combine the power of the two approaches)
http://www.ncbi.nlm.nih.gov/pubmed/19214210 (assigns a weight to each rare variant according to its frequency)
http://www.ncbi.nlm.nih.gov/pubmed/19810025 (uses a multiple regression model, with the phenotype as the dependent variable and collapsed rare-variant counts as independent variables)
http://www.ncbi.nlm.nih.gov/pubmed/21521787 (more recent; applies functional principal component analysis)
http://www.ncbi.nlm.nih.gov/pubmed/22262732 (more recent; takes sequencing quality into account)
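To make the collapsing-plus-weighting idea from the first two papers concrete, here is a minimal sketch (my own simplification, not the exact method of any of these papers): rare variants below a MAF cutoff are collapsed into one per-sample burden score, with frequency-based weights of the form 1/sqrt(n*q*(1-q)) so the rarest variants count the most, and cases are then compared with controls. The 1% MAF cutoff, the Welch t-test, and the function names are my own choices for illustration.

```python
import numpy as np
from scipy import stats

def weighted_burden_scores(genotypes, maf_threshold=0.01):
    """Collapse rare variants into one per-sample burden score.

    genotypes: (n_samples, n_variants) array of minor-allele counts (0/1/2).
    Variants above the MAF threshold (or monomorphic) are dropped; the rest
    are weighted by 1/sqrt(n*q*(1-q)), an assumed frequency-based weight
    that upweights the rarest variants.
    """
    n, _ = genotypes.shape
    maf = genotypes.mean(axis=0) / 2.0             # per-variant minor allele frequency
    rare = (maf > 0) & (maf <= maf_threshold)      # keep only polymorphic rare variants
    q = maf[rare]
    weights = 1.0 / np.sqrt(n * q * (1.0 - q))     # rarest variants get the largest weights
    return genotypes[:, rare] @ weights            # one weighted burden score per sample

def burden_test(genotypes, phenotype):
    """Compare burden scores between cases (1) and controls (0) with a Welch t-test."""
    scores = weighted_burden_scores(genotypes)
    return stats.ttest_ind(scores[phenotype == 1],
                           scores[phenotype == 0],
                           equal_var=False)
```

In a real analysis the t-test would typically be replaced by a permutation test or a regression framework that allows covariates, but the collapsing and weighting steps are the core of the idea.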
Since large-scale sequencing data are not yet available, most of these papers use simulated data or sequencing data around particular genes. Can anyone share experience with using these approaches? Which one works best? It's all very confusing and scary for beginners.... We could also discuss how to improve such approaches as real sequencing data arrive. For example, compared with common variants genotyped on SNP arrays, sequencing errors need to be taken seriously. (That's why the last two approaches came about.)
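On the sequencing-error point: one simple way to fold genotype uncertainty into any of the tests above (this is just a sketch of the general idea, not the specific method of either of the last two papers) is to replace hard 0/1/2 genotype calls with expected allele counts (dosages) computed from per-genotype posterior probabilities, as produced by callers that emit genotype likelihoods. The function name and array layout here are my own.

```python
import numpy as np

def expected_dosage(genotype_probs):
    """Expected minor-allele count from genotype posterior probabilities.

    genotype_probs: (n_samples, n_variants, 3) array holding
    P(g=0), P(g=1), P(g=2) for each sample/variant (each triple sums to 1).
    Returns an (n_samples, n_variants) dosage matrix that can replace hard
    calls in a burden or regression test, so uncertain genotypes contribute
    fractionally instead of all-or-nothing.
    """
    allele_counts = np.array([0.0, 1.0, 2.0])
    return genotype_probs @ allele_counts   # expectation over the last axis
```

A confidently called heterozygote contributes close to 1.0, while a noisy site with probabilities spread across genotypes contributes its expectation, which damps the influence of low-quality calls on the collapsed score.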