I am an applied statistician by profession, with a strong interest in analyzing next-generation sequencing (NGS) data. I would like to get a big-picture view of the challenges for statisticians. So far I have gone through recent issues of bioinformatics journals, but I am still confused. Could you recommend some review article(s) or other material?
What aspect of NGS are you interested in? First, there is the process by which the tags are generated, for instance a ChIP experiment or other possibilities like GRO-seq, MNase-seq, DNase-seq, FAIRE-seq and many more. Because each has its own underlying biology, the data generated can have different properties. Then there is the sequencing process itself, which might be on an Illumina machine, for instance. Then there is the process of detecting enrichment, or peak-calling. Then there is the process of identifying reproducible results. Then there is the process of figuring out whether two sets of genomic features overlap. (I am assuming here that you mean NGS for DNA-seq or RNA-seq experiments and not for other tasks like genome assembly or calling SNPs.) In short, could you narrow it down?
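To make the enrichment-detection step a bit more concrete, here is a minimal sketch of the statistical idea, not any particular tool's method. It assumes tag counts have already been binned into fixed-width genomic windows, models the background as a single genome-wide Poisson rate, and applies a Bonferroni correction. The function names (`poisson_sf`, `enriched_windows`) and the toy counts are made up for illustration; real peak callers use more elaborate background models and control samples.

```python
import math


def poisson_sf(k, lam):
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Start at P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Sum P(X = i) for i = k, k+1, ... until the terms become negligible.
    while term > 0.0 and (i <= lam or term > 1e-17 * total):
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


def enriched_windows(counts, alpha=0.01):
    """Return (index, count, p-value) for windows enriched over background.

    Assumption: the background rate is the mean count over all windows,
    which is a crude stand-in for a proper control or local background.
    """
    lam = sum(counts) / len(counts)
    hits = []
    for idx, c in enumerate(counts):
        p = poisson_sf(c, lam)
        if p * len(counts) < alpha:  # Bonferroni correction
            hits.append((idx, c, p))
    return hits


# Hypothetical tag counts per window; window 5 is the obvious pile-up.
toy_counts = [3, 5, 4, 2, 6, 40, 4, 3, 5, 4]
for idx, count, p in enriched_windows(toy_counts):
    print(f"window {idx}: count={count}, p={p:.3g}")
```

Most of the statistical interest lies in what this sketch glosses over: overdispersion relative to the Poisson, local background variation, and multiple-testing procedures less conservative than Bonferroni.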
@George is correct. It will take you more than a few days to find a niche problem. You can't shortcut the process toward publication. You need to learn the field.
My comment is not meant as a discouragement. I can think of a few cool statistics-heavy papers that I would be happy to share. However, it depends on what part of the field you are most likely to be working on.
I am mainly interested in DNA-seq or RNA-seq data types and the process of detecting enrichment or peak-calling. Furthermore, my research focuses on approximate inference for network biology/complex systems and estimation of missing values.
I'd suggest getting some data and a set of questions to start. You'll learn a lot by just going through the process of analyzing data like those from the ENCODE project. Expect that becoming comfortable with the data and questions could take a few weeks to months.
I think I would suggest the modENCODE project, just because the genomes are smaller and thus processing is often faster, assuming that the statistics is what he cares about and not the organism per se.