What Are The Metrics To Determine The Quality Of A Whole Genome Sequence
12.5 years ago
Hi, I would like to generate a set of metrics to evaluate the general quality of a whole-genome sequence before I start analyzing it, so that I can be reasonably confident that the variation I am after is actually in the haystack. I know there are tools like FastQC that generate reports, but unless you already know fairly well what to expect, those tools are much less useful. I know there is no single criterion and everyone has their own checklist, but I think there are some common criteria that most people could agree on.

For example, GC content should be between 40% and 50%, the total number of reads should be above 3 million, etc. Thanks!
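As a rough illustration of the kind of checks described above, here is a minimal Python sketch that counts reads and computes overall GC content from FASTQ input, then flags values outside configurable thresholds. It assumes plain four-line FASTQ records (one sequence line and one quality line each). The function names are hypothetical, and the default thresholds (40-50% GC, at least 3 million reads) are only the example values from the question, not established standards. GC content is computed over A/C/G/T bases only, so N calls are ignored.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records.

    Assumes plain FASTQ with one sequence line and one quality line per record.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines, e.g. at the end of a file
        try:
            seq = next(it).rstrip("\r\n")
            next(it)  # the '+' separator line
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


@dataclass
class RunSummary:
    reads: int
    gc_percent: float  # G+C as a percentage of A/C/G/T bases (N is ignored)


def summarize(lines: Iterable[str]) -> RunSummary:
    """Count reads and compute overall GC% across all records."""
    reads = gc = acgt = 0
    for _, seq, _ in read_fastq(lines):
        s = seq.upper()
        g_c = s.count("G") + s.count("C")
        reads += 1
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    return RunSummary(reads, 100.0 * gc / acgt if acgt else 0.0)


def check(summary: RunSummary,
          gc_range: Tuple[float, float] = (40.0, 50.0),
          min_reads: int = 3_000_000) -> List[str]:
    """Return human-readable warnings; an empty list means no threshold was hit.

    The defaults are only the example values from the question, not standards.
    """
    problems = []
    lo, hi = gc_range
    if not lo <= summary.gc_percent <= hi:
        problems.append(f"GC content {summary.gc_percent:.1f}% outside {lo}-{hi}%")
    if summary.reads < min_reads:
        problems.append(f"only {summary.reads} reads (expected at least {min_reads})")
    return problems


if __name__ == "__main__":
    demo = ["@r1", "ACGTACGT", "+", "IIIIIIII",
            "@r2", "GGCCAATT", "+", "IIIIIIII"]
    summary = summarize(demo)
    print(summary)  # RunSummary(reads=2, gc_percent=50.0)
    print(check(summary))
```

For a real file you could pass an open handle, e.g. `with open(path) as fh: summarize(fh)`, or use `gzip.open(path, "rt")` for gzipped FASTQ. Keeping the thresholds as parameters matters because, as the answer below points out, a fixed GC window does not suit every organism or protocol.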

12.5 years ago
On FastQC's page (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) you'll find a couple of example reports, one from a 'good' run and one from a 'bad' run.

You are right that there are no well-defined thresholds for saying when a run has gone 'bad'. I think the answer is that it depends heavily on what kind of analysis you are planning to do downstream.

Edit: The main criterion I use is that if the per-base quality plot drops below a Phred score of 25 very quickly, it's time to start trimming (or re-sequencing). The GC criterion doesn't always apply (e.g. some Plasmodium species are very AT-rich, and it certainly doesn't apply to bisulfite sequencing).
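The quality criterion above can be approximated in a few lines. This Python sketch is a simplified stand-in for FastQC's per-base quality plot: it computes only the mean Phred score at each read position (FastQC also shows medians and quartiles) and reports the first position where that mean falls below 25. It assumes Phred+33 encoding (Sanger / Illumina 1.8+); the function names are hypothetical, and how early a drop counts as "very quickly" is left to your judgment.

```python
from typing import Iterable, List, Optional


def mean_quality_per_position(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (index 0 = first base).

    Assumes Phred+33 encoding by default; pass offset=64 for older
    Illumina 1.3-1.7 data. Reads may differ in length; each position is
    averaged only over the reads long enough to cover it.
    """
    totals: List[int] = []
    counts: List[int] = []
    for q in quals:
        for i, ch in enumerate(q):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def first_position_below(means: List[float], threshold: float = 25.0) -> Optional[int]:
    """1-based position where the mean quality first drops below threshold, or None."""
    for pos, m in enumerate(means, start=1):
        if m < threshold:
            return pos
    return None


if __name__ == "__main__":
    # 'I' = Q40, '5' = Q20, '+' = Q10, '#' = Q2 under Phred+33
    means = mean_quality_per_position(["IIIII5+", "IIIII5#"])
    print(first_position_below(means))  # prints 6
```

The quality strings can come from any FASTQ parser (every fourth line of the file). Reporting the position rather than a pass/fail flag lets you compare it to your read length and decide whether trimming the 3' end is enough or the run needs to be redone.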


