Question

What Figures Do You Make For QC Of Whole Exome Sequencing Data?

11.0 years ago

User 1933 ▴ 340

For QC of FASTQ files I know FastQC, which generates lots of statistics. There is also a set of tools (GATK, samtools, bedtools) for checking the quality of reads after mapping (BAM files).

My question is: given several exome sequencing batches, each with several samples, what parameters/figures would you make to present the quality of the data at the different steps?

exome qualitycontrol • 4.7k views

updated 11.0 years ago by Sean Davis 26k • written 11.0 years ago by User 1933 ▴ 340

Check this post: fastQC html report to PDF (with a script)

11.0 years ago by Rm 8.3k

The post you referred to is about converting HTML to PDF; I don't want to make 300 PDFs from 300 samples...

11.0 years ago by User 1933 ▴ 340

In my answer in that post I selected a few images out of the FastQC report; those will be useful to you.

11.0 years ago by Rm 8.3k
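For the "300 samples" problem, one option is to skip the per-sample reports entirely and tally FastQC's per-module PASS/WARN/FAIL calls across all samples. Below is a minimal Python sketch, assuming each sample's FastQC output folder contains the plain-text `summary.txt` with one `STATUS<TAB>Module name<TAB>file name` line per module. The directory name `fastqc_out` in the comment is a placeholder, and the two inline summaries are made-up examples, not real data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def parse_fastqc_summary(text: str) -> Dict[str, str]:
    """Parse one FastQC summary.txt into {module name: PASS/WARN/FAIL}."""
    statuses: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        status, module, _filename = line.split("\t")
        statuses[module] = status
    return statuses


def tally_by_module(summaries: Iterable[str]) -> Dict[str, Counter]:
    """Count how many samples PASS/WARN/FAIL each FastQC module."""
    tally: Dict[str, Counter] = defaultdict(Counter)
    for text in summaries:
        for module, status in parse_fastqc_summary(text).items():
            tally[module][status] += 1
    return dict(tally)


# Hypothetical real-world input (placeholder path):
#   from pathlib import Path
#   texts = [p.read_text() for p in Path("fastqc_out").glob("*_fastqc/summary.txt")]
texts = [
    "PASS\tBasic Statistics\ts1.fastq\nWARN\tPer base sequence content\ts1.fastq\n",
    "PASS\tBasic Statistics\ts2.fastq\nFAIL\tPer base sequence content\ts2.fastq\n",
]

for module, counts in sorted(tally_by_module(texts).items()):
    # Counter returns 0 for statuses that never occurred
    print(f"{module:30s} PASS={counts['PASS']} WARN={counts['WARN']} FAIL={counts['FAIL']}")
```

The same per-sample dictionaries can also be laid out as a samples-by-modules grid (grouped by batch), which fits on one page where 300 separate reports would not.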

I'm a little unclear on what you are asking. Is your question "Of all the data that I can get out of a quality control assessment, what would be the most important criteria to communicate in regards to my quality control pipeline/algorithm?"

11.0 years ago by Josh Herr 5.8k

score 4 · Answer 1 · 2013-04-28

There is no single tool, or even a single data level, that you can track to catch all problems, but a good high-level metric is the percentage of targeted bases with coverage above 30x (treat 30x as a minimum). This number gives you a rough sense of how effective your variant calling will be. You can get it from Picard CalculateHsMetrics.
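To make the coverage metric concrete, here is a minimal Python sketch that computes the percentage of target bases at or above a depth threshold from per-base depth lines. It assumes the three-column output (`chrom<TAB>pos<TAB>depth`) of something like `samtools depth -a -b targets.bed sample.bam` run on a single BAM, where `-a` keeps zero-depth positions so uncovered target bases are counted. `targets.bed` and `sample.bam` are placeholders; Picard's report remains the fuller option, and this only shows what the number means.

```python
from typing import Iterable


def pct_bases_at_depth(depth_lines: Iterable[str], min_depth: int = 30) -> float:
    """Percentage of target positions with depth >= min_depth.

    Assumes one 'chrom<TAB>pos<TAB>depth' line per target base,
    including zero-depth bases (single-BAM samtools depth -a layout).
    """
    total = covered = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        depth = int(line.split("\t")[2])  # third column is the read depth
        total += 1
        if depth >= min_depth:
            covered += 1
    if total == 0:
        raise ValueError("no depth records")
    return 100.0 * covered / total


example = ["chr1\t100\t45", "chr1\t101\t30", "chr1\t102\t12", "chr1\t103\t0"]
print(pct_bases_at_depth(example))  # 50.0 (2 of 4 bases at >= 30x)
```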

Other metrics, less important for judging success on a per-sample basis but worth tracking over time to spot systematic sequencing problems, might include:

Insert sizes
Error rate
Duplication rate/effective library size
% aligned
Overrepresented sequences
Adapter contamination
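For the duplication rate/effective library size item, here is a minimal sketch of how an effective library size can be estimated from read-pair counts. It solves the Lander-Waterman relation c/X = 1 - exp(-n/X) for the number of distinct molecules X by bisection, where n is the total number of read pairs and c the number of unique (non-duplicate) pairs. I believe Picard MarkDuplicates reports an ESTIMATED_LIBRARY_SIZE based on the same model, but this is an independent illustration, not Picard's code; the input counts are assumed to come from whatever duplicate-marking step you already run.

```python
import math


def duplication_rate(total_pairs: int, unique_pairs: int) -> float:
    """Fraction of read pairs that are duplicates."""
    return 1.0 - unique_pairs / total_pairs


def estimate_library_size(total_pairs: int, unique_pairs: int) -> float:
    """Solve c / X = 1 - exp(-n / X) for X (distinct molecules) by bisection."""
    n, c = total_pairs, unique_pairs
    if not 0 < c < n:
        raise ValueError("need 0 < unique_pairs < total_pairs")

    def f(x: float) -> float:
        # f > 0 when x is below the root, f < 0 when x is above it
        return c / x - 1.0 + math.exp(-n / x)

    lo, hi = float(c), 10.0 * c  # the library has at least c distinct molecules
    while f(hi) >= 0:
        hi *= 10.0  # widen the bracket until f changes sign
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


n_pairs, unique = 10_000_000, 8_000_000  # illustrative counts, not real data
print(f"duplication rate: {duplication_rate(n_pairs, unique):.2%}")
print(f"estimated library size: {estimate_library_size(n_pairs, unique):,.0f}")
```

Plotting this estimate per batch over time can reveal a drop in library complexity even when per-sample coverage still looks acceptable.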