Question: What Figures Do You Make For QC Of Whole Exome Sequencing Data
6.1 years ago by User 1933 (340)
User 1933 wrote:

For QC of FASTQ files I know FastQC, which generates lots of statistics. There is also a set of tools (GATK, samtools, bedtools) for checking the quality of reads after mapping (BAM files).

My question is: given several exome sequencing batches, each containing a number of samples, what parameters or figures would you produce to present the quality of the data at the different steps?

exome qualitycontrol • 3.4k views

Check this post: FastQC HTML report to PDF (with a script)

written 6.1 years ago by Rm (7.8k)

The post you referred to is about converting HTML to PDF; I don't want to make 300 PDFs from 300 samples, ...

modified 6.1 years ago • written 6.1 years ago by User 1933 (340)

In my answer in that post I selected a few images from the FastQC output; those will be useful to you...

written 6.1 years ago by Rm (7.8k)

I'm a little unclear on what you are asking. Is your question "Of all the data that I can get out of a quality control assessment, what would be the most important criteria to communicate in regards to my quality control pipeline/algorithm?"

modified 6.1 years ago • written 6.1 years ago by Josh Herr (5.6k)
6.1 years ago by Sean Davis (25k)
National Institutes of Health, Bethesda, MD
Sean Davis wrote:

There is no single tool, or even level of data, that you can track to catch all problems, but a good high-level metric is the percentage of targeted bases with coverage > 30x (treat 30x as a minimum). This number gives you a rough sense of how effective your variant calling will be. You can get it from Picard CalculateHsMetrics.
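As a rough illustration of pulling that number out for many samples, here is a minimal Python sketch that reads the metrics table from a Picard HsMetrics output file (newer Picard releases name the tool CollectHsMetrics). It assumes the usual Picard metrics layout, a `## METRICS CLASS` line followed by a tab-separated header row and data row, and a column named `PCT_TARGET_BASES_30X`. The function name and the example text are made up for illustration, so check the column names against your own Picard version.

```python
import io

def read_picard_metrics(handle):
    """Return the metrics table of a Picard metrics file as a list of dicts.

    Assumes the usual layout: a '## METRICS CLASS' line, then a tab-separated
    header row, then data rows up to the first blank line.
    """
    lines = [line.rstrip("\n") for line in handle]
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for data in lines[i + 2:]:
                if not data.strip():
                    break  # blank line ends the metrics table
                rows.append(dict(zip(header, data.split("\t"))))
            return rows
    raise ValueError("no '## METRICS CLASS' section found")

# Made-up, heavily truncated example; real files have many more columns.
example = (
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n"
    "BAIT_SET\tMEAN_TARGET_COVERAGE\tPCT_TARGET_BASES_30X\n"
    "my_exome_kit\t85.2\t0.93\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
)

row = read_picard_metrics(io.StringIO(example))[0]
pct_30x = float(row["PCT_TARGET_BASES_30X"])  # fraction of target bases at 30x
```

Running this over one metrics file per sample gives a single number per sample, which is easier to put in one table or plot than hundreds of separate reports.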

Other metrics, less important for gauging success on a per-sample basis but worth tracking over time to detect systematic sequencing problems, might include:

  1. Insert sizes
  2. Error rate
  3. Duplication rate/effective library size
  4. % aligned
  5. Overrepresented sequences
  6. Adapter contamination
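
One way to track metrics like these across batches is to collect them into a per-sample table and summarize by batch. The sketch below is only an illustration under stated assumptions: the sample records, field names (`pct_30x`, `dup_rate`), and the 0.80 cutoff are all made up, and the helper functions are hypothetical rather than part of any QC tool.

```python
from collections import defaultdict
from statistics import median

def summarize_by_batch(samples, metric):
    """Group samples by batch and return {batch: median of metric}."""
    values = defaultdict(list)
    for s in samples:
        values[s["batch"]].append(s[metric])
    return {batch: median(v) for batch, v in values.items()}

def flag_low(samples, metric, threshold):
    """Return names of samples whose metric falls below threshold."""
    return [s["sample"] for s in samples if s[metric] < threshold]

# Made-up values for illustration only.
samples = [
    {"sample": "S1", "batch": "B1", "pct_30x": 0.93, "dup_rate": 0.08},
    {"sample": "S2", "batch": "B1", "pct_30x": 0.91, "dup_rate": 0.10},
    {"sample": "S3", "batch": "B2", "pct_30x": 0.72, "dup_rate": 0.31},
    {"sample": "S4", "batch": "B2", "pct_30x": 0.88, "dup_rate": 0.12},
]

# Per-batch median duplication rate, to compare batches over time.
batch_medians = summarize_by_batch(samples, "dup_rate")

# Samples below an arbitrary (assumed) 30x coverage cutoff.
low_coverage = flag_low(samples, "pct_30x", threshold=0.80)
```

A per-batch summary like `batch_medians` can feed a simple box plot or table per metric, while `low_coverage` gives a short list of samples to inspect individually.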