Question: What Is A Good Interval For Minimum And Maximum Coverage For Next Gen Sequencing
8.4 years ago by
Biomed4.5k
Bethesda, MD, USA
Biomed4.5k wrote:

Please excuse me, for I know the title is quite vague and doesn't fully explain what I have in mind. Here is my question. I sequenced human exomes for clinical purposes, so my goal is to determine whether there is a variant in one of my genes of interest that explains the disease in question. The technical issue is how well we captured and sequenced the genes (exons) in question. I can get the coverage data, but I would like your input on interpreting it. Very low coverage is not good, since the whole idea is to benefit from the redundancy in next-gen sequencing data (more good-quality reads mapped to a region is better). But when there are a lot of reads (e.g. >150X), something funny may be going on as well, such as a repeat or low-complexity region that allows reads from different sites to map to that part of the genome. In short, what would be a good heuristic or algorithm to decide whether a next-gen exome/genome sequencing run was good enough to move forward with further analysis and confidently make decisions about the genetic causes of a disease?

Thanks

next-gen coverage sequencing • 3.8k views
modified 8.4 years ago by Brad Chapman9.4k • written 8.4 years ago by Biomed4.5k
8.4 years ago by
Brad Chapman9.4k
Boston, MA
Brad Chapman9.4k wrote:

One approach is to look at overall coverage in your capture regions. Picard's CalculateHsMetrics provides a number of measures for this.

Some guidelines we've used on projects are:

  • FOLD_ENRICHMENT -- Generally how well you've managed to target the desired regions. The higher the better for this value.
  • PCT_TARGET_BASES_10X -- the percentage of target bases with at least 10X coverage; would like to see 90%+.
  • ZERO_CVG_TARGETS_PCT -- the percentage of missed target regions; ideally less than 5%.
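
As a rough illustration of applying these guidelines, here is a minimal Python sketch that reads the metrics row from a Picard metrics file and checks the two thresholds above. It assumes the usual Picard layout (a "## METRICS CLASS" line followed by a tab-separated header row and a data row) and that the PCT columns are reported as fractions between 0 and 1. The function names and the example values are made up for illustration, not real data.

```python
def parse_hs_metrics(text):
    """Return the first metrics row of a Picard metrics file as a dict.

    Assumes the header row and data row directly follow the
    '## METRICS CLASS' line, which is the usual Picard layout.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")


def check_capture(metrics, min_pct_10x=0.90, max_zero_cvg=0.05):
    """Apply the rule-of-thumb thresholds; return a list of problems found."""
    problems = []
    pct_10x = float(metrics["PCT_TARGET_BASES_10X"])
    zero_cvg = float(metrics["ZERO_CVG_TARGETS_PCT"])
    if pct_10x < min_pct_10x:
        problems.append(
            f"PCT_TARGET_BASES_10X is {pct_10x:.2f}, below {min_pct_10x:.2f}"
        )
    if zero_cvg > max_zero_cvg:
        problems.append(
            f"ZERO_CVG_TARGETS_PCT is {zero_cvg:.2f}, above {max_zero_cvg:.2f}"
        )
    return problems


# Made-up example file content, trimmed to three columns for brevity.
example = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# made-up example, not real data",
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics",
    "FOLD_ENRICHMENT\tPCT_TARGET_BASES_10X\tZERO_CVG_TARGETS_PCT",
    "41.2\t0.87\t0.03",
    "",
])

metrics = parse_hs_metrics(example)
for problem in check_capture(metrics):
    print("WARNING:", problem)
```

FOLD_ENRICHMENT is left out of the check on purpose, since the guideline for it is only "the higher the better" and no cutoff is given.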

This provides a sense of how uniformly your reads sample the targets. Another approach is to plot a histogram of coverage over all your targets. Both will help distinguish a few very highly covered repeat regions from good general coverage.
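
The histogram idea could be sketched along these lines in Python. This assumes you already have a mean coverage value per target from some per-target coverage report. The target names, the coverage values, the 10X minimum and the "3 times the median" cutoff for suspiciously high coverage are all illustrative assumptions, not established thresholds.

```python
from collections import Counter
from statistics import median


def flag_targets(mean_cov, min_cov=10, high_factor=3.0):
    """Split out targets with too little or suspiciously high coverage.

    mean_cov maps target name -> mean coverage. A target counts as 'high'
    when it exceeds high_factor times the median over all targets
    (an assumed heuristic for spotting repeat/low-complexity pileups).
    """
    med = median(mean_cov.values())
    low = sorted(t for t, c in mean_cov.items() if c < min_cov)
    high = sorted(t for t, c in mean_cov.items() if c > high_factor * med)
    return low, high


def coverage_histogram(mean_cov, bin_width=20):
    """Count targets per coverage bin; keys are the lower edge of each bin."""
    bins = Counter(int(c // bin_width) * bin_width for c in mean_cov.values())
    return dict(sorted(bins.items()))


# Made-up per-target mean coverage, not real data.
example = {
    "GENE_A_ex1": 62.0,
    "GENE_A_ex2": 4.5,
    "GENE_B_ex1": 55.0,
    "GENE_B_ex2": 48.0,
    "GENE_C_ex1": 410.0,
}

low, high = flag_targets(example)
print("low coverage:", low)
print("unusually high coverage:", high)

# Plain-text histogram: one '#' per target in each coverage bin.
for start, count in coverage_histogram(example).items():
    print(f"{start:>4}X+ {'#' * count}")
```

Using a multiple of the median rather than a fixed cutoff such as 150X keeps the "too high" flag relative to the overall depth of the run, but either choice is a judgment call.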

written 8.4 years ago by Brad Chapman9.4k

Thank you for the specific recommendations.

written 8.4 years ago by Biomed4.5k

Hi Brad, this is all good for the whole genome, but what if I am interested in a specific gene? Would you or others have any recommendations for that case? Thanks

written 8.4 years ago by Biomed4.5k

Glad that helped. For a specific gene, I would recommend focusing on the metrics that your SNP or indel caller reports for the variants you are interested in. We use Broad's GATK genotyper and follow their recommendations: http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v2#Making_analysis_ready_calls_SNP_calls_with_hard_filtering

written 8.4 years ago by Brad Chapman9.4k