Question

What Is A Good Interval For Minimum And Maximum Coverage For Next Gen Sequencing

5

14.5 years ago

Biomed 5.0k

Please excuse me, for I know the title is quite vague and doesn't fully explain what I have in mind. Here is my question. I sequenced human exomes for clinical purposes, so my goal is to determine whether there is a variant in one of my genes of interest that explains the disease in question. The technical issue is how well we captured and sequenced the genes (exons) in question. I can get the coverage data, but I would like your input on interpreting it. Very low coverage is not good, since the whole idea is to benefit from the redundancy in next-gen sequencing data (more good-quality reads mapped to a region is better). But when you have a lot of reads (e.g. >150X), something funny can be going on as well, such as a repeat region or a low-complexity region that allows reads from different sites to map to this part of the genome. In short, what would be a good heuristic or algorithm to decide whether a next-gen exome/genome sequencing run was good enough to move forward with further analysis and confidently make decisions about the genetic causes of a disease?

Thanks

next-gen sequencing coverage • 5.5k views

ADD COMMENT • link updated 14.5 years ago by Brad Chapman 9.7k • written 14.5 years ago by Biomed 5.0k

Ram · Answer 1 · 2011-01-08

8

14.5 years ago

Brad Chapman 9.7k

One approach is to look at overall coverage in your capture regions. Picard's CalculateHsMetrics provides a number of measures for this. Some guidelines we've used on projects are:

FOLD_ENRICHMENT -- generally, how well you've managed to target the desired regions. The higher the better for this value.
PCT_TARGET_BASES_10X -- the percentage of target bases with at least 10X coverage; would like to see 90%+.
ZERO_CVG_TARGETS_PCT -- the percentage of target regions that were missed entirely; ideally less than 5%.

This provides a sense of how uniformly your results sample the targets. Another approach is to plot a histogram of coverage over all your targets. Both will help distinguish between a few very highly covered repeat regions and good general coverage.
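As a rough illustration, here is a minimal Python sketch of how the guidelines above could be checked automatically. It assumes the usual tab-separated Picard metrics layout (a `## METRICS CLASS` line followed by a header row and a value row) and that the PCT fields are reported as fractions between 0 and 1; please verify both against your Picard version. The function names, the bin width, and the sample values are hypothetical and not part of Picard.

```python
import io


def parse_hs_metrics(handle):
    """Return the first metrics row of a Picard-style metrics file as a dict."""
    lines = [line.rstrip("\n") for line in handle]
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")


def check_guidelines(metrics, min_pct_10x=0.90, max_zero_cvg=0.05):
    """Apply the 90%+ at 10X and <5% missed-target guidelines."""
    pct_10x = float(metrics["PCT_TARGET_BASES_10X"])
    zero_cvg = float(metrics["ZERO_CVG_TARGETS_PCT"])
    return {
        # Reported as-is: higher is better, no fixed cutoff is assumed here.
        "fold_enrichment": float(metrics["FOLD_ENRICHMENT"]),
        "pct_10x_ok": pct_10x >= min_pct_10x,
        "zero_cvg_ok": zero_cvg < max_zero_cvg,
    }


def coverage_histogram(mean_coverages, bin_width=10):
    """Count targets per coverage bin; keys are the lower edge of each bin."""
    counts = {}
    for cov in mean_coverages:
        start = int(cov // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up example values, only to show the expected file shape.
    example = (
        "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n"
        "FOLD_ENRICHMENT\tPCT_TARGET_BASES_10X\tZERO_CVG_TARGETS_PCT\n"
        "35.2\t0.93\t0.02\n"
    )
    print(check_guidelines(parse_hs_metrics(io.StringIO(example))))
    print(coverage_histogram([3, 12, 18, 160]))
```

The histogram is kept as plain counts so it can be fed to whatever plotting tool you prefer; a long tail of a few very high bins next to a low main peak would point at repeat regions rather than good general coverage.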

ADD COMMENT • link updated 5.8 years ago by Ram 45k • written 14.5 years ago by Brad Chapman 9.7k

0

Thank you for the specific recommendations.

ADD REPLY • link 14.5 years ago by Biomed 5.0k

0

Hi Brad, this is all good for the whole genome, but what if I am interested in a specific gene? Would you or others have any recommendations for that case? Thanks.

ADD REPLY • link 14.5 years ago by Biomed 5.0k

0

Glad that helped. For a specific gene, I would recommend focusing on the metrics that your SNP or indel caller reports for the variants you are interested in. We use Broad's GATK genotyper and follow its recommendations.
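To make the per-gene idea a bit more concrete, here is a minimal sketch that pulls the QUAL column and the standard `DP` INFO key out of VCF records falling inside one gene and flags anything suspicious. The thresholds are assumptions for illustration only: 10 and 150 echo the 10X guideline and the >150X concern from the question, and the QUAL cutoff of 30 is arbitrary. The function names and the gene coordinates are hypothetical, and this is not a substitute for the caller's own filtering recommendations.

```python
def parse_info(info_field):
    """Split a VCF INFO string such as 'DP=42;AF=0.5' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def flag_variants(vcf_lines, gene_region, min_depth=10, max_depth=150,
                  min_qual=30.0):
    """Return (pos, depth, qual, flags) for each variant inside gene_region.

    gene_region is a (chrom, start, end) tuple with 1-based inclusive
    coordinates. All thresholds are illustrative assumptions.
    """
    chrom, start, end = gene_region
    results = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        if fields[0] != chrom or not (start <= pos <= end):
            continue
        qual = None if fields[5] == "." else float(fields[5])
        dp = parse_info(fields[7]).get("DP")
        depth = int(dp) if dp else None
        flags = []
        if depth is None:
            flags.append("no_depth")
        elif depth < min_depth:
            flags.append("low_depth")
        elif depth > max_depth:
            flags.append("high_depth")  # possible repeat or low-complexity region
        if qual is None or qual < min_qual:
            flags.append("low_qual")
        results.append((pos, depth, qual, flags))
    return results
```

Variants that come back with an empty flag list are the ones with reasonable support; anything flagged `high_depth` is worth a look in a genome browser before trusting it.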

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 14.5 years ago by Brad Chapman 9.7k