Question: How Many Is Too Many? Germline And Somatic Coding Indels And SNPs In Cancer Exome Capture

Asked by Prateek (Boston, MA), 7.3 years ago:

Would anyone care to share their experience with variant calling in cancer genomics, using tumor-normal pairs to separate somatic from germline variants, especially indels?

I have been getting an unbelievably high number of "coding" germline indels after running the GATK Somatic Indel Detector on tumor-normal samples. Even after fairly strict coverage filters on both the normal and the tumor, we get ~20-30 somatic coding small indels (which I can digest) but about 600 coding germline indels, ~50% of them frameshifts!

These look convincingly "germline" when you check the coverage in the "normal" samples. I know this shouldn't happen and am trying to investigate the reasons. Could it be:

  1. Alignment issues
  2. Contamination of the normal (less likely, since the normal is blood and the tumor is paraffin-embedded)
  3. Annotation version issues (I have rechecked and eliminated this cause)

Any help is appreciated. Thanks!

Additional info:

The fraction of consensus reads carrying the called indel in the normal (out of all reads in the normal) is either ~40-50% or ~90-100%, averaging 60% across all indels. The tumor shows similar numbers. So these do seem to be true germline events.
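
A split between ~40-50% and ~90-100% is the pattern you would expect from a mix of heterozygous and homozygous germline variants. A minimal sketch of that tally, assuming per-indel counts of indel-supporting reads and total reads have already been extracted; the band edges and the read counts are made up for illustration:

```python
from collections import Counter
from statistics import mean


def allele_fraction(indel_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the called indel."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return indel_reads / total_reads


def af_bin(af: float, het_band=(0.35, 0.65), hom_min=0.85) -> str:
    """Put an allele fraction into a rough genotype-like bin.

    The band edges are illustrative assumptions, not values from the thread.
    """
    if af >= hom_min:
        return "hom-like"
    if het_band[0] <= af <= het_band[1]:
        return "het-like"
    return "other"


def summarize(calls):
    """calls: (indel_reads, total_reads) pairs from one sample."""
    afs = [allele_fraction(k, n) for k, n in calls]
    return mean(afs), dict(Counter(af_bin(af) for af in afs))


# Hypothetical read counts for five indels in the normal; not data from the thread.
normal_calls = [(22, 48), (30, 61), (40, 41), (35, 36), (5, 60)]
avg_af, bins = summarize(normal_calls)
print(f"mean AF = {avg_af:.2f}", bins)
```

If nearly all calls land in the two bands, an average around 60% is consistent with inherited variants rather than artifacts.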

Tags: indel, gatk, variant, cancer

By any chance, are a lot of these indels close to repetitive sequences?

Reply by Gww, 7.3 years ago

@GWW: Not really. There are a whole lot in non-coding regions that are close to repetitive regions, but the ones I am talking about are smack in the middle of well-meaning exons. About 50% are small in-frame (3n) indels and the other 50% are frameshifts.

Reply by Prateek, 7.3 years ago
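
The in-frame (3n) vs. frameshift split depends only on whether an indel's net length change is a multiple of 3. A minimal sketch, assuming VCF-style REF/ALT alleles and an indel lying entirely inside coding sequence; the allele pairs are made up:

```python
def net_length_change(ref: str, alt: str) -> int:
    """Length change of an indel from VCF-style REF/ALT alleles.

    Positive = insertion, negative = deletion.
    """
    return len(alt) - len(ref)


def frame_effect(ref: str, alt: str) -> str:
    """Classify a coding indel as in-frame (3n) or frameshift.

    Assumes the whole indel lies inside coding sequence; indels touching
    splice sites or start/stop codons need real transcript annotation.
    """
    delta = net_length_change(ref, alt)
    if delta == 0:
        return "not an indel"
    # Python's % is non-negative for a positive divisor, so -3 % 3 == 0.
    return "in-frame" if delta % 3 == 0 else "frameshift"


# Hypothetical (REF, ALT) pairs, each with the usual VCF anchor base.
calls = [("A", "ATTT"), ("ACG", "A"), ("C", "CA"), ("GTCA", "G")]
labels = [frame_effect(r, a) for r, a in calls]
frac_fs = labels.count("frameshift") / len(labels)
print(labels, f"{frac_fs:.0%} frameshift")
```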

What do your quality metrics look like? If the calls don't have high quality scores and good coverage, they're probably junk. See how many you have left if you use SNP quality cutoffs of 50 or 75.

Reply by Docroberson, 7.3 years ago
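
Docroberson's check amounts to re-counting the calls at a few quality thresholds while also requiring coverage in both samples. A minimal sketch, assuming each call has already been parsed into a record with a quality score and per-sample depths; the `Call` fields, the depth minimum of 10, and the example calls are hypothetical, not a GATK output format:

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One indel call. Field names are hypothetical."""
    chrom: str
    pos: int
    qual: float        # call quality score, e.g. a VCF QUAL value
    normal_depth: int  # reads covering the site in the normal
    tumor_depth: int   # reads covering the site in the tumor


def passing(calls, min_qual=50.0, min_normal_depth=10, min_tumor_depth=10):
    """Keep calls that meet the quality cutoff and are covered in both samples."""
    return [
        c for c in calls
        if c.qual >= min_qual
        and c.normal_depth >= min_normal_depth
        and c.tumor_depth >= min_tumor_depth
    ]


# Made-up example calls.
calls = [
    Call("chr1", 1000, 120.0, 40, 55),
    Call("chr1", 2000, 60.0, 30, 25),
    Call("chr2", 500, 30.0, 50, 60),
    Call("chr3", 700, 90.0, 8, 40),
]
for cutoff in (50, 75):
    print(f"QUAL >= {cutoff}: {len(passing(calls, min_qual=cutoff))} calls remain")
```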

And you ran an "indel realignment" step on both the tumor and normal BAMs?

Reply by Aaronquinlan, 7.3 years ago

@Aaron: Yes, both files were run through local indel realignment.

Reply by Prateek, 7.3 years ago

Answer by Chris Miller (Washington University in St. Louis, MO), 7.3 years ago:

Look at the frequency of reads supporting those SNV calls. If it's close to 50% in the normal, then yeah, it's probably a germline event. If you have tumor contamination in the normal (common) and see calls at lower percentages (say, 10%), then you can be fairly confident that contamination is what you're seeing.
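
Chris's rule of thumb can be written as a small decision function on the allele fraction (AF) in the normal and the tumor. A minimal sketch; the 30% and 2% thresholds are assumptions chosen for illustration, not values from the thread, and a real analysis would also weigh read depth:

```python
def interpret_normal_af(normal_af: float, tumor_af: float,
                        germline_min: float = 0.30,
                        contamination_min: float = 0.02) -> str:
    """Rough reading of a variant's allele fraction in the normal.

    Threshold values are illustrative assumptions.
    """
    if normal_af >= germline_min:
        # Near 50% (het) or 100% (hom): looks inherited.
        return "likely germline"
    if normal_af < contamination_min:
        # Essentially absent from the normal.
        return "somatic candidate"
    if tumor_af > normal_af:
        # Low but present in the normal, stronger in the tumor.
        return "possible tumor-in-normal contamination"
    return "ambiguous"


for n_af, t_af in [(0.48, 0.50), (0.10, 0.45), (0.0, 0.40), (0.10, 0.05)]:
    print(n_af, t_af, interpret_normal_af(n_af, t_af))
```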

As a followup, have you looked for these particular germline variants in dbSNP? If they're common in the population, they're probably not particularly interesting either.

Reply by Chris Miller, 7.3 years ago

I generally just grab the appropriate dbSNP track from UCSC. http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp132.txt.gz

Reply by Chris Miller, 7.3 years ago
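
A minimal sketch of looking calls up in that UCSC table, assuming the usual UCSC conventions (0-based `chromStart`, exclusive `chromEnd`) and using only the first five columns (bin, chrom, chromStart, chromEnd, name). For a left-anchored VCF indel, POS equals the UCSC `chromStart` of the same event, but this matches on position only, not alleles, and two sources may place an indel in a repeat at different positions. The function names and example rows are hypothetical; loading the full table takes a lot of memory, so restricting it to your target regions first would help:

```python
import gzip
from collections import defaultdict


def index_ucsc_snp(lines):
    """Index UCSC snpNNN.txt rows by (chrom, chromStart).

    Only the first five tab-separated columns are used:
    bin, chrom, chromStart, chromEnd, name.
    """
    index = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        chrom, start, name = fields[1], int(fields[2]), fields[4]
        index[(chrom, start)].append(name)
    return index


def load_ucsc_snp(path):
    """Read a gzipped table such as snp132.txt.gz."""
    with gzip.open(path, "rt") as handle:
        return index_ucsc_snp(handle)


def dbsnp_ids_at(index, chrom, vcf_pos):
    """rsIDs whose UCSC chromStart equals a VCF indel's POS.

    UCSC hg19 uses 'chr'-prefixed names; adjust chrom if your
    reference uses plain '1', '2', ...
    """
    return index.get((chrom, vcf_pos), [])


# Made-up rows in the UCSC layout (only the first five columns shown).
rows = ["585\tchr1\t1000\t1002\trs1\n", "585\tchr1\t1500\t1500\trs2\n"]
idx = index_ucsc_snp(rows)
print(dbsnp_ids_at(idx, "chr1", 1000), dbsnp_ids_at(idx, "chr1", 1500))
```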

@Chris: I eyeballed a couple and did find that some of our indels lie close to dbSNP indels (within ~10-20 bp), although not with exactly the same alleles. I am planning to search the entire set against dbSNP. Do you know of a tool that can already do that? Otherwise I'll download the entire set from Ensembl, using frameshift/complex indels as the filter.

Reply by Prateek, 7.3 years ago
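
For the "within ~10-20 bp" comparison, a windowed search over sorted start positions is enough; interval tools such as `bedtools window` do the same on BED files. A minimal Python sketch, assuming both call sets use the same coordinate convention and chromosome naming; the function names and records are hypothetical:

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(records):
    """records: iterable of (chrom, start, name).

    Returns chrom -> (sorted starts, names in the same order).
    """
    by_chrom = {}
    for chrom, start, name in records:
        by_chrom.setdefault(chrom, []).append((start, name))
    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        index[chrom] = ([s for s, _ in items], [n for _, n in items])
    return index


def nearby(index, chrom, pos, window=20):
    """Names of records whose start lies within +/- window bp of pos."""
    if chrom not in index:
        return []
    starts, names = index[chrom]
    lo = bisect_left(starts, pos - window)
    hi = bisect_right(starts, pos + window)
    return names[lo:hi]


# Made-up dbSNP-like records.
known = [("chr1", 100, "rsA"), ("chr1", 115, "rsB"),
         ("chr1", 200, "rsC"), ("chr2", 100, "rsD")]
idx = build_sorted_index(known)
print(nearby(idx, "chr1", 110, window=10))
```

Sorting once per chromosome keeps each query logarithmic, which matters when the reference set is all of dbSNP.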

@Chris: You were right; a lot of them are in dbSNP. However, I still need to find out how and why so many of them can be tolerated in a single individual!

Reply by Prateek, 7.3 years ago

Answer by David Quigley (San Francisco), 7.3 years ago:

Perhaps the samples were accidentally swapped at some point, and your "normal" is really the tumor DNA and vice versa. These things can happen. You're going to have to do follow-up validation of interesting candidates in your own samples anyway.
