Question: Somatic variants in exome normal/tumor data mostly in non-exonic regions. Is this a likely scenario?
ivivek_ngs (Seattle, WA, USA), 5.3 years ago, wrote:

Dear All,

I would like to ask about an issue I am facing with exome sequencing. Earlier I estimated what fraction of my reads fall on the target regions, i.e. the target intervals used for target enrichment, and for my samples about 75% of the reads span the exonic regions. But now, when I call variants, extract the somatic variants and annotate them with all three of ANNOVAR, Oncotator and snpEff, I find that only 30% of the SNPs (the novel ones, not present in dbSNP) actually fall in exons. The rest are intronic, intergenic, splice-site, UTR, etc. Is this a likely scenario? How often do you see SNPs annotated mostly in non-exonic regions in exome sequencing, even when the reads have high coverage over the exonic intervals used for target enrichment? I would appreciate your advice in such cases.



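A read-level on-target rate (75% in the question) and the fraction of variant calls that lie inside the targets are different quantities, so it can help to measure the second one directly before comparing it with the annotators' exonic fraction. Below is a minimal Python sketch, assuming the capture targets are available as BED lines (0-based, half-open) and the somatic calls as (chromosome, 1-based VCF POS) pairs; the function names and the `targets.bed` file name in the comment are hypothetical. It only tests interval membership and does not annotate consequences.

```python
import bisect
from collections import defaultdict


def load_targets(bed_lines):
    """Parse BED lines (0-based, half-open) into merged intervals per chromosome.

    Returns {chrom: (starts, ends)}, two sorted lists suitable for binary search.
    """
    raw = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    targets = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:  # overlapping or abutting: extend previous interval
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        targets[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return targets


def in_targets(targets, chrom, pos):
    """True if a 1-based VCF POS falls inside any target interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    p = pos - 1  # VCF is 1-based; BED is 0-based half-open
    i = bisect.bisect_right(starts, p) - 1  # last interval starting at or before p
    return i >= 0 and p < ends[i]


def on_target_fraction(targets, variants):
    """Fraction of (chrom, pos) variant calls that lie inside the targets."""
    if not variants:
        return 0.0
    return sum(in_targets(targets, c, p) for c, p in variants) / len(variants)


# Usage (hypothetical file name):
#   with open("targets.bed") as fh:
#       targets = load_targets(fh)
#   print(on_target_fraction(targets, [("chr1", 12345), ("chr2", 67890)]))
```

If this fraction is high while the annotators still report mostly intronic or UTR calls, the calls are on target but simply not in coding exons, which is a different situation from off-target noise.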
Dan Gaston, 5.3 years ago, wrote:

Does this include synonymous mutations? If you filtered out synonymous exonic mutations, you'll find that this is a large contributor as well. Most of the non-exonic regions you listed are either targeted or will be captured anyway due to overlap (some capture regions extend into introns). Moreover, the majority of all mutations in a cell occur in non-exonic regions simply because exons make up such a small percentage of the genome. Of course, since you have filtered out off-target variants in this case, you will enrich for exonic variants.


Couple that with filtering out mutations that are also in dbSNP (by the way, many valid somatic mutations overlap or exactly match variants in dbSNP; plenty of known somatic variants that appear in COSMIC, TCGA, etc. are also found in dbSNP), and the number you are reporting (30%) doesn't surprise me much.

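Since dbSNP membership alone can discard real somatic mutations, one alternative rule is to drop dbSNP calls only when they do not also match a known somatic site. A minimal Python sketch, assuming each call carries a flag for dbSNP membership (for example derived from an rs ID in the VCF ID column) and that a set of known somatic sites (for example exported from COSMIC) is keyed by (chrom, 1-based pos, ref, alt); the `Call` type, the key format and the function names are hypothetical, not something ANNOVAR, Oncotator or snpEff emit directly.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

# Hypothetical key for a known somatic site: (chrom, 1-based pos, ref, alt).
SiteKey = Tuple[str, int, str, str]


class Call(NamedTuple):
    """A somatic call; `in_dbsnp` might come from an rs ID in the VCF ID column."""
    chrom: str
    pos: int
    ref: str
    alt: str
    in_dbsnp: bool


def keep_call(call: Call, known_somatic: Set[SiteKey]) -> bool:
    """Keep calls absent from dbSNP; keep dbSNP calls only if they match a known somatic site."""
    if not call.in_dbsnp:
        return True
    return (call.chrom, call.pos, call.ref, call.alt) in known_somatic


def filter_calls(calls: Iterable[Call], known_somatic: Set[SiteKey]) -> List[Call]:
    """Apply keep_call to every call, preserving input order."""
    return [c for c in calls if keep_call(c, known_somatic)]
```

Matching on ref and alt, not just position, avoids rescuing a dbSNP polymorphism that merely shares a coordinate with a different known somatic allele.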

@Dan Gaston 

Yes. With ANNOVAR, when I remove the synonymous mutations, the exonic fraction drops further, to 22%. I am currently keeping only the novel variants, i.e. those not found in dbSNP. I am not deliberately discarding the ones found in COSMIC or TCGA, but in my observation those are also present in dbSNP. I presume that reads overlapping both exon and intron still fall under the target BED file, but the mutations on those overlapping reads will mostly lie in the non-exonic part.

I called mutations with GATK (multi-calling), MuTect and VarScan, and then annotated them with ANNOVAR, Oncotator and snpEff to cross-check the results; all three annotations report roughly 30% of the mutations as exonic. The variants I sent for annotation are mostly high-confidence, statistically significant calls based on DP, Phred score and QUAL. For MuTect I keep only the high-confidence calls, and for VarScan I remove false positives with somaticFilter. Of course, the number of variants output differs among the three callers. MuTect gives me the most mutations, but on annotation more than 50% of them are intronic.

So you are saying this is not at all surprising, given that the reads mostly lie within the exome BED file used for target enrichment. Then I would assume that, in my case, the mutations mostly lie near the exon-intron boundaries covered by the corresponding reads. I should also add that I am considering only somatic mutations for my samples. Does my reasoning sound justified for the hits I am getting after annotation? Please let me know your views.

Reply written 5.3 years ago by ivivek_ngs.
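The hypothesis that most "intronic" calls sit near exon-intron boundaries can be tested by measuring how far each call lies from the nearest exon. A minimal Python sketch, assuming merged, non-overlapping exon intervals per chromosome in 0-based half-open BED coordinates and 1-based VCF positions; the function names and the bucket edges (2, 10, 50, 100 bp) are arbitrary, hypothetical choices for illustration.

```python
import bisect
from collections import Counter


def nearest_exon_distance(exon_starts, exon_ends, pos):
    """Distance in bp from a 1-based position to the nearest exon (0 if inside).

    exon_starts/exon_ends: sorted, non-overlapping 0-based half-open intervals
    for one chromosome. Returns None if the chromosome has no exons.
    """
    p = pos - 1  # convert 1-based VCF POS to 0-based
    i = bisect.bisect_right(exon_starts, p) - 1  # last exon starting at or before p
    best = None
    if i >= 0:
        if p < exon_ends[i]:
            return 0  # inside an exon
        best = p - exon_ends[i] + 1  # bases past the last exonic base
    if i + 1 < len(exon_starts):
        d = exon_starts[i + 1] - p  # bases before the next exon starts
        best = d if best is None else min(best, d)
    return best


def distance_profile(exons, variants, edges=(2, 10, 50, 100)):
    """Count (chrom, pos) variants by distance bucket from the nearest exon edge.

    exons: {chrom: (starts, ends)} as described above.
    """
    counts = Counter()
    for chrom, pos in variants:
        starts, ends = exons.get(chrom, ([], []))
        d = nearest_exon_distance(starts, ends, pos)
        if d is None:
            counts["no exon on chrom"] += 1
        elif d == 0:
            counts["exonic"] += 1
        else:
            label = next((f"<={e} bp" for e in edges if d <= e), f">{edges[-1]} bp")
            counts[label] += 1
    return counts
```

If most non-exonic calls land in the small-distance buckets, that supports the boundary explanation; a long tail beyond the capture padding would point to off-target calls instead.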

Yes, whether you are mostly getting variants on reads that happen to span, say, exon-intron junctions depends on your filtering strategy. Just keep in mind that the target region usually deliberately extends 50 to 100 bp into the introns. The targeted region is a bit larger than the exon so that you properly sequence splice sites and the nucleotides near them. And because capture is a physical process based on oligo binding and DNA shearing, the chunk of DNA you capture is larger than the oligos designed to capture it.


UTRs are of course targeted, and can be quite large for many genes. 

Reply written 5.3 years ago by Dan Gaston.
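The point about padded capture regions can be made concrete by splitting calls into those inside the designed target intervals, those in a flank of up to `pad` bp around them, and the rest. A minimal single-chromosome Python sketch, assuming 0-based half-open target intervals and 1-based positions; the function names are hypothetical, and the 100 bp default only mirrors the "50 or 100 bp" figure in the reply, so it should be replaced by whatever your kit's documentation states. The padding step is analogous to extending intervals with a tool such as `bedtools slop`.

```python
def pad_intervals(intervals, pad):
    """Extend each 0-based half-open (start, end) by `pad` bp on both sides and merge overlaps."""
    padded = sorted((max(0, s - pad), e + pad) for s, e in intervals)
    merged = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:  # overlaps previous padded interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def covered(intervals, pos):
    """True if a 1-based position lies inside any 0-based half-open interval (linear scan)."""
    p = pos - 1
    return any(s <= p < e for s, e in intervals)


def classify(targets, positions, pad=100):
    """Label each 1-based position as 'target', 'flank' (within pad bp) or 'off-target'."""
    flanked = pad_intervals(targets, pad)
    labels = []
    for pos in positions:
        if covered(targets, pos):
            labels.append("target")
        elif covered(flanked, pos):
            labels.append("flank")
        else:
            labels.append("off-target")
    return labels
```

Calls labelled "flank" are the intronic variants that are nevertheless inside the effectively captured region, i.e. not "off-target" in the sense discussed in this thread.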

@Dan Gaston 

"Just keep in mind that with introns the target purposefully includes usually up to 50 or 100 bp's captured into introns" 

I could not understand this part. Could you be a bit clearer? I understand that this is a physical process and that target kits do not always span only the exons; sometimes they extend some way into the introns, which can include the splice sites and parts of the introns as well. So if the mutations mostly fall in those regions of the reads, they will be annotated as intronic for the genes that carry the mutation at that base. Is that what you meant?

Reply written 5.3 years ago by ivivek_ngs.

Yes. I was just pointing out that not all intronic variants are "off-target"; some of them lie in the targeted capture region.

Reply written 5.3 years ago by Dan Gaston.