Question: Lack of consensus between NGS & Sanger sequencing on indels/mutations
2
gravatar for alons
18 months ago by
alons220
Herzliya
alons220 wrote:

Hi all!

We have a working NGS analysis variant calling pipeline for cancer which we've recently ran on a sample. 

A certain mutation was found, specifically in chr3:178936095 (hg19), now as part of a confirmation process the lab ran a Sanger sequencing process on the mutation region and found out that a 2-based deletion and 1-base insertion wrt hg19, occurred in a close position, in chr3:178936116 and that the mutation we found seems like a false-positive. 

Looking through the VCF files, I didn't find the deletion+insertion position, nor did I find it when I checked the BAM file. 

We're using: 
BWA MEM as an aligner. Our command for running is:  bwa mem -t 12 -R <read groups info> <ref genome> <fastq files> > aligned.sam
- Realigning around indels using GATK.

- Calling variants with FreeBayes, GATK & bcf tools. 
- Currently there's no filter based on read quailty other than the default one in bwa mem. 

there was a consensus between all 3 on the mutation we found from the NGS data.

This has led me to think that the reads 'supporting' that hypothesis weren't aligned from the .fastq files as they were too different from the reference or got a 'lower score'. 

My question is: Has something similar ever occurred to you? Is there a parameter/argument I can provide BWA MEM with so that those missing reads would be aligned? Otherwise, can you recommend a program/pipeline I can use to call these indels/insertions successfully? 

Thank you very much in advance,
Alon

ADD COMMENTlink modified 18 months ago • written 18 months ago by alons220
1

Was this a germline or somatic mutation?

ADD REPLYlink written 18 months ago by Daniel Swan13k

We're not sure, the only sample was from the tumor. 

ADD REPLYlink written 18 months ago by alons220
1

You do not provide much detail, you would probably get more helpful replies if you provide a better description of your analysis. Do you perform realignment around indels? You are aligning with bwa mem, but how are you calling the variants? Do you perform quality filtering on your NGS variants?

ADD REPLYlink written 18 months ago by h.mon5.9k

You're right, thanks. I've edited my question, hopefully now the info is sufficient. 

ADD REPLYlink written 18 months ago by alons220
6
gravatar for lh3
18 months ago by
lh329k
United States
lh329k wrote:

By "false positive", do you really mean "false negative"? By "2-based deletion and 1-base insertion", do you mea "2-base insertion and 1-base deletion wrt hg19"?

You can cut the flanking sequences and map to hg19 again. You will find that chr3:178936095 is similar to the ~chr22:17052975. They differ by a couple of mismatches and one gap. The gap difference happens to occur at chr3:178936116. It is possible that bwa-mem maps the reads on chr3 to chr22 (false mapping); it is also possible that your sanger PCR is non-specific (false priming and thus NGS is correct). Hard to conclude without detailed analysis.

ADD COMMENTlink written 18 months ago by lh329k

Thanks for the detailed answer!
By false-positive I mean that our variant callers called a mutation there based on NGS data, but when inspected in the sanger results, the peaks (according to the lab) indicated that the mutation wasn't real. 
By  "2-based deletion and 1-base insertion" I mean with regard to hg19, 2 bases from that position in hg19 were deleted and then there was an insertion of a single base (that wasn't in hg19).
I will definitely look at chr22:17052975 and see if it was aligned there. 
By "flanking sequences" you mean 5' flanking region and/or 3' flanking region? How does one go about cutting them - remove them from the .fastq files?
The Sanger PCR the lab has done was based on primers they created from the gene the position's in, PIK3CA gene.

ADD REPLYlink written 18 months ago by alons220
1

I am confused. You said "Looking through the VCF files, I didn't find the deletion+insertion position, nor did I find it when I checked the BAM file". If you did not find the indels from NGS data, how could they be false positives?

ADD REPLYlink written 18 months ago by lh329k
1

Or do you mean that Sanger and NGS find different mutations?

ADD REPLYlink written 18 months ago by lh329k

First of all, you deserve a huge thank you - I have news: I found an exact match in the bam file, between a sequence in the area you've mentioned and the mutation sequence that was deduced from the lab analysis of the Sanger sequencing! 
9 identical 45 bases long Forward & Reverse sequences were found perfectly aligned in the bam file to somewhere near the area you've mentioned. I've pasted them in UCSC BLAT tool and it said that it had a 100% match to the area in chr22 and a 97.8% match to chr3. This means bwa mem did a perfectly good job in aligning as it had a 100% match with said area. The mutation apparently made the sequences be identical to that area and thus weren't aligned to where they were originated in the DNA.

I'm actually super impressed BTW, did you use BLAT to tell the areas were similar?

Now, what can one do in the future if such mutations cause alignments to completely different areas? The lab said that the mutation we've called (maybe falsely) in chr3:178936095 might have been caused by the real indel in chr3:178936116.

p.s
I meant that I couldn't find the indel the lab called based on the Sanger sequencing. Also, Sanger found the indel but indicated the mutation that was found in chr3:178936095 wasn't real.

ADD REPLYlink written 18 months ago by alons220
1
gravatar for Brian Bushnell
18 months ago by
Walnut Creek, USA
Brian Bushnell11k wrote:

If you are interested in detecting and calling indels, I recommend mapping with BBMap with default parameters, as it tends to call indels  much more accurately than any other aligner.  But it's not just the aligner that's important here - the variant-caller will play a role too.  I'm not sure which is currently best, but there are several alternatives - mpileup, GATK, and freebayes (although unfortunately freebayes cannot process the current SAM specification).

You may want to look at your bam file in IGV to see if the problem is that the reads were unmapped or incorrectly mapped (and thus there are no reads indicating indels), or whether the problem is the variant caller (in which case IGV would show reads indicating the indels).

Edit - specifically, if you want to use freebayes with BBMap, use BBMap's "sam=1.3" flag to generate old-style cigars strings with "M" instead of "=" and "X".

ADD COMMENTlink modified 18 months ago • written 18 months ago by Brian Bushnell11k
1

I have used freebayes on sam files generated by bbmap. just need to convert to bam, sort and index, maybe also remove duplicates and addgroups.

ADD REPLYlink written 18 months ago by apelin20460
1

Gatk-hc completely ignores gap placement. It doesn't matter whether the gap placement is good or not. Freebayes realigns reads. As long as the gaps are present on a couple of reads, they can fix the alignment of other reads. Realignment from multiple reads is much more powerful than aligning each read independently. Especially for modern variant callers, the accuracy on placing gaps is usually unimportant. For a mapper, what is most important is whether it can map the whole read around the correct position with accurate mapping quality.

Freebayes is the best general-purpose open-source variant caller. I would suggest making bbmap work with it, no matter whether freebayes supports all SAM/BAM features.

ADD REPLYlink written 18 months ago by lh329k

I completely agree about FreeBayes, it's now our main variant caller as it called variants that were mostly confirmed by Sanger sequencing. It's also good for cancer samples as it has low AF / high ploidy parameters you can add to make it ultra sensitive.

 

ADD REPLYlink written 18 months ago by alons220

Thanks for the detailed answer, I will try using BBMap and see how it works out, we're currently using the variant callers you've mentioned so it shouldn't be a problem to start from BBMap and then continue 'as usual' with said variant callers, I will try and adjust it so that freebayes could read it properly. 

I did look at the bam file in IGV and samtools tview and it did seem that those reads were unmapped/incorrectly mapped as none of the reads in the position supported the 2 base deletion and 1 base insertion the Sanger analysis indicated, leading me to think the problem wasn't in the variant caller.

ADD REPLYlink written 18 months ago by alons220
1
gravatar for apelin20
18 months ago by
apelin20460
Canada
apelin20460 wrote:

There is an old trick in the book that almost never fails. If the mutation you saw in your chromatogram file is significant (somewhat abundant) and real (not an artifact), then there is no reason for it not to be in your reads. You can use grep in your reads to find out how many reads support that indel. Simply take 13bp to the left and 13bp to the right of your variant, and grep that sequence. For instance, let's say you have the following scenario:

Ref:      

ATGCTAGCTGATCGATCGTGACTGCTG

Mutant: 

ATGCTAGCTGAT--ATCGTGACTGCTG

I would grep the mutant genotype "ATGCTAGCTGATATCGTGACTGCTG" from my fastq file.

ADD COMMENTlink modified 18 months ago • written 18 months ago by apelin20460

Thanks, I've actually thought about it but wasn't sure it can be done efficiently in large .fastq files - now that you've said it's basically doable I will definetly try it out!

ADD REPLYlink written 18 months ago by alons220
1

the bigger the coverage, the more reliable the method is. Don't forget to also take the reverse compliment as well and grep it: grep -c "GENOTYPE" file.fastq

ADD REPLYlink written 18 months ago by apelin20460
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 574 users visited in the last hour