Question: Indel results of GATK are not correct
4.2 years ago by
Yuu10
China
Yuu10 wrote:

Hi:

I use GATK (reference: b37) to call SNPs and indels, but I have some problems with the results.

I selected two sites from two samples to illustrate my question:

Sample 1 (WES), insertion: The site 3:8299641 was called as an insertion (G to GGAAGGAAGGAAGGAAGGAAGGAAC), but I didn't find any insertion in any of the mapped reads, and the "insertion sequence" can be found in the reference. In particular, the first and last bases of the insertion, "G" (3:8299641) and "C" (3:8299665), which I think should be called as SNPs, are not called as SNPs.

Sample 2 (WGS), deletion: 1:3081756 was called as a deletion (GGGACTTACCTGGCCTCAGGGGCAT to G), but I didn't find any deletion in any of the mapped reads, and the reference also contains these bases.

In addition, I have checked the code many times and I'm sure I used the right BAM file.

Has anyone else had the same problem? What causes it?

Additional information:

I ran RealignerTargetCreator and IndelRealigner on both sample 1 and sample 2.

It drives me crazy that I can't put pictures in. I used samtools tview to view the position in sample 1 (WES, average depth above 100X, position 3:8299641) and pasted the output below:

8299641   8299651   8299661   8299671   8299681   8299691   8299701   8299711
GGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGACTTTACTTTTACCACACTGGATTGTTTGTACGATTTAAATGAGAAA
S...............................................................................
............................................................     ,,,,,,,,,,,,,,,
........................C........           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
........................C........             ..................................
.....................         ..................................................
........................C........              ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...............................................................   ,,,,,,,,,,,,,,
................................................................   ,,,,,,,,,,,,,
........................A........                 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
........................C........                                     ,,,,,,,,,,
C................................................................        ,,,,,,,
C....................................................                     ,,,,,,
C...........................................................               ,,,,,
C............................................................

 

Sample 2 (WGS, average depth above 30X), deletion at 1:3081756:

     3081761   3081771   3081781   3081791   3081801   3081811   3081821
GGGACTTACCTGGCCTCAGGGGCATGGACTCACCTGGGCTCAGGGACAGTGACTTACCTGGGCTCAGGGACAGGGATGCA
......Y........................................R......Y............R............
......C......G.......A..G........................G....C............A........C
,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g,,,,,,c,,,,,,,,,,,,a,,,,,,,,c
.................................................G..A.C..A..................C...
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,c,,,,,,,,,,,,,,,,,,,,,
................................................................................
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g,,,,,,c,,,,,,,,,,,,a,,,,,,,,
,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g,,,,,,c,,,,,,,,,,,,a,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,g,,,,,,c,,,,,,,,,,,,a,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
......C........................................G......C............A........
...................................................G..G........G.........A..G...
modified 4.2 years ago by rbagnall1.4k • written 4.2 years ago by Yuu10

It's difficult to tell whether the problem is in GATK or in your reading of the results without actually seeing the results. Can you post a screen capture from IGV, ideally showing the insertion/deletion? (I'm not sure you can have IGV show the inserted sequence, but at least showing the region around it would be useful.)

Also, did you run the indel realigner on the samples? What was the coverage like in those areas?

written 4.2 years ago by Devon Ryan89k

I updated my question and hope it helps. Thanks.

written 4.2 years ago by Yuu10

Somewhat, but it's still difficult to tell exactly what is going on without all of the sequences shown. In the future, what people generally do is upload an image somewhere and just link to it (Biostars doesn't host images for you).

You might look at the realigned BAM files and see if the edit distances in those regions are now lower. If they're not, then this would seem to be a GATK error. If the edit distances are lower and there are reads spanning the insertions/deletions then that's why this is happening (and it's quite possibly correct).
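One way to make that comparison concrete is to average the NM tag (edit distance to the reference) over the reads in the region, once for the original BAM and once for the realigned BAM. The following is only a minimal sketch: it assumes the aligner wrote NM tags, that the records were exported as SAM text (for example with samtools view on an indexed BAM and a region such as 3:8299600-8299700), and the function name and the toy records are hypothetical, not taken from the poster's data.

```python
import re
from statistics import mean

# Optional SAM field holding the edit distance, e.g. "NM:i:2"
NM_TAG = re.compile(r"\tNM:i:(\d+)")

def mean_edit_distance(sam_lines):
    """Return the mean NM value over SAM records that carry an NM tag.

    Header lines (starting with '@') and records without NM are skipped.
    Returns None if no record has an NM tag.
    """
    values = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        match = NM_TAG.search(line)
        if match:
            values.append(int(match.group(1)))
    return mean(values) if values else None

def toy_record(name, nm):
    # Hypothetical 11-column SAM record plus an NM tag (illustration only)
    fields = [name, "0", "3", "8299641", "60", "8M", "*", "0", "0",
              "GGAAGGAA", "IIIIIIII", f"NM:i:{nm}"]
    return "\t".join(fields)

# Toy data standing in for "before realignment" and "after realignment"
before = [toy_record("r1", 1), toy_record("r2", 1), toy_record("r3", 0)]
after = [toy_record("r1", 0), toy_record("r2", 0), toy_record("r3", 0)]

print("mean NM before:", mean_edit_distance(before))
print("mean NM after: ", mean_edit_distance(after))
```

Keep in mind that NM counts inserted and deleted bases as well as mismatches, so the numbers are best read alongside the CIGAR strings of the same reads.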

written 4.2 years ago by Devon Ryan89k
4.2 years ago by
rbagnall1.4k
Australia
rbagnall1.4k wrote:

In example 1, there are 4 reads with a “C” at 3:8299665. If you delete the sequence "GAAGGAAGGAAGGAAGGAAGGAAC" from those reads, then they would match the reference exactly. Notice that those reads end in the sequence 2x(GAAG).

There is a micro repeat of "GAAG" around the deleted sequence. GATK has made the assumption that insertion of 5x(GAAG)GAAC is more likely than a 3:8299665G>C. Which one is 'more' correct is anybody's guess...
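The two competing explanations can be checked by hand on the sequences shown in the question. This is a small sketch, not GATK's actual model: the reference string is copied from the tview reference row starting at 3:8299641, the read is reconstructed from one of the tview rows carrying the "C", and the variable names are made up for illustration.

```python
# Reference as shown in the tview output, starting at 3:8299641:
# eight copies of GGAA followed by GACTTTAC
REF = "GGAA" * 8 + "GACTTTAC"

# Bases GATK reports as inserted after the anchor G (ALT minus its first base)
INSERT = "GAAGGAAGGAAGGAAGGAAGGAAC"

# One of the reads from tview: identical to the reference over 33 bases
# except for a C at offset 24 (position 3:8299665)
read = REF[:24] + "C" + REF[25:33]

# Interpretation A: no indel, count plain mismatches against the reference
mismatches_as_snp = sum(a != b for a, b in zip(read, REF))

# Interpretation B: anchor base, then the 24-base insertion, then the rest
# of the read aligned to the reference directly after the anchor
assert read[1:1 + len(INSERT)] == INSERT
remainder = read[0] + read[1 + len(INSERT):]
mismatches_as_insertion = sum(a != b for a, b in zip(remainder, REF))

print("A (single substitution): mismatches =", mismatches_as_snp)
print("B (24-bp insertion):     mismatches =", mismatches_as_insertion)
```

Both readings account for every base of the read, which is why the repeat makes the call ambiguous: one costs a single mismatch, the other costs one insertion event and no mismatches.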

 

example 2 is more difficult to read, but there also seems to be a (3x?) repeat of a motif GGACT(T/C)ACCTGG(C/G)CTCAGGG(G/A)CA(T/T). Is one copy of the motif deleted in some reads, and would this explain the observed (SNP-like) variants? GATK may assume one deletion event can explain all observed variants, and this is more likely than having multiple SNP-like events to explain the sequence.

written 4.2 years ago by rbagnall1.4k

Thanks for your answer!

written 3.0 years ago by Yuu10