Hi all
I'm using VarScan to identify indels, but I'm a little concerned about the results. I have matched normal (blood) and tumor samples, and I kept only somatic indels with p-value < 0.05.
VarScan indels are annotated using:
location --> reference --> indel
where the indel starts at the position immediately after the reported location (location + 1), i.e. right after the reference base.
I wanted to see what the mutated sequence looked like in order to create a file for ANNOVAR, so I used bedtools getfasta to obtain the 3 nucleotides at positions n, n+1 and n+2.
Then I put the reference sequence side by side with the VarScan calls (see below), and it seems that some of the inserted nucleotides fall where the same nucleotide is already present more than once.
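A point worth double-checking at this step is the coordinate convention: bedtools getfasta reads BED intervals, which are 0-based with an exclusive end, while VarScan reports 1-based positions. Below is a minimal Python sketch of the conversion. `bed_interval` is a hypothetical helper name, not part of bedtools or VarScan, and the default window of 3 bases simply mirrors the n, n+1, n+2 window described above.

```python
def bed_interval(chrom, pos, window=3):
    """Return a BED line covering 1-based positions pos .. pos+window-1.

    BED starts are 0-based and BED ends are exclusive, so the 1-based
    position n maps to the half-open interval [n-1, n-1+window).
    """
    start = pos - 1          # convert 1-based position to 0-based start
    end = start + window     # exclusive end
    return f"{chrom}\t{start}\t{end}"


# First call in the table: chr1, position 18024116 (hypothetical usage)
print(bed_interval("1", 18024116))
```

Writing one such line per call into a BED file gives getfasta exactly the reference base plus the two bases that follow it.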
------- germline sequence -------------------       --------- VarScan somatic calls ----
1        18024116        18024118        CAA    chr1    18024116        C       +A
1        145441055       145441057       TGA    chr1    145441055       T       -GA
1        158818808       158818810       GTT    chr1    158818808       G       +T
1        184760624       184760626       CTT    chr1    184760624       C       -T
2        20101222        20101224        GAA    chr2    20101222        G       -A
2        20469601        20469603        TAA    chr2    20469601        T       +A
2        98263908        98263910        TAA    chr2    98263908        T       +G
2        101886117       101886119       CAA    chr2    101886117       C       +A
2        144485366       144485368       ATT    chr2    144485366       A       +T
2        162023306       162023308       CAA    chr2    162023306       C       +A
3        4699806         4699808         AGG    chr3    4699806         A       -G
3        9497744         9497746         GTT    chr3    9497744         G       +T
For example, in row 1 an A is inserted at position 18024117, after the C at 18024116. However, how do I (or VarScan) know whether the +A is inserted immediately after the C or one base further along, after the first A?
Thanks!
Len, thanks for the explanation. I also thought that it made no difference whether the A was inserted in the first or the second position, since the neighboring base is already an A.
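To make that point concrete, here is a small Python sketch using two rows from the table above. `insert_at` is a hypothetical helper written only for this illustration; it says nothing about how VarScan itself picks the reported position (callers commonly report the leftmost of the equivalent positions, but that is an assumption here, not something checked against VarScan).

```python
def insert_at(seq, offset, bases):
    """Return seq with `bases` inserted after its first `offset` characters."""
    return seq[:offset] + bases + seq[offset:]


# Row 1: reference context CAA, call +A (inserted base matches the A run).
after_c = insert_at("CAA", 1, "A")          # immediately after the C
after_first_a = insert_at("CAA", 2, "A")    # after the first A
same_in_repeat = after_c == after_first_a   # both give CAAA

# Row 7: reference context TAA, call +G (inserted base differs from the run).
after_t = insert_at("TAA", 1, "G")          # TGAA
after_a = insert_at("TAA", 2, "G")          # TAGA
same_outside_repeat = after_t == after_a

print(same_in_repeat, same_outside_repeat)
```

Inside a run of identical bases the two placements give the same mutated sequence, so the reads cannot distinguish them; when the inserted base differs from its neighbors, as in the +G row, the position does matter.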
As far as SNVs/SNPs go, I called them with three different tools (VarScan, SomaticSniper and Mutect) and kept the intersection of the three to increase stringency. For indels, I only used VarScan, and I annotate the calls with ANNOVAR to retain only those in exonic regions that cause either a frameshift or a truncated protein. I will look at RGT tools; I haven't tried them yet. Thanks.
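For anyone wanting to reproduce the three-caller intersection, a minimal Python sketch follows. It assumes each caller's output has already been parsed into dictionaries with `chrom`, `pos`, `ref` and `alt` keys; the helper `variant_keys` and all the calls listed are made-up placeholders, not real results or the actual output format of any of the tools.

```python
def variant_keys(calls):
    """Reduce parsed calls to a set of (chrom, pos, ref, alt) tuples."""
    return {(c["chrom"], c["pos"], c["ref"], c["alt"]) for c in calls}


# Placeholder calls for illustration only (not real data).
varscan = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G"},
    {"chrom": "chr2", "pos": 200, "ref": "C", "alt": "T"},
]
somaticsniper = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G"},
    {"chrom": "chr3", "pos": 300, "ref": "G", "alt": "A"},
]
mutect = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G"},
    {"chrom": "chr2", "pos": 200, "ref": "C", "alt": "T"},
]

# Keep only the SNVs reported by all three callers.
consensus = (
    variant_keys(varscan)
    & variant_keys(somaticsniper)
    & variant_keys(mutect)
)
print(sorted(consensus))
```

Matching on the full (chrom, pos, ref, alt) tuple rather than on position alone avoids counting two callers as agreeing when they report different alternate alleles at the same site.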