What Are The Next Steps For Processing Indels From GATK
11.0 years ago
bari1 ▴ 40

Hi, I have raw indel calls from GATK and I wonder where to start with filtering them. Should I plot the quality, depth, etc. and then decide on thresholds to filter out the bad calls? Any suggestions or scripts would be useful.

chr10    264423    .    G    GC    29875    .    AC=26;AF=1.00;AN=26;BaseQRankSum=-0.054;DP=717;FS=8.633;HaplotypeScore=344.1280;InbreedingCoeff=-0.0046;MLEAC=26;MLEAF=1.00;MQ=38.57;MQ0=0;MQRankSum=-0.832;QD=41.67;RPA=1,2;RU=C;ReadPosRankSum=0.979;SB=-4.035e+03;STR    GT:AD:DP:GQ:PL    1/1:6,242:250:99:10448,743,0    1/1:10,193:203:99:8479,571,0    1/1:3,142:148:99:6167,439,0    1/1:0,13:13:39:551,39,0    1/1:1,9:10:30:424,30,0    1/1:1,13:15:39:550,39,0    1/1:0,9:9:27:382,27,0    1/1:1,15:16:48:667,48,0    1/1:1,11:12:33:467,33,0    1/1:1,8:9:27:382,27,0    1/1:0,18:18:54:764,54,0    1/1:0,11:11:33:467,33,0    1/1:0,3:3:9:127,9,0

Looking forward to your suggestions and feedback. /Bari

gatk indel filtering depth-of-coverage • 3.4k views

What are you looking for in these indels? What experimental model are you using? What's your hypothesis? It's hard to really answer your question meaningfully without more information.


These indels are from resequencing data from cows. We have two groups of cows and we want to find indels specific to one group or the other, but not both. (Does this answer your question?)


Then your task is going to focus on three things in your per-sample genotype calls (the part of your VCF above starting with 1/1, etc.). See my answer below.

11.0 years ago

Have you subjected your data to variant recalibration? This will help you further separate machine artifacts from true variants.

http://www.broadinstitute.org/gatk/guide/topic?name=best-practices


I ran the RealignerTargetCreator and IndelRealigner steps before calling the indels. Do you suggest recalibrating after indel calling? (I do not have a known set of indels.)

11.0 years ago

Well, you can use several parameters to filter bad calls, such as read depth and mapping quality. Or you can try GATK VariantFiltration, as described here: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_filters_VariantFiltration.html
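As a rough illustration of filtering on site-level annotations, here is a minimal Python sketch that checks the INFO column of one record against a few cutoffs. The cutoffs follow GATK's generic hard-filtering suggestions for indels (QD < 2.0, FS > 200.0, ReadPosRankSum < -20.0, InbreedingCoeff < -0.8); treat them as starting points to tune against your own plots, not as settled values. The function names `parse_info` and `failed_filters` are made up for this example.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys (e.g. STR) map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Assumed starting cutoffs (GATK's generic indel hard-filter suggestions).
# Each entry says when a value counts as a failure.
FAIL_IF = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 200.0,
    "ReadPosRankSum": lambda v: v < -20.0,
    "InbreedingCoeff": lambda v: v < -0.8,
}


def failed_filters(info):
    """Return the names of the annotations that fail their cutoff.

    Annotations that are absent or non-numeric are skipped, not failed.
    """
    fields = parse_info(info)
    failed = []
    for key, is_bad in FAIL_IF.items():
        try:
            value = float(fields[key])
        except (KeyError, TypeError, ValueError):
            continue
        if is_bad(value):
            failed.append(key)
    return failed


# INFO column of the record posted in the question.
info = ("AC=26;AF=1.00;AN=26;BaseQRankSum=-0.054;DP=717;FS=8.633;"
        "HaplotypeScore=344.1280;InbreedingCoeff=-0.0046;MLEAC=26;MLEAF=1.00;"
        "MQ=38.57;MQ0=0;MQRankSum=-0.832;QD=41.67;RPA=1,2;RU=C;"
        "ReadPosRankSum=0.979;SB=-4.035e+03;STR")
print(failed_filters(info))  # an empty list means the site passes all four checks
```

Running the same function over every record and tallying which annotations fail most often is one way to see whether a cutoff is too strict for your data.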

11.0 years ago

It sounds like this task is going to focus on three things in your per-sample genotype calls (the part of your VCF above starting with 1/1, etc.):

(1) Filter by DP and GQ, i.e. keep calls where DP is reasonable for your experimental design. What is the minimal coverage depth that is acceptable for you? 30x? 40x? Do you expect postzygotic mosaicism? If not, then 30x or 40x may be fine. For GQ, start with 99. You will find that GQ often correlates with DP.

(2) Then filter by genotype. Are all cows from your experiment in this one VCF, or do you have another VCF with the other cows? If they are all in this VCF, you need to write a script that looks for, say, 0/1 in one sample column (e.g. column 10, the first sample) but not in the other sample columns, and repeats this for each sample in turn. There are scripts out there that you can modify, or you can write your own.

(3) What is the target of your resequencing? Is this exome data? I was asking about your hypothesis since that will drive your annotation. If this is resequencing of coding regions, then you probably want to annotate the genes included in these regions. I don't know what you use for bovine annotation. A bovine version of ANNOVAR? After you annotate, you can also sort your data by which genes are affected in each group of cows.
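Steps (1) and (2) can be sketched in a few lines of Python. This is only an illustration under some assumptions: the VCF line is tab-separated with sample columns starting at column 10, each group is given as a list of 0-based indices into those sample columns, and "group-specific" is taken to mean that every confident call in one group carries the ALT allele while every confident call in the other group is homozygous reference. The function names and the cutoffs used in the example call are made up for this sketch.

```python
def confident_gt(fmt, sample, min_dp, min_gq):
    """Return the genotype as a tuple of allele strings, or None if the
    call is missing or falls below the DP/GQ cutoffs."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    alleles = fields.get("GT", "./.").replace("|", "/").split("/")
    if "." in alleles:
        return None
    try:
        dp, gq = int(fields["DP"]), int(fields["GQ"])
    except (KeyError, ValueError):
        return None
    if dp < min_dp or gq < min_gq:
        return None
    return tuple(sorted(alleles))


def group_specific(line, group_a, group_b, min_dp, min_gq):
    """True if all confident calls in group_a carry a non-reference allele
    and all confident calls in group_b are homozygous reference.

    Requires at least one confident call in each group.
    """
    cols = line.rstrip("\n").split("\t")
    fmt, samples = cols[8], cols[9:]
    a = [confident_gt(fmt, samples[i], min_dp, min_gq) for i in group_a]
    b = [confident_gt(fmt, samples[i], min_dp, min_gq) for i in group_b]
    a = [g for g in a if g is not None]
    b = [g for g in b if g is not None]
    if not a or not b:
        return False
    a_has_alt = all(any(x != "0" for x in g) for g in a)
    b_is_ref = all(all(x == "0" for x in g) for g in b)
    return a_has_alt and b_is_ref


# Toy record: the first two samples are copied from the question,
# the last two are invented hom-ref calls for illustration only.
line = "\t".join([
    "chr10", "264423", ".", "G", "GC", "29875", ".", ".", "GT:AD:DP:GQ:PL",
    "1/1:6,242:250:99:10448,743,0",
    "1/1:10,193:203:99:8479,571,0",
    "0/0:40,0:40:99:0,120,1800",
    "0/0:35,0:35:90:0,105,1500",
])
print(group_specific(line, group_a=[0, 1], group_b=[2, 3], min_dp=30, min_gq=90))
```

Calling it a second time with the two groups swapped covers indels specific to the other group. Low-confidence calls are ignored here instead of counted against a site; whether that is appropriate depends on how many samples per group you can afford to lose.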


Thanks for the detailed answer. Can you please explain 1/1? Do you mean I should retain only the homozygous calls and filter out the heterozygous ones (0/1 or 1/0)?

In my case, the depth of coverage is really variable between groups (I think I would need to take that into account when comparing the groups).

All cows from the experiment are in one file. I do have annotations available for the bovine reference assembly and plan to use ANNOVAR.


1/1 just means that is the part of your file you'll be working a lot on. Depending on what you want to find, your calls of interest could be 0/1 or 0/0. 1/1 is just an example.

I would check to see what acceptable coverage is in the bovine NGS field these days. My guess is >40x is best, just like anywhere else, so keep the calls with that kind of coverage first.
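Since coverage differs a lot between the groups, it may help to look at the depth each animal actually has before fixing a single DP cutoff. The sketch below is one way to do that in Python: it reads VCF lines and returns the median per-sample DP over all records. It assumes a standard `#CHROM` header line with sample names and a DP key in the FORMAT column; the function name `per_sample_median_dp` and the sample names in the example are made up.

```python
from statistics import median


def per_sample_median_dp(vcf_lines):
    """Return {sample_name: median DP} over all records in vcf_lines."""
    names = None
    depths = {}
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            names = cols[9:]
            depths = {name: [] for name in names}
            continue
        if names is None:
            raise ValueError("no #CHROM header line before the first record")
        keys = cols[8].split(":")
        if "DP" not in keys:
            continue
        idx = keys.index("DP")
        for name, sample in zip(names, cols[9:]):
            parts = sample.split(":")
            # Missing calls ("./." or ".") have no usable DP and are skipped.
            if idx < len(parts) and parts[idx].isdigit():
                depths[name].append(int(parts[idx]))
    return {name: median(values) for name, values in depths.items() if values}


# Tiny example with two hypothetical samples; genotype fields taken from the question.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcowA\tcowB"
rec = "chr10\t264423\t.\tG\tGC\t29875\t.\t.\tGT:AD:DP:GQ:PL\t1/1:6,242:250:99:10448,743,0\t1/1:0,13:13:39:551,39,0"
print(per_sample_median_dp([header, rec]))
```

If the medians differ several-fold between groups, a cutoff expressed relative to each sample's median may be fairer than one absolute value, though that is a design choice and not something the thread settles.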
