Hi,
I am attempting to reanalyze some old data. An updated genome was recently released for the organism I work with, so I had to map the reads to the new genome. What I have now is a BAM file, and I want to do some variant calling so I can generate a VCF.
I know there are a lot of programs out there to do this, but I am trying to find the best one. I have tried Platypus, but the VCF file it generates never contains any genome data. That is probably my fault, but I can't seem to figure it out and there isn't much support. I tried VarDict, but when I go to look at the VCF file, I get the error "failure to parse tbx_vcf". I can't really seem to find out why I am getting that error. I have tried Freebayes and that has worked for me, but I have an issue with indels: Freebayes seems to call SNPs instead of acknowledging indels. I know there is GATK, but it seems like such a tedious piece of software. It looks like you have to go through 15 steps just to get anything worthwhile. I could be looking at the wrong examples. What I have noticed about GATK is that they try to make it so simple that it ends up just making things complicated.
Could anyone give some suggestions or some insight? Any help is much appreciated.
Thanks!!
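On the Freebayes indel point above, one caller-independent check is to tally the records in the VCF by type. The following is only a minimal sketch, assuming an uncompressed, tab-delimited VCF and classifying alleles purely by comparing REF and ALT lengths. The function names and the label set are made up for illustration and are not part of any caller. Note that a length-changing allele is counted as "indel" here even if a caller would report it as a complex event, and same-length multi-base alleles are counted as "mnp".

```python
from collections import Counter


def classify_alt(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length alone (illustrative labels)."""
    # Symbolic alleles (<DEL>), spanning deletions (*), and breakends are set aside.
    if alt.startswith("<") or alt == "*" or "[" in alt or "]" in alt:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) == len(alt):
        return "mnp"  # same-length, multi-base substitution
    return "indel"  # any length-changing allele


def count_variant_types(vcf_lines):
    """Count ALT alleles by type from an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns
        for alt in alts.split(","):
            if alt == ".":
                continue  # no alternate allele recorded
            counts[classify_alt(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
        "chr1\t300\t.\tC\tCGG,T\t50\tPASS\t.",
    ]
    print(count_variant_types(example))
```

To run it on a real file, pass an open file handle instead of the example list, for instance `count_variant_types(open("calls.vcf"))`, where `calls.vcf` is a placeholder name. If the tally shows many "mnp" or multi-base records and few "indel" records, the indels may be embedded in longer haplotype calls rather than missing.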
I doubt there is an objectively best variant caller for each situation. You also seem to put a lot of focus on ease of use, so that's apparently also important....
I'm not so worried about ease of use. I try to troubleshoot and read about the different software, but sometimes there just isn't enough support out there to help figure out the issue. I just need something that is going to work, really.
The Google Brain team recently released DeepVariant. We implemented a reproducible version that was submitted to NF-Core (https://lifebit.page.link/cmRv).
We also made it available in Lifebit (https://lifebit.page.link/fiPH) if you want to try it with example parameters.
In practice, DeepVariant first builds images from the BAM file, then uses a deep learning image recognition approach to call the variants, and finally converts the predictions into standard VCF format.
Just leaving this here for the sake of completeness.
Has DeepVariant been benchmarked against the Gold Standards in clinical genetics, though? This question remained unanswered here, too: Why we ran DeepVariant as a Nextflow pipeline over cloud.
It was ~1 year ago...
Hi @Kevin Blighe, I'm just adding a small contribution here, because I am now trying to understand how DeepVariant works. I couldn't find any reference to its benchmarking in clinical genetics, but I came across this comparison study: DeepVariant x GATK4 x SpeedSeq.
Thanks. Information moves quickly, and my answer below is already somewhat out of date because the GATK I was referring to was GATK v3. GATK v4 may have improved a lot.