I am trying to call variants in RNA-Seq data and generate a .vcf file, following the GATK3 tutorial: https://software.broadinstitute.org/gatk/documentation/article.php?id=3891
I can get all the way through to making a filtered .vcf file, but the resulting file has no entries: it is only 47 lines long, and all of that is header information. There are no variant records below the column header line. The unfiltered .vcf file produced by HaplotypeCaller is also empty.
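For anyone who wants to run the same check on their own output, here is a minimal Python sketch that counts header lines versus variant records in a .vcf file. It is not part of the GATK tutorial. The function names are illustrative, and it only assumes that VCF header lines start with `#`.

```python
import gzip
import io


def count_vcf_lines(handle):
    """Return (header_lines, record_lines) for an open text handle on a VCF."""
    header = 0
    records = 0
    for line in handle:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            header += 1  # meta-information (##...) or the #CHROM column line
        else:
            records += 1  # anything else is a variant record
    return header, records


def count_vcf_file(path):
    """Same count for a file on disk; handles plain and gzip-compressed VCFs."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_vcf_lines(handle)


# Tiny in-memory example: a header-only VCF, like the one described above.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
print(count_vcf_lines(example))  # (2, 0)
```

A result of the form `(47, 0)` would confirm that the file contains only header lines and no calls.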
My questions are:
- First, as a sanity check: this is an unlikely outcome, right? It isn't that my RNA-Seq coverage (~30,000,000 paired-end reads) is too shallow to detect any variants?
- Second, if we should indeed see some variants in that file: what parameters should I tweak to try to get variants into my final .vcf file? It's hard to tell whether there are variants at all until you actually generate a .vcf file (i.e., you can't look in a bam file and decide that variants exist), so I'm not sure which intermediate steps I can look at to troubleshoot where in the pipeline things are going wrong.
I didn't perform indel realignment (step #4) or base recalibration (step #5) because it was unclear to me from the tutorial whether these were necessary for RNA-Seq. Right before step 6, I also had to use ReorderSam and BuildBamIndex to make the contig ordering of the intermediate bam files match that of the .fa file I am using as the reference. I'll work on getting base recalibration working next, but if anyone has additional suggestions or answers to the above, I'd appreciate it.
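Since the contig ordering had to be fixed by hand, one intermediate check is to compare the contigs in the bam header with those in the reference index. Below is a minimal Python sketch of that comparison. It assumes the header text comes from something like `samtools view -H` and the index text from the `.fai` file next to the .fa, and all function names are hypothetical helpers rather than GATK or Picard calls.

```python
def contigs_from_sam_header(header_text):
    """Extract (name, length) pairs, in order, from the @SQ lines of a SAM/BAM header."""
    contigs = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # Fields after the record type look like "SN:chr1" and "LN:1000".
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def contigs_from_fai(fai_text):
    """Extract (name, length) pairs, in order, from the text of a .fai index."""
    contigs = []
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]  # first two columns of the index
            contigs.append((name, int(length)))
    return contigs


def first_mismatch(bam_contigs, ref_contigs):
    """Return (index, bam_entry, ref_entry) at the first difference, or None if identical."""
    for i, (bam, ref) in enumerate(zip(bam_contigs, ref_contigs)):
        if bam != ref:
            return i, bam, ref
    if len(bam_contigs) != len(ref_contigs):
        # One list is a prefix of the other; report the first unmatched entry.
        i = min(len(bam_contigs), len(ref_contigs))
        bam = bam_contigs[i] if i < len(bam_contigs) else None
        ref = ref_contigs[i] if i < len(ref_contigs) else None
        return i, bam, ref
    return None


# Made-up example in which the two files list the same contigs in a different order.
header = "@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:2000\n"
fai = "chr2\t2000\t6\t60\t61\nchr1\t1000\t2046\t60\t61\n"
print(first_mismatch(contigs_from_sam_header(header), contigs_from_fai(fai)))
# (0, ('chr1', 1000), ('chr2', 2000))
```

A return value of `None` would mean that names, lengths, and order all agree, which would rule this out as the cause of the empty .vcf.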
Thank you for your help!