Hi,
I am in the process of realigning some samples to an updated genome. For the majority of the samples, I had the original fastq files. For a couple of my samples, I was only given a bam file. So what I did was use Picards "SamToFastq" function to convert the bam file back to a fastq file. Since I didn't know how the data was trimmed before I received it, I ran it through trimmomatic (ended up being a waste of time), aligned with bwa, and sorted/marked duplicated with Picard.
I then used samstool for variant calling. All of the samples that were originally bam files, weren't get any of their variants called when I looked at the vcf file. So I don't know if I messed something up after I converted the bam files back to fastq files. When I compare I compare the vcf to a newly generate bam file, you can see that there are SNPs and what not in the bam file but they aren't being called by samstool. If it was happening to other samples then I would think that I am having an issue with samtools, but this is only happening to the samples that were originally bam files that were converted back to fastq files. Am I doing something wrong here? I'm just trying to figure out what I did wrong here
hmm.. how do you do that?
show us the output of
samtools flagstat input.bam
for example.This is what I got back.. looks like they aren't properly paired? Am I interpreting that correctly?
yes Devon is right. Check you haven't switched reads R1 and reads R2.
Wooooooooooow. The problem came up when I did that pointless trimming. I made an error and trimmed R1 twice instead of R1 and R2. Thanks for the help everyone. I really really appreciate your help.
I have moved the original comment from Devon which hinted you to the correct solution to an answer, as such you can mark it as accepted and mark this question as solved.