When I run samtools (v1.3.1) mpileup I am running into some very concerning issues. The reference bases for the output pileup file are not in the same position as the references in the input fasta. In fact, they appear to be oddly shifted. For example, here is output for 10 variants from the following samtools pileup command:
samtools mpileup -f mtDNA.fasta -r chrM -Q 0 -t AD,ADF,ADR --skip-indels input.bam > input.txt
chrM 1330 C 457 G$G$GGGgggtg...
chrM 1331 A 461 T$T$T$tttttT...
chrM 1332 A 455 c$c$c$c$c$CC...
chrM 1333 G 449 A$A$A$A$A$A$...
chrM 1334 G 463 A$,$,$AAaaaa...
chrM 1335 T 487 GGgggggggggg...
chrM 1336 G 529 .$.$,$,$,$,,...
chrM 1337 T 524 ,$,$,$,$,$,$...
chrM 1338 A 503 g$g$g$g$g$g$...
chrM 1339 G 497 TTcccctttTTT...
chrM 1340 C 506 A$A$ttttaaaA...
Here is output from faidx, showing the correct fasta sequence for these positions:
samtools faidx mtDNA.fasta chrM:1330-1340
chrM:1330-1340 GTCAAGGTGTA
Notice that the pileup format is shifted by 2 bases. Looking at this more closely, I identified some positions where the reference sequence would be "CCCCCCCCTCCCCCC" (8 Cs, 1 T, 6 Cs) but samtools had converted this to "CCCCCCCTCCCCC" (7 Cs, 1 T, 5 Cs).
Please don't post questions as tools. Instead, post them as questions.
Cross-posted to samtools-help and likely solved there. Please pick one forum at a timeā¦