I have a problem with getting the correct genotypes from a samtools
mpileup generated vcf file.
In this post: A: ./ Vs "0/0" In Vcf Files, the user says DP=0 causes the genotype to be ./. which is exactly what I want. However, my genotypes are assigned 1/1 if DP=0 for the SNP for any/all individuals.
Am I calling the SNPs in a wrong way? Below are my commands (numbers as prefix for order & clarity):
1. bwa/0.7.4/bin/bwa aln -t 4 Ref.fa sample1.fq > sample1.sai 2. bwa/0.7.4/bin/bwa samse Ref.fa sample1.sai sample1.fq > sample1.sam 3. samtools-0.1.19/samtools view -bS sample1.sam > sample1.bam 4. samtools-0.1.19/samtools sort sample1.bam sample1.sorted 5. samtools-0.1.19/samtools mpileup -BuDI -f Ref.fa sample1.sorted.bam sample2.sorted.bam .. sampleN.sorted.bam | bcftools view -bvcg - > all_samples.bcf 6. samtools-0.1.19/bcftools/bcftools view all_samples.bcf | vcfutils.pl varFilter -D1000 > all_samples.vcf 7. vcftools_0.1.12a/bin/vcf-to-tab < all_samples.vcf > all_sample_SNPs.out
Only when parsing the 'all_sample_SNPs.out' file did I realize that my genotype calls are missing the N/N or NN alleles in entirety. I know my data, so no way the missing alleles are gone at ALL possible locations. So I must be doing something wrong.
Of course it is not
vcf-to-tab's fault as the VCF file assigns 1/1 genotype to those with 0 read depth at that particular locus. I'd appreciate it a lot if someone could point towards where I could be messing up.
Alternatively: is there a filter or another way to convert the vcf file so that sites with DP=0 are assigned the missing value of ./. or N/N or any other similar nomenclature?