Hi! I used samtools with 1000 Genomes data (phase 3, low-coverage sequencing) to generate a VCF file containing both SNPs and indels. Below are the two command lines I used to produce the VCF file for 5 people. The final "5genomes.var.flt.vcf" file has around 163432 SNP and indel sites. However, when I generated the VCF file for 100 people in the same way, the final file includes only 1465 SNPs and indels, which is far too few. I checked the last locus in both files, and it looks like both runs processed the whole of chromosome 20.
Do you have any idea what's wrong with it?
Furthermore, in VCF files that other people have generated with samtools, the ALT field includes 'X' in many cases, but in my VCF files the non-ref base 'X' never shows up. I used samtools version 0.1.19. Would someone be able to clarify?
Thanks a lot. I look forward to your reply.
Here are the commands I used to generate the VCF files:
../samtools-0.1.19/samtools mpileup -uf ../human_g1k_v37.fasta HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage.20120522.bam HG00268.chrom20.ILLUMINA.bwa.FIN.low_coverage.20130415.bam HG00419.chrom20.ILLUMINA.bwa.CHS.low_coverage.20130415.bam HG00759.chrom20.ILLUMINA.bwa.CDX.low_coverage.20130415.bam HG01112.chrom20.ILLUMINA.bwa.CLM.low_coverage.20120522.bam | ../samtools-0.1.19/bcftools/bcftools view -bvc - > 5genomes.var.raw.bcf
../samtools-0.1.19/bcftools/bcftools view 5genomes.var.raw.bcf | ../samtools-0.1.19/bcftools/vcfutils.pl varFilter -D100 > 5genomes.var.flt.vcf
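For anyone who wants to reproduce the site counts quoted above, here is a minimal sketch of how they could be tallied from the filtered VCF. The function name count_variants is just a hypothetical helper, and it assumes that indel records carry the INDEL flag in the INFO column, which is how samtools 0.1.x marks them; every other non-header record is counted as a SNP.

```shell
# Hypothetical helper: count SNP and indel records in a VCF file.
# Assumption: indels are tagged with the INDEL flag in the INFO column
# (column 8), as written by samtools 0.1.x mpileup / bcftools view.
count_variants() {
    awk -F'\t' '
        /^#/ { next }                              # skip header lines
        $8 ~ /(^|;)INDEL(;|$)/ { indel++; next }   # INFO contains the INDEL flag
        { snp++ }                                  # everything else counts as a SNP
        END { printf "snps=%d indels=%d total=%d\n", snp, indel, snp + indel }
    ' "$1"
}

# Usage: count_variants 5genomes.var.flt.vcf
```

Running the same helper on the 5-sample and the 100-sample output would show whether the drop affects SNPs, indels, or both.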
Since you're using an old version of samtools, you'll never see 'X' as a non-ref base.