I've been working on a project the past few weeks where I'm analyzing SAM files for specific point mutations. I'm aware that bcftools has the commands mpileup and call that are meant to locate those mutations and return them as a vcf file. However, whenever I run my commands through the terminal, the output is always a vcf with only headers, as seen below.
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.19+htslib-1.19
##bcftoolsCommand=mpileup -A -Ou -o SRR23199821raw.vcf -f refgenome/ncbi_dataset/data/GCA_000001405.29/GCA_000001405.29_GRCh38.p14_genomic.fna vcfs/SRR23199821sorted.bam
##reference=file://refgenome/ncbi_dataset/data/GCA_000001405.29/GCA_000001405.29_GRCh38.p14_genomic.fna
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h">
##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT vcfs/SRR23199821sorted.bam
There are column heads along the bottom row to display data, but there's nothing in there to read or call
Here are the commands I've been using
samtools view -S -b vcfs/SRR23199821.sam > vcfs/SRR23199821.bam
samtools sort -o vcfs/SRR23199821sorted.bam vcfs/SR23199821.bam
bcftools mpileup -O b -o vcfs/SRR23199821raw.bcf -f vcfs/refgenes/ref.fasta --threads 8 -q 20 -Q 30 vcfs/SRR23199821ssorted.bam
bcftools call -m -v -o vcfs/SRR23199821calls.vcf vcfs/SRR23199821raw.
Both of the samtools commands work fine and do their proper conversions, but the bcftools commands generate blank vcf files every time, and I can't figure out why
make sure the reference sequences match, also run the command with no filters first (-q and -Q) just to make sure you are not filtering out all alignemens and bases
Thank you. I tried running the command without either -q command (and without --threads for good measure) and still nothing meaningful got written into the file
the way
mpileup
was intended to work is to compute a genotype likelhood for every base! It will produce a line for every base that has coverage in the alignment file.in general you would never save that file, it is a huge amount of information that should not be saved as most of it is useless information, you should use uncompressed format as output and pipe it right into
bcftools call
.that being said if the
mpileup
does not produce output then either thebam
file has no alignments or your reference file is not quite correct or does not match the bam or some sort of fundamental error is at play. Sometimes the simplest mistakes are the hardest to troubleshoot.come to think of it what does
samtools mpileup
produce?