Sam Tools Pileup Format
2
2
Entering edit mode
9.8 years ago

I have a question about the output of the routine "samtool mpileup":

"In the final result there are not the reference bases without coverage"

I'd like to get the entire reference genome in the final pileup file.

Best,

Vittorio Fortino

samtools mpileup format • 5.8k views
2
Entering edit mode
9.8 years ago
Ian 5.8k

You need to run mpileup piped to bcftools. The latter differs from the variant calling example of samtools, by missing out the 'v' flag.

samtools mpileup -uf /path_to/genome.fa sorted.bam | bcftools view -cgb - > sorted.cons.raw.bcf'


You can then use bcftools view to vcfutils to output sequence as fastq format, in this case explicitly using positions where there is at least 10 read coverage, but less than 500 reads (misses out regions that have too high coverage due to artefacts). I usually set -D to be 3-5x the median depth of coverage for the genome.

bcftools view sorted.cons.raw.bcf | vcfutils.pl vcf2fq -d10 -D500 > sorted.mpileup.filtered_d10_D500.fq'


You can then use Seqtk to convert the fastq sequence to fasta, while filtering positions based on a phred quality score; 40 (roughly equiv. to P0.0001) appears to be an excepted minimum. Positions below your threshold are masked in the output.

seqtk fq2fa sorted.mpileup.filtered_d10_D500.fq > sorted.mpileup.filtered_d10_D500.fa


I hope that is clear. This is how i did it last (about six months ago). I think seqtk is packaged with samtools now...

0
Entering edit mode

Good answer! Two small updates that might help others looking at this post:

This command :

 bcftools view -cg - > sorted.cons.raw.bcf'


should be

 bcftools view -cgb - > sorted.cons.raw.bcf'


Also, the current seqtk code syntax is

seqtk seq -a inFile.fq > outFile.fa

0
Entering edit mode

Thank you very much for highlighting my error. I have amended my post :)

1
Entering edit mode
9.8 years ago
Swbarnes2 ★ 1.5k

Instead of running a separate program to convert fastq to fasta, vcfutils.pl is easy enough to alter. It's a perl script, so it's a flat text file. Basically, you want the last few lines of vcf2fq to look like this:

  print "\>$chr\n"; &v2q_print_str($seq);
#  print "+\n"; &v2q_print_str(\$qual);


That'll drop the quality score, and the preceding "+", and put a > instead of a @.

0
Entering edit mode

Nice! This works perfectly. No need to convert the .fq file.