Question: Sam Tools Pileup Format
2
gravatar for Vittorio Fortino
5.6 years ago by
Vittorio Fortino20 wrote:

I have a question about the output of the routine "samtool mpileup":

"In the final result there are not the reference bases without coverage"

I'd like to get the entire reference genome in the final pileup file.

Thanks in advance for the answer!

Best,

Vittorio Fortino

format samtools mpileup • 3.6k views
ADD COMMENTlink modified 9 months ago by Biostar ♦♦ 20 • written 5.6 years ago by Vittorio Fortino20
2
gravatar for Ian
5.6 years ago by
Ian4.9k
University of Manchester, UK
Ian4.9k wrote:

You need to run mpileup piped to bcftools. The latter differs from the variant calling example of samtools, by missing out the 'v' flag.

samtools mpileup -uf /path_to/genome.fa sorted.bam | bcftools view -cgb - > sorted.cons.raw.bcf'

You can then use bcftools view to vcfutils to output sequence as fastq format, in this case explicitly using positions where there is at least 10 read coverage, but less than 500 reads (misses out regions that have too high coverage due to artefacts). I usually set -D to be 3-5x the median depth of coverage for the genome.

bcftools view sorted.cons.raw.bcf | vcfutils.pl vcf2fq -d10 -D500 > sorted.mpileup.filtered_d10_D500.fq'

You can then use Seqtk to convert the fastq sequence to fasta, while filtering positions based on a phred quality score; 40 (roughly equiv. to P0.0001) appears to be an excepted minimum. Positions below your threshold are masked in the output.

seqtk fq2fa sorted.mpileup.filtered_d10_D500.fq > sorted.mpileup.filtered_d10_D500.fa

I hope that is clear. This is how i did it last (about six months ago). I think seqtk is packaged with samtools now...

ADD COMMENTlink modified 4.4 years ago • written 5.6 years ago by Ian4.9k

Good answer! Two small updates that might help others looking at this post:

This command :

 bcftools view -cg - > sorted.cons.raw.bcf'

should be

 bcftools view -cgb - > sorted.cons.raw.bcf'

Also, the current seqtk code syntax is

seqtk seq -a inFile.fq > outFile.fa
ADD REPLYlink written 4.4 years ago by AGS230

Thank you very much for highlighting my error. I have amended my post :)

ADD REPLYlink written 4.4 years ago by Ian4.9k
1
gravatar for Swbarnes2
5.6 years ago by
Swbarnes21.4k
Swbarnes21.4k wrote:

Instead of running a separate program to convert fastq to fasta, vcfutils.pl is easy enough to alter. It's a perl script, so it's a flat text file. Basically, you want the last few lines of vcf2fq to look like this:

  print "\>$chr\n"; &v2q_print_str($seq);
#  print "+\n"; &v2q_print_str($qual);

That'll drop the quality score, and the preceding "+", and put a > instead of a @.

ADD COMMENTlink written 5.6 years ago by Swbarnes21.4k

Nice! This works perfectly. No need to convert the .fq file.

ADD REPLYlink written 4.7 years ago by apightling0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1392 users visited in the last hour