Generating Both Pileup And Vcf/Bcf Using Samtools Mpileup
1
7
Entering edit mode
9.6 years ago

I'm using samtools mpileup and would like to generate both a pileup file and a vcf file as output. I can see how to generate one or the other, but not both (unless I run mpileup twice). I suspect I am missing something simple.

Specifically, calling mpileup with the -g or -u flag causes it to compute genotype likelihoods and output a bcf. Leaving these flags off just gives a pileup. Is there any way to get both, without redoing the work of producing the pileup file? Can I get samtools to generate the bcf _from_ the pileup file in some way? Generating the bcf from the bam file, when I already have the pileup, seems wasteful.

Thanks for any help!

samtools • 28k views
10
Entering edit mode
9.3 years ago
Joseph Hughes ★ 3.0k

I think this is what you are asking for:

samtools mpileup -E -uf reference.fa file.bam > file.mpileup


In this first step, you will get the mpileup file, then:

bcftools view -cg file.mpileup > file.vcf


This will give you the vcf file. If you want bcf rather than vcf:

 bcftools view -bcg file.mpileup > file.bcf

0
Entering edit mode

Although the file is named as file.mpileup, samtools mpileup -u actually generates a BCF file. You can generate a pileup file without using -g and -u, then try this tool pileup_to_vcf to convert pileup file to vcf file.

0
Entering edit mode

I have tried using both methods, I have generated a file.pileup which has the 6 columns and looks correct.

when I use the bcftools I get the warning that the column !=5 warning.

The pileup_to_vcf has also not worked and seems to just stall and not run (checking ps regularly)

I have also used the samtools sam_2_vcf.pl script to no avail and this gives me an error saying that there is insufficient columns in the .pileup file (the opposite of bcftools).

I am really stuck here as I need to get the files back to vcf.

Any help would be greatly appreciated!

0
Entering edit mode
1. There is a test mpileup file in pileup_to_vcf/test-data folder. Run ./pileup_to_vcf.py -i test-data/test.pileup -o test.out.vcf and then you can get the vcf results (a file named test.out.vcf). It's the same when you use your own mpileup files.
2. bcftools doesn't work on pileup format data. It works on bcf/vcf files.
3. samtools provides a script called sam2vcf.pl, which works on the output of samtools pileup. However, this command is deserted in newer versions. The output of samtools mpileup does not satisfy the requirement of sam2vcf.pl. You can check the required pileup format on lines 95-99, which is different from output of samtools mpileup.
4. If you want to get a vcf format, refer to examples on samtools reference pages.
1
Entering edit mode

Where can I download pileup_to_vcf.py? @Dejian

0
Entering edit mode

0
Entering edit mode

That is the reference genome that you aligned your data to.

0
Entering edit mode

where can i download it? I want to map with human genome hg19?

0
Entering edit mode

Illumina's iGenomes site also has ready to use sequence/index/annotation bundles for various genomes.