Question: Generating Both Pileup And Vcf/Bcf Using Samtools Mpileup
7
gravatar for claresloggett
7.3 years ago by
claresloggett70 wrote:

I'm using samtools mpileup and would like to generate both a pileup file and a vcf file as output. I can see how to generate one or the other, but not both (unless I run mpileup twice). I suspect I am missing something simple.

Specifically, calling mpileup with the -g or -u flag causes it to compute genotype likelihoods and output a bcf. Leaving these flags off just gives a pileup. Is there any way to get both, without redoing the work of producing the pileup file? Can I get samtools to generate the bcf _from_ the pileup file in some way? Generating the bcf from the bam file, when I already have the pileup, seems wasteful.

Thanks for any help!

samtools • 23k views
ADD COMMENTlink modified 7.0 years ago by Joseph Hughes2.8k • written 7.3 years ago by claresloggett70
10
gravatar for Joseph Hughes
7.0 years ago by
Joseph Hughes2.8k
Scotland, UK
Joseph Hughes2.8k wrote:

I think this is what you are asking for:

samtools mpileup -E -uf reference.fa file.bam > file.mpileup

In this first step, you will get the mpileup file, then:

bcftools view -cg file.mpileup > file.vcf

This will give you the vcf file. If you want bcf rather than vcf:

 bcftools view -bcg file.mpileup > file.bcf
ADD COMMENTlink written 7.0 years ago by Joseph Hughes2.8k

Although the file is named as "file.mpileup", "samtools mpileup -u" actually generates a BCF file. You can generate a pileup file without using -g and -u, then try this tool  pileup_to_vcf to convert pileup file to vcf file.

ADD REPLYlink modified 5.9 years ago • written 5.9 years ago by Dejian1.3k

I have tried using both methods, I have genrated a file.pileup which has the 6 columns and looks correct.

when I use the bcftools I get the warning that the column !=5 warning.

The pileup_to_vcf has also not worked and seems to just stall and not run (checking ps regularly)

I have also used the samtools sam_2_vcf.pl script to no avail and this gives me an error saying that there is insufficent columns in the .pileup file (the opposite of bcftools).

I am really stuck here as I need to get the files back to vcf.

 

Any help would be greatly appreciated!

ADD REPLYlink written 5.9 years ago by Jonathan Crowther200
  1. There is a test mpileup file in pileup_to_vcf/test-data folder. Run "./pileup_to_vcf.py -i test-data/test.pileup -o test.out.vcf" and then you can get the vcf results (a file named test.out.vcf). It's the same when you use your own mpileup files.
  2. bcftools doesn't work on pileup format data. It works on bcf/vcf files.
  3. samtools provides a script called sam2vcf.pl, which works on the output of "samtools pileup". However, this command is deserted in newer versions. The output of "samtools mpileup" does not satisfy the requirement of sam2vcf.pl. You can check the required pileup format on lines 95-99, which is different from output of "samtools mpileup".
  4. If you want to get a vcf format, refer to examples on samtools reference pages.
ADD REPLYlink written 5.9 years ago by Dejian1.3k
1

where I can download pileup_to_vcf.py ? @Dejian

ADD REPLYlink written 3.5 years ago by winter_li60

what is reference.fa file? where can I download it?

ADD REPLYlink written 8 months ago by khatami.mahshid20

That is the reference genome that you aligned your data to.

ADD REPLYlink written 8 months ago by genomax84k

where can i download it? I want to map with human genome hg19?

ADD REPLYlink written 8 months ago by khatami.mahshid20

You can download genome reference and annotation files from GENCODE.

Illumina's iGenomes site also has ready to use sequence/index/annotation bundles for various genomes.

ADD REPLYlink modified 8 months ago • written 8 months ago by genomax84k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1080 users visited in the last hour