How Can I Convert Bam/Sam To Wiggle
7
19
Entering edit mode
11.9 years ago
Stew ★ 1.4k

I would like to convert BAM files into wiggle files for viewing in IGB and UCSC. I am aware that I can open BAM files directly in IGB and IGV, but I would really like to make smaller wiggle files for quick viewing of the data, additionally lots of the people I do analysis for prefer to upload wiggles into UCSC, so even if this isn't the best method I need to do it.

I am currently using this one liner to generate a wiggle file, with a value roughly every 10bp and with a minimum depth of 3.

samtools pileup fileName.bam | perl -ne 'BEGIN{print "track type=wiggle_0 name=fileName description=fileName\n"};($c,$start, undef, $depth) = split; if ($c ne $lastC) { print "variableStep chrom=$c\n"; };$lastC=$c;next unless $. % 10 ==0;print "$start\t$depth\n" unless$depth<3;'  > fileName.wig


This is based on a solution I found here. But adapted to give smaller files and a header track line. It works, and generates an 11Mb wig file from a 728M bam file, but it takes it's time about it.

Does anyone have a more elegant and perhaps faster solution?

samtools perl conversion ucsc wiggle • 48k views
11
Entering edit mode
11.9 years ago

Here's a Python script which converts BAM files to wiggle, then BigWig format.

This uses pysam to get a pileup via samtools -- essentially the same approach as your one liner -- so the speed will be comparable. The script allows specification of specific chromosome regions so is useful for examining a region in depth without having to skip over any bases.

We upload BigWig files to a local Galaxy instance. It can serve out sections of the file on-demand, which gives you the advantage of wiggle display at UCSC without the upload times.

0
Entering edit mode

I am getting the following error If I use the bam_to_wiggle.py script

Traceback (most recent call last):
File "bam_to_wiggle.py", line 142, in <module>
main(*args, **kwargs)
File "bam_to_wiggle.py", line 68, in main
convert_to_bigwig(wig_file, chr_sizes, config, outfile)
File "bam_to_wiggle.py", line 115, in convert_to_bigwig
subprocess.check_call(cl)
File "/usr/local/lib/python2.7/subprocess.py", line 499, in check_call
retcode = call(*popenargs, **kwargs)
File "/usr/local/lib/python2.7/subprocess.py", line 486, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/local/lib/python2.7/subprocess.py", line 672, in __init__
File "/usr/local/lib/python2.7/subprocess.py", line 1201, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory


The command I am using is:

python bam_to_wiggle.py /data1/goutham/dDocentGATK/alignment/1_method/BAM/sort.sample_EC1.bam -o sort.sample_EC1


The file /usr/local/lib/python2.7/subprocess.py exists

1
Entering edit mode

You need to have the wigToBigWig executable available in your PATH. The script header has a bit more documentation: https://github.com/chapmanb/bcbb/blob/796cc2a1179df6f5030212fd5e7f36540351abdb/nextgen/scripts/bam_to_wiggle.py#L23 Hope this helps get it running.

0
Entering edit mode

9
Entering edit mode
10.3 years ago

For me wiggle representation is generally for the coverage, so I'll just compute the coverage first using bedtools utility as genomeCoverageBed –i file.bed -bg –g my.genome > sample.cov, where file.bed can be obtained using bamToBed utility bamToBed -i file.bam > file.bed and then would use bedGraphToBigWig utility for ucsc to convert the coverage file to bigwig, which is better than wig as you can place on the local server and pass the text file containing links of these bigwigs to collaborators etc. for viewing with UCSC and its safe as only you having rights can change the file content. Another option would be using pyicos convert tool as pyicos convert my_experiment.bam my_experiment.wig -f SAM -F variable_wig -O, this will give you wig directly.

Cheers

0
Entering edit mode

May I ask you a question about bedGraphToBigWig? Can you talk about how to set the parameter of blockSize and itemsPerSlot? Because the default 256 and 1024 can lose lots of reads, which value do you think is the point that can balance the size of coverage file and sensitivity of reads? Thank you!

0
Entering edit mode

I am not sure, I always use the defaults, may be you can test using different paramters to confirm which one suits you!!

0
Entering edit mode

Is the bigwig file a binary file? How can I look at it directly except by IGV?

0
Entering edit mode

Its an indexed binary format. You can put the file on a webserver and import it via link in IGV.

0
Entering edit mode

I got error

Traceback (most recent call last):
File "bam_wiggle.py", line 31, in <module>
ImportError: No module named bcbio.pipeline.config_utils


Is there anything else required other than pysam and wigToBigWig?

Thank you

0
Entering edit mode

I don't know why are you getting this error, could you elaborate more, what command you used?

3
Entering edit mode
11.9 years ago

I've been using Findpeaks which is actually a ChIP software, but there's an option (-dist_type 3) to make it only count reads...which is what you would want.

It can also break down the track on a per chromosome basis, or one track with all the chromosomes.

http://sourceforge.net/apps/mediawiki/vancouvershortr/index.php?title=FP4Parameters

0
Entering edit mode

I have added this as the accepted answer as I was looking for something faster than my perl hack, and this seems to fit that bill. It is also part of my chIP seq pipleine already so I do not need to install any additional tools or dependencies.

2
Entering edit mode
11.9 years ago
Rm 8.2k

veggie's: bam2ucsc might be useful to visualize it in the ucsc browser.

Script can be found here.

2
Entering edit mode
8.5 years ago
cmdcolin ★ 2.3k

In recent versions, the samtools pileup command has been removed

[main] The 'pileup' command has been removed. Please use 'mpileup' instead


So, I found a solution that uses samtools depth instead

samtools depth fileName.bam | perl -ne 'BEGIN{ print "track type= print wiggle_0 \
name=fileName description=fileName\n"}; ($c,$start, $depth) = split; \ if ($c ne $lastC) { print "variableStep chrom=$c span=10\n"; };$lastC=$c; \
next unless $. % 10 ==0;print "$start\t$depth\n" unless$depth<3' > fileName.wig


It isn't necessarily faster or more elegant but it works with latest versions of samtools. Also, there's a similar thread here that talks about using samtools mpileup: convert bam to wig

Edit: If you are converting to bigwig and receive the error "chromosome xxx has nnnn bases but ends at nnnn", then it's probably because the added span parameter goes beyond the end of the genome. Therefore, use the -clip option when doing wigToBigWig

Edit 2016: I would use bedtools coverageBed and then bedGraphToBigWig instead of this method

1
Entering edit mode

Here's something interesting about this solution: samtools depth has an arbitrary cutoff of 8000! Anything above that is just discarded. I have no idea why was it made default, and still kind of cannot believe it. If you want your wig file to report true coverage from the BAM file, you would have to use '-d 0' key.

Edit 2016: I would use bedtools coverageBed and then bedGraphToBigWig instead of this method

Unfortunately, for some tools you need a properly formatted wig file, which bedGraphToBigWig - bigWigToWig combo does not generate. The solution above works quite well though.

0
Entering edit mode

This works great! What are the output values though for say ChIP-seq data thats been trimmed, bowtie2 aligned, duplicates removed and indexed? I know that the first column is the position, but do 5,7,12,and 15 represent raw read counts or rpkm values? Thanks! 10030 5 10040 7 10050 12 10060 15

0
Entering edit mode
10.7 years ago
wm ▴ 540

Is there anyway to make pileup include duplicates marked reads in the coverage?

0
Entering edit mode
7.4 years ago
catherine ▴ 210

The python script showed above doesn't work well for me. So I would recommend samtools mpileup as well.