Merge two multi-chromosome WIG files
5 months ago
kmyers2 ▴ 60

I have two WIG files representing the forward and reverse strands for an RNA-seq experiment that I would like to merge into one file. There are multiple chromosomes/plasmids in the WIGs. This is the general format:

track type=wiggle_0
variableStep chrom=ZM4
166 0.0337715701129034
167 0.0337715701129034
168 0.0337715701129034
195 0.0337715701129034
200 0.0337715701129034
217 0.0337715701129034
variableStep chrom=pZM32
64  0.0337715701129034
76  0.0337715701129034
134 0.0337715701129034
139 0.0337715701129034
183 0.0675431402258068
186 0.0337715701129034
variableStep chrom=pZM32
64  0.0337715701129034
76  0.0337715701129034
134 0.0337715701129034
139 0.0337715701129034
183 0.0675431402258068


There is no overlap between the genome coordinates in the two WIG files. I would like to merge them into one WIG file. So far I have tried using wiggletools:

wiggletools write test_out.wig sum file1.wig file2.wig


That combines the files well, but reports the format as follows:

fixedStep chrom=ZM4
start=165 step=1
-0.033772
0.033772
0.033772
0.033772
fixedStep chrom=ZM4
start=195 step=1
0.033772
fixedStep chrom=ZM4
start=200 step=1
0.033772
fixedStep chrom=ZM4
start=202 step=1


What I am looking for is standard WIG format (genomic position \t value) for the two files combined while keeping the chromosomes/plasmids organized.

I found this (How To Combine Multiple Wig/Bigwig Files Into One) but it assumes separate WIG files for each chromosome/plasmid.

Any tools or ideas? I will work on my own script but wanted to ask in case someone else has solved this problem.

5 months ago

For the general case, take the union of n Wiggle files with bedops --everything and wig2bed:

bedops --everything <(wig2bed < file1.wig) <(wig2bed < file2.wig) ... <(wig2bed < fileN.wig) > union.bed


Map the summed signal from the union set over the unique position set, using bedmap --sum and bedops --partition. This results in a four-column bedGraph file (not BED):

bedops --partition union.bed | bedmap --echo --sum --delim '\t' - union.bed > answer.bg


The aggregation applied to scores from overlapping positions is --sum. You could replace it with --mean, --max, --min, or any other aggregation function, depending on how you want to handle scores from overlapping regions. See bedmap --help for a full listing, or look up the online BEDOPS documentation for bedmap.

However, if you are absolutely guaranteed that every genomic range in your starting set of n Wiggle files is unique and disjoint, then you can skip the partition and aggregation operations and just write out a bedGraph file directly:

bedops --everything <(wig2bed < file1.wig) <(wig2bed < file2.wig) ... <(wig2bed < fileN.wig) | cut -f1-3,5 > answer.bg


Be careful with that assumption. It will likely be safer to pick some reasonable aggregation function.
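
If the positions really are disjoint and only a text WIG is needed, the same idea can be sketched with plain awk and sort, without bedops. This is a minimal sketch, not a tested tool: the function name merge_wigs is hypothetical, and it assumes variableStep input with the default span of 1 and no position present in more than one file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge variableStep WIG files whose positions do not overlap.
# Assumes each data line is "position value" and each block uses the default span of 1.
merge_wigs() {
  # Step 1: flatten all inputs to "chrom <TAB> position <TAB> value".
  awk '
    /^track/ || /^#/ || /^browser/ { next }
    /^fixedStep/ { print "fixedStep blocks are not handled" > "/dev/stderr"; exit 1 }
    /^variableStep/ {
      # Pick the chromosome name out of the "chrom=NAME" field.
      for (i = 2; i <= NF; i++) if ($i ~ /^chrom=/) chrom = substr($i, 7)
      next
    }
    NF >= 2 { print chrom "\t" $1 "\t" $2 }
  ' "$@" |
  # Step 2: sort by chromosome name, then numerically by position.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
  # Step 3: write a single variableStep header per chromosome.
  awk -F '\t' '
    BEGIN { print "track type=wiggle_0" }
    $1 != prev { print "variableStep chrom=" $1; prev = $1 }
    { print $2 "\t" $3 }
  '
}
```

Usage would look like merge_wigs file1.wig file2.wig > merged.wig. If a position did occur in both files it would be written twice, so the bedmap aggregation above remains the safer route when that cannot be ruled out.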

Once you have a bedGraph file, convert it to bigWig. Here we assume the reference genome is hg38 and use the UCSC Kent utilities fetchChromSizes and bedGraphToBigWig to get the chromosome sizes and do the conversion:

fetchChromSizes hg38 > hg38.chromSizes
bedGraphToBigWig answer.bg hg38.chromSizes answer.bw


Change this as needed for your reference genome.
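
For a genome that fetchChromSizes cannot look up (for example a bacterial chromosome plus plasmids, as in the question), the sizes file can be built from the reference FASTA instead. A minimal sketch, assuming the FASTA sequence IDs match the chrom= names in the WIG files; the function name fasta_chrom_sizes and the file name genome.fa are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name <TAB> length" for every sequence in a FASTA file.
fasta_chrom_sizes() {
  awk '
    /^>/ {
      # Flush the previous sequence before starting a new one.
      if (name != "") print name "\t" len
      name = substr($1, 2)   # sequence ID: header text up to the first whitespace
      len = 0
      next
    }
    { sub(/\r$/, ""); len += length($0) }   # tolerate Windows line endings
    END { if (name != "") print name "\t" len }
  ' "$1"
}
```

With that in place, fasta_chrom_sizes genome.fa > chrom.sizes gives the two-column file that bedGraphToBigWig and wigToBigWig expect.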

You can subsequently extract bigWig to text-based Wiggle with bigWigToWig, but it is often more convenient to work with bigWig.
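
If the goal is the plain "position, value" layout asked for in the question, the bedGraph can also be turned into variableStep WIG directly. This is a rough sketch under stated assumptions: the function name bedgraph_to_wig is hypothetical, the input is a sorted four-column bedGraph, and every covered base gets its own output line. bedGraph coordinates are 0-based and half-open while WIG positions are 1-based, hence the + 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a sorted four-column bedGraph (0-based, half-open)
# into variableStep WIG (1-based), one output line per covered base.
bedgraph_to_wig() {
  awk -F '\t' '
    BEGIN { print "track type=wiggle_0" }
    /^(#|track|browser)/ { next }            # skip comment and header lines
    $1 != prev { print "variableStep chrom=" $1; prev = $1 }
    { for (pos = $2 + 1; pos <= $3; pos++) print pos "\t" $4 }
  ' "$1"
}
```

For example, bedgraph_to_wig answer.bg > answer.wig. Long constant-valued intervals are expanded base by base, so the output can grow large for dense signal.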


Thank you! I saw this after I got it working with the solution I added below, but this is super great!

5 months ago
kmyers2 ▴ 60

Convert each WIG file to a BigWig file:

wigToBigWig file1.wig chrom.sizes file1.bw
wigToBigWig file2.wig chrom.sizes file2.bw


Merge the BigWig files into a BedGraph:

bigWigMerge file1.bw file2.bw combined.bedGraph


Convert BedGraph to BigWig:

bedGraphToBigWig combined.bedGraph chrom.sizes combined.bw


Convert BigWig back to Wig:

bigWigToWig combined.bw combined.wig