UCSC custom track - wiggle: (1) fixedStep vs. variableStep; (2) a general question
9.5 years ago
nonish5 ▴ 40

Hi,

Question 1:

I am wondering about the efficiency of fixedStep vs. variableStep in the case of sparse data.

I have a program that generates a wiggle file of scores that I compute for each position in a gene, i.e., I have a vector whose length equals the length of the gene; some entries are zero and some are nonzero.

I don't need to display the zeros, so it seems more efficient to use variableStep wiggle and to print only the nonzero positions to the text file, e.g.

variableStep chrom=chr19 span=1
 13411      1.2
 13412       7.5
 13416      3.4
 13417       11.12
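
For reference, here is a minimal sketch of how the variableStep output above could be produced from a per-position score vector. The function name `to_variable_step` and the convention that `start` is the 1-based chromosome position of the first vector entry are assumptions made for illustration, not part of my actual program.

```python
def to_variable_step(chrom, start, scores, span=1):
    """Return variableStep wiggle lines for the nonzero entries of `scores`.

    Assumption: `start` is the 1-based chromosome position of scores[0].
    """
    lines = [f"variableStep chrom={chrom} span={span}"]
    for offset, score in enumerate(scores):
        if score != 0:  # zeros are skipped, so only covered positions are written
            lines.append(f"{start + offset} {score}")
    return lines


scores = [1.2, 7.5, 0, 0, 0, 3.4, 11.12]
print("\n".join(to_variable_step("chr19", 13411, scores)))
```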

I was wondering whether it would be more efficient to print the entire vector (including zeros) and to use the fixedStep. i.e. for the example above:

fixedStep chrom=chr19 start=13411 step=1 span=1
 1.2
 7.5
 0
 0
 0
 3.4
 11.12
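
Note that in the wiggle format a fixedStep declaration line carries `start=` and `step=`, and the data lines then hold one value each, with no position column. A minimal sketch of writing the full vector this way follows; the function name `to_fixed_step` and the 1-based `start` convention are assumptions for illustration.

```python
def to_fixed_step(chrom, start, scores, step=1, span=1):
    """Return fixedStep wiggle lines for every entry of `scores`, zeros included.

    Assumption: `start` is the 1-based chromosome position of scores[0].
    """
    header = f"fixedStep chrom={chrom} start={start} step={step} span={span}"
    # Positions are implied by start and step, so each data line is a bare value.
    return [header] + [str(score) for score in scores]


scores = [1.2, 7.5, 0, 0, 0, 3.4, 11.12]
print("\n".join(to_fixed_step("chr19", 13411, scores)))
```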

A few notes:

  1. For one example gene that I checked, only 1,100 of 13,600 entries were nonzero.
  2. I need to carry out this process for each gene in the human genome, and for each gene there will be 4 such vectors.
  3. I run wigToBigWig after generating the wiggle text file.
  4. I am asking because of the following paragraph from the wiggle documentation page:

Caution for sparse variableStep data

The wiggle format was designed for quickly displaying data that is quite dense. The variableStep format, in particular, becomes very inefficient when there are only a few data points per 1,024 bases. If variableStep data points (i.e., chromStarts) are greater than about 100 bases apart, it is advisable to use BedGraph format.

(However, I can't use bedGraph; I must use wiggle.)
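
To relate the caution above to the numbers in note 1: about 1,100 points over 13,600 positions is an average spacing of roughly 12 bases, well under the quoted threshold of about 100. The average can hide long empty stretches, though, so it may be worth measuring the gaps directly. A minimal sketch, where `mean_gap` is a hypothetical helper name:

```python
def mean_gap(positions):
    """Average distance in bases between consecutive data points."""
    if len(positions) < 2:
        return 0.0
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Positions from the variableStep example: gaps are 1, 4 and 1.
positions = [13411, 13412, 13416, 13417]
print(mean_gap(positions))
```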

Question 2:

When writing a wiggle file (either variableStep or fixedStep), should all the data for the same chromosome come in one sequence? Does it matter, efficiency-wise, if we write

variableStep chrom=chr19 span=1
 13411      1.2
 13412       7.5
 13416      3.4
 13417       11.12

variableStep chrom=chrY span=1
pos1    score1
pos2    score2
...

variableStep chrom=chr19 span=1
 34567      1.2
 34580       6.5
 34597      13.4

instead of:

variableStep chrom=chr19 span=1
 13411      1.2
 13412       7.5
 13416      3.4
 13417       11.12
 34567      1.2
 34580       6.5
 34597      13.4

variableStep chrom=chrY span=1
pos1    score1
pos2    score2
...
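
If the second layout turns out to be preferable, the blocks could be merged before writing. This is a minimal sketch, assuming the scores are available as `(chrom, position, score)` tuples in arbitrary order; `merge_by_chrom` is a hypothetical helper, and chromosomes are ordered by plain string sorting here. Whether wigToBigWig actually needs this ordering is the open question, so treat it as an illustration only.

```python
from collections import defaultdict


def merge_by_chrom(records, span=1):
    """Emit one variableStep block per chromosome, positions ascending.

    Assumption: `records` is an iterable of (chrom, position, score) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, score in records:
        by_chrom[chrom].append((pos, score))

    lines = []
    for chrom in sorted(by_chrom):  # lexicographic chromosome order
        lines.append(f"variableStep chrom={chrom} span={span}")
        for pos, score in sorted(by_chrom[chrom]):
            lines.append(f"{pos} {score}")
    return lines


records = [
    ("chr19", 13411, 1.2),
    ("chrY", 100, 0.5),  # made-up chrY values, for illustration only
    ("chr19", 34567, 1.2),
]
print("\n".join(merge_by_chrom(records)))
```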

I'd appreciate any guidance!


I suspect that, since you are converting to bigWig, whatever advice you read about text wiggle files will not apply to the binary bigWig files.

It is even quite possible that both fixedStep and variableStep entries end up written in the same form inside the bigWig file.

9.5 years ago

Generally, be careful about what you throw away: for some signals, a position with an explicit zero score is meaningful and differs from a position with no signal at all (i.e., no data for that position).
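
One way to keep that distinction is to mark unmeasured positions separately from measured zeros and drop only the former. A minimal sketch, assuming missing positions are represented as `None` in the score vector; the function name and that convention are both illustrative assumptions.

```python
def to_variable_step_keep_zeros(chrom, start, scores, span=1):
    """Write every measured position, including zeros; skip only None entries.

    Assumptions: `start` is the 1-based position of scores[0], and None
    marks a position with no data.
    """
    lines = [f"variableStep chrom={chrom} span={span}"]
    for offset, score in enumerate(scores):
        if score is not None:  # a real 0 is kept, only missing data is dropped
            lines.append(f"{start + offset} {score}")
    return lines


scores = [1.2, 0, None, None, 3.4]
print("\n".join(to_variable_step_keep_zeros("chr19", 13411, scores)))
```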
