Question: Relationship of bedGraph to wig.
ariel.balter wrote (2.8 years ago):

Bioinformatics neophyte here. I'm trying to view tracks from my ChIP-seq data. I followed a fairly standard pipeline: fastq --> align (bwa) --> bam --> sort (picard) --> call peaks (macs2) --> .broadPeak and .bdg files, plus pileups (samtools mpileup). I also made filtered pileups covering just the regions under the peaks. My pileups have the format:

<chromosome> <position> <counts>

I want to view the tracks around the peaks. I considered creating wig files from the pileups by writing a new header each time I reach a new chromosome:

variableStep chrom=<chromosome>
<position>        <counts>
<position+1>      <counts>
<position+2>      <counts>
...
...
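A minimal sketch of the conversion described above, in Python. It assumes the pileup records are grouped by chromosome and that positions are 1-based (as in samtools mpileup output), which matches the 1-based positions that wig expects. The function name `pileup_to_wig` and the example records are hypothetical, made up for illustration.

```python
import io


def pileup_to_wig(lines):
    """Yield variableStep wig lines from '<chromosome> <position> <counts>' records.

    Assumption: records are grouped by chromosome and positions are 1-based.
    """
    current_chrom = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, count = line.split()[:3]
        if chrom != current_chrom:
            # Start a new wig block each time a new chromosome begins.
            current_chrom = chrom
            yield f"variableStep chrom={chrom}"
        yield f"{pos} {count}"


# Made-up example records, not real data.
example = """chr1 100 3
chr1 101 4
chr2 7 1
"""
wig_lines = list(pileup_to_wig(io.StringIO(example)))
print("\n".join(wig_lines))
```

The resulting text could then be written to a file and passed to wigToBigWig together with a chromosome sizes file.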

I would have used wigToBigWig to make bigWig files. But then I realized that I can just have MACS2 write bedGraph (.bdg) files. I did this, and they look like:

<chrom>            <start>    <stop>     <value> 
KL568395.1         0          8763       0.38251
KL568395.1         8763       8833       0.55459
KL568395.1         8833       9041       0.38251
KL568395.1         9041       9111       0.55459
KL568395.1         9111       9172       0.38251
KL568395.1         9172       9198       0.55459
KL568395.1         9198       9242       1.10918
...
...

I don't get how these are tracks of counts. Can someone please explain?

Tags: chip-seq, wig, wiggle, bedgraph
Devon Ryan (Freiburg, Germany) wrote (2.8 years ago):

The values in a wiggle file (typically with a ".wig" extension) don't need to be integer counts; they can be any number. In fact, once you convert to bigWig format, everything is a float (e.g., 1.0, 0.67, 22.5) anyway, since bigWig doesn't store integers.
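To connect this to the bedGraph lines in the question: each bedGraph line assigns a single value to a 0-based, half-open interval [start, stop), so the file is effectively a run-length encoding of a per-position track. The Python sketch below expands two of the lines from the question back into one value per base. The function name `expand_bedgraph` is hypothetical, and what the values mean (raw pileup, scaled pileup, or something else) depends on how MACS2 was run, so the sketch makes no assumption about that.

```python
def expand_bedgraph(lines):
    """Yield (chrom, 1-based position, value) for every base covered by a bedGraph line.

    bedGraph intervals are 0-based and half-open, so the interval
    start..stop covers the 1-based positions start+1 through stop.
    """
    for line in lines:
        # Skip blank lines and optional track/browser/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, stop, value = line.split()[:4]
        for pos0 in range(int(start), int(stop)):
            yield chrom, pos0 + 1, float(value)


# Two consecutive lines taken from the bedGraph excerpt in the question.
rows = [
    "KL568395.1         9172       9198       0.55459",
    "KL568395.1         9198       9242       1.10918",
]
per_base = list(expand_bedgraph(rows))
print(len(per_base))
print(per_base[0])
print(per_base[-1])
```

Adjacent positions with the same value are collapsed into a single line, which is why the file is much shorter than a per-position pileup.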

As an aside, you might find bamCoverage from deepTools useful. It'll directly make a bigWig file from a BAM file for you.


I'm ok with the float thing, as long as I know the scaling factor. But I don't get the part about not having a value at every location. Also, I looked at bamCoverage. Would I use the --binSize=1 setting to get a value at each position?

Reply written 2.8 years ago by ariel.balter

Whether you get a value at each position depends on whether you store regions with a value of 0. Coverage is computed over bins (i.e., fixed-width intervals of positions) because storing the actual value at every position is often overkill. In the common case of looking at histone modifications, it really doesn't matter if you chunk everything into blocks of 50 bases or more; peaks are vague, noisy things anyway. The only time you actually need a bin size of 1 is when you're looking at something that actually has single-base precision (e.g., in analysing RiboSeq datasets, I use this to get exact positions of ribosomal pausing). This can also be useful if you're looking at transcription factor binding sites (or really anything with a focal source providing all of the signal).
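As a rough illustration of the binning idea, here is a small Python sketch that summarises a per-base coverage vector into fixed-width bins. Taking the mean of each bin is an assumption made for this example, not a description of how bamCoverage computes its values; the function name `bin_coverage` and the coverage numbers are made up.

```python
def bin_coverage(per_base, bin_size):
    """Summarise per-base coverage into fixed-width bins.

    Returns (start, stop, mean) tuples with 0-based, half-open coordinates.
    Assumption: each bin is summarised by the mean of its per-base values.
    """
    bins = []
    for start in range(0, len(per_base), bin_size):
        chunk = per_base[start:start + bin_size]
        bins.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    return bins


# Made-up per-base coverage over a 10-base region with a focal peak.
coverage = [0, 0, 1, 3, 6, 9, 6, 3, 1, 0]

coarse = bin_coverage(coverage, 5)  # two 5-base bins
fine = bin_coverage(coverage, 1)    # one bin per base
print(coarse)
print(fine[:3])
```

With a bin size of 5 the peak summit at the sixth base is smeared across a whole bin, while a bin size of 1 keeps the exact position at the cost of ten records instead of two.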

Reply written 2.8 years ago by Devon Ryan

Thanks! We are looking for transcription factor binding sites, so I'll give it a try with bin size 1.

Reply written 2.8 years ago by ariel.balter