Question: Chipseq Wig File With The Input Subtracted From Chip Sample
6
gravatar for Shumon
6.7 years ago by
Shumon110
Shumon110 wrote:

Is there a way to make just one wig file with the input subtracted from ChIP sample? I mean I have a ChIPseq data read aligned BED files for both ChIP and Control. I would like to have one WIG file where input is subtracted already. Any one knows any simple tool/script/method. (except SPP)

thanks

wiggle chip-seq • 9.4k views
ADD COMMENTlink modified 5.4 years ago by Fidel1.9k • written 6.7 years ago by Shumon110

This can lead to inaccuracies if your chip sample and your input sample are of different read depths.

ADD REPLYlink written 6.7 years ago by Ying W3.9k

I doubt he means subtract the raw signals literally. Rather, if one chooses a normalization strategy, then converts to log2 scale, one could then subtract and the result would be a depth-adjusted ratio of signals.

ADD REPLYlink written 6.7 years ago by seidel6.8k
6
gravatar for seidel
6.7 years ago by
seidel6.8k
United States
seidel6.8k wrote:

There are many ways to do this, and the simplest I know of is to use R. Though it's not actually that simple because the R objects for handling this kind of data can be complex and obtuse. But once you invest in getting to know them, the pay off is great because great tools and functions are available for all kinds of manipulation. The R chipseq and tracklayer libraries load many functions that can help you. I would do the following. Read your bed file into R and create a GRanges object. Once you have this kind of object, you can extend your reads, and calculate coverage. If you do this for the IP and control, you'll then have two coverage objects you can manipulate to form a ratio, which can be converted to wig (or bigWig, or bigBedGraph, etc.). Assuming you read your bed file into a data frame called bed with fields for chromosome, start, stop, and strand, this could be as simple as:

library(chipseq)
library(rtracklayer)

# create GRanges object from chromosomal coordinates of reads
GR <- GRanges(seqnames=bed$chr, ranges=IRanges(start=bed$start, end=bed$end), strand=bed$strand)

# extend reads
GR <- resize(GR, ExtensionLength)

# get coverage
cov <- coverage(GR)

The coverage objects are RLE Lists, which are lists of Run Length Encoded vectors - one per chromosome. Thus if you are familiar with lists in R, you can loop through the list and deal with each chromosome. If you do this for both your IP and control simultaneously, you can assess average or mean coverage, normalize, take a ratio, etc. and assign the results to a new RLE list:

# create empty Rle List
rat <- RleList()

# loop through your IP and Control to create a ratio

for( i in 1:length(cov1) ){
  ip.cov <- cov1[[i]]
  wce.cov <- cov2[[i]]
  # normalize?
  # add one to all positions to avoid div/0
  ip.cov <- ip.cov + 1
  wce.cov <- wce.cov + 1
  rat[[i]] <- signif(log2(ip.cov/wce.cov),3)
}

# create bigWig file of ratio results
export.bw(rat, filename, "bedGraph", seqlengths(Scerevisiae))

In the code above "Scerevisiae" is used as an example of an object which has the chromosome lengths of your organism. When creating the ratio you might have to use some trick to avoid dividing by zero, such as adding 1 to all positions both the IP and the control. The code above is a little schematized in places because you should probably look up how to loop through a list. The fact that it's an RLE list usually doesn't matter (i.e. you can treat the components like regular vectors, and don't have to pay attention to the fact that they are RLE vectors). You can see how with a few simple lines of code, you can accomplish a lot. If you google chipseq, bioconductor, granges, you'll find lots of examples and information. There's probably a lot better way to do the above in R (i.e. some fancy way of reading bed files into granges objects or dealing with two RLE lists), but the scheme above is pretty simple.

ADD COMMENTlink written 6.7 years ago by seidel6.8k

@seidel: many many thanks for such elaborate answer. I have minimum knowledge in R. So, little more complete code would really help me to use it. Also, I really would like to normalize it per million reads as I have multiple samples and I need to compare. Also, I how to produce the Scerevisiae object from two column chromosome length file, so that after shifting there is no position beyond chromosome boundary. Thanks,

ADD REPLYlink written 6.7 years ago by Shumon110
5
gravatar for Fidel
5.4 years ago by
Fidel1.9k
Germany
Fidel1.9k wrote:

DeepTools has a module for doing just that called bamCompare. The only limitation is that it requires as input BAM files instead of BED files.

bamCompare takes care of the normalization of the ChIP and input and it extend the reads to match the library fragment size or the paired-end size if available. Because it uses multiple processors it can run quite fast. Besides taking the difference, it can also take the ratio and the log2ratio.

ADD COMMENTlink written 5.4 years ago by Fidel1.9k

for ChIP-seq, can Coveragebam extend each read to 200 bp and then output the bigwig file?

Thanks.

ADD REPLYlink written 3.6 years ago by Ming Tang2.5k

Yes, that's de default behavior of bamCoverage. It requires as input parameter the extension length. If you have paired end reads then the reads are extended to match the fragment length. You may want to see this answer  related to paired-ends.

ADD REPLYlink written 3.6 years ago by Fidel1.9k

Thanks. I have 36bp single-end reads.  I just go through the -h for bamCoverage, and I saw I can specify --fragmentLength 200bp. I like deeptools very much and thanks for such detailed documentation. Good documentation is one of the reasons that I use this tool!

ADD REPLYlink written 3.6 years ago by Ming Tang2.5k

By the way, if extend to 200bp, does bamCoverage takes care of the reads that exceed the chromosome ends? by bedClip etc? Thanks!

ADD REPLYlink written 3.6 years ago by Ming Tang2.5k
1

yeah, you should not worry about that.

ADD REPLYlink written 3.6 years ago by Fidel1.9k
1
gravatar for Shumon
6.7 years ago by
Shumon110
Shumon110 wrote:

There is way to do that using one existing tool (java-genomics-toolkit: http://palpant.us/java-genomics-toolkit/ ) ........... I got following mail from author of the tool (Timothy Palpant): I am copying it thinking it might help others....

You should be able to do this calculation with the tools that are already available. If you have your ChIP-seq reads in the file chip.bed and your input reads in the file input.bed, here are the steps you would take. (I'm assuming that you are using these tools on the command-line, but all of this functionality is also available through the Galaxy interface).

  1. Use the ngs.BaseAlignCounts tool to create coverage wig files from each set of reads:

$ ./toolRunner.sh ngs.BaseAlignCounts -i chip.bed -x 200 -a hg19 -o chip-coverage.wig $ ./toolRunner.sh ngs.BaseAlignCounts -i input.bed -x 200 -a hg19 -o input-coverage.wig

In this example, I have used the hg19 assembly and artificially extended the reads 200bp from their 5' end based on the fragment length distribution after sonication. You can adjust it to match your experiment. There are many common assemblies built-in, but if you are in a non-standard organism, then you will need to provide a chromosome lengths file (see the website for more info).

  1. Normalize the coverage files to read depth:

$ ./toolRunner.sh wigmath.Scale -i chip-coverage.wig -o chip-coverage.scaled.wig $ ./toolRunner.sh wigmath.Scale -i input-coverage.wig -o input-coverage.scaled.wig

(You can optionally specify a scale factor with -m)

  1. Subtract the input coverage from the chip coverage:

$ ./toolRunner.sh wigmath.Subtract -m chip-coverage.scaled.wig -s input-coverage.scaled.wig -o normalized-chip.wig

Hopefully that is what you are looking for. Each of these commands have other options that you can explore by running them without any arguments, i.e.

$ ./toolRunner.sh wigmath.Subtract

-------------------------------

If out wig file is too large, then ways to handle:

  1. One option is to split chromosomes into individual files. You could do this manually with the output you already have, or I just added an option to split the output of the BaseAlignCounts tool into individual files for each chromosome (--split). Unfortunately, that makes them harder to work with downstream because you have to process and upload each individual chromosome file. But they are smaller to upload while retaining full resolution.

  2. You can do windowing with the wigmath.Downsample tool to convert any of the single base pair wig files into larger windows (e.g. span=50). This will dramatically decrease the size. For example, to decrease the size by ~50x:

./toolRunner wigmath.Downsample --input chip.wig --window 50 --metric mean --output chip.50bpwindows.wig

This is probably the best option for upload to UCSC, and should make it comparable with the MACS sizes. The normalized files will still probably be larger because you have to write decimal places after normalization.

  1. If your files are still too big, you can convert them to BigWig and host them on ftp on your computer so that you do not have to upload them. See the BigWig documentation at UCSC for more info.
ADD COMMENTlink modified 6.7 years ago • written 6.7 years ago by Shumon110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1110 users visited in the last hour