Chipseq Wig File With The Input Subtracted From Chip Sample
3
6
Entering edit mode
12.1 years ago
Shumon ▴ 110

Is there a way to make just one wig file with the input subtracted from ChIP sample? I mean I have a ChIPseq data read aligned BED files for both ChIP and Control. I would like to have one WIG file where input is subtracted already. Any one knows any simple tool/script/method. (except SPP)

thanks

chip-seq wiggle • 14k views
ADD COMMENT
0
Entering edit mode

This can lead to inaccuracies if your chip sample and your input sample are of different read depths.

ADD REPLY
0
Entering edit mode

I doubt he means subtract the raw signals literally. Rather, if one chooses a normalization strategy, then converts to log2 scale, one could then subtract and the result would be a depth-adjusted ratio of signals.

ADD REPLY
6
Entering edit mode
12.1 years ago
seidel 11k

There are many ways to do this, and the simplest I know of is to use R. Though it's not actually that simple because the R objects for handling this kind of data can be complex and obtuse. But once you invest in getting to know them, the pay off is great because great tools and functions are available for all kinds of manipulation. The R chipseq and tracklayer libraries load many functions that can help you. I would do the following. Read your bed file into R and create a GRanges object. Once you have this kind of object, you can extend your reads, and calculate coverage. If you do this for the IP and control, you'll then have two coverage objects you can manipulate to form a ratio, which can be converted to wig (or bigWig, or bigBedGraph, etc.). Assuming you read your bed file into a data frame called bed with fields for chromosome, start, stop, and strand, this could be as simple as:

library(chipseq)
library(rtracklayer)

# create GRanges object from chromosomal coordinates of reads
GR <- GRanges(seqnames=bed$chr, ranges=IRanges(start=bed$start, end=bed$end), strand=bed$strand)

# extend reads
GR <- resize(GR, ExtensionLength)

# get coverage
cov <- coverage(GR)

The coverage objects are RLE Lists, which are lists of Run Length Encoded vectors - one per chromosome. Thus if you are familiar with lists in R, you can loop through the list and deal with each chromosome. If you do this for both your IP and control simultaneously, you can assess average or mean coverage, normalize, take a ratio, etc. and assign the results to a new RLE list:

# create empty Rle List
rat <- RleList()

# loop through your IP and Control to create a ratio

for( i in 1:length(cov1) ){
  ip.cov <- cov1[[i]]
  wce.cov <- cov2[[i]]
  # normalize?
  # add one to all positions to avoid div/0
  ip.cov <- ip.cov + 1
  wce.cov <- wce.cov + 1
  rat[[i]] <- signif(log2(ip.cov/wce.cov),3)
}

# create bigWig file of ratio results
export.bw(rat, filename, "bedGraph", seqlengths(Scerevisiae))

In the code above "Scerevisiae" is used as an example of an object which has the chromosome lengths of your organism. When creating the ratio you might have to use some trick to avoid dividing by zero, such as adding 1 to all positions both the IP and the control. The code above is a little schematized in places because you should probably look up how to loop through a list. The fact that it's an RLE list usually doesn't matter (i.e. you can treat the components like regular vectors, and don't have to pay attention to the fact that they are RLE vectors). You can see how with a few simple lines of code, you can accomplish a lot. If you google chipseq, bioconductor, granges, you'll find lots of examples and information. There's probably a lot better way to do the above in R (i.e. some fancy way of reading bed files into granges objects or dealing with two RLE lists), but the scheme above is pretty simple.

ADD COMMENT
0
Entering edit mode

@seidel: many many thanks for such elaborate answer. I have minimum knowledge in R. So, little more complete code would really help me to use it. Also, I really would like to normalize it per million reads as I have multiple samples and I need to compare. Also, I how to produce the Scerevisiae object from two column chromosome length file, so that after shifting there is no position beyond chromosome boundary. Thanks,

ADD REPLY
5
Entering edit mode
10.8 years ago
Fidel ★ 2.0k

DeepTools has a module for doing just that called bamCompare. The only limitation is that it requires as input BAM files instead of BED files.

bamCompare takes care of the normalization of the ChIP and input and it extend the reads to match the library fragment size or the paired-end size if available. Because it uses multiple processors it can run quite fast. Besides taking the difference, it can also take the ratio and the log2ratio.

ADD COMMENT
0
Entering edit mode

for ChIP-seq, can Coveragebam extend each read to 200 bp and then output the bigwig file?

Thanks.

ADD REPLY
0
Entering edit mode

Yes, that's de default behavior of bamCoverage. It requires as input parameter the extension length. If you have paired end reads then the reads are extended to match the fragment length. You may want to see this answer related to paired-ends.

ADD REPLY
0
Entering edit mode

Thanks. I have 36bp single-end reads. I just go through the -h for bamCoverage, and I saw I can specify --fragmentLength 200bp. I like deeptools very much and thanks for such detailed documentation. Good documentation is one of the reasons that I use this tool!

ADD REPLY
0
Entering edit mode

By the way, if extend to 200bp, does bamCoverage takes care of the reads that exceed the chromosome ends? by bedClip etc? Thanks!

ADD REPLY
1
Entering edit mode

yeah, you should not worry about that.

ADD REPLY
1
Entering edit mode
12.1 years ago
Shumon ▴ 110

There is way to do that using one existing tool (java-genomics-toolkit: http://palpant.us/java-genomics-toolkit/ ) ........... I got following mail from author of the tool (Timothy Palpant): I am copying it thinking it might help others....

You should be able to do this calculation with the tools that are already available. If you have your ChIP-seq reads in the file chip.bed and your input reads in the file input.bed, here are the steps you would take. (I'm assuming that you are using these tools on the command-line, but all of this functionality is also available through the Galaxy interface).

  1. Use the ngs.BaseAlignCounts tool to create coverage wig files from each set of reads:

$ ./toolRunner.sh ngs.BaseAlignCounts -i chip.bed -x 200 -a hg19 -o chip-coverage.wig $ ./toolRunner.sh ngs.BaseAlignCounts -i input.bed -x 200 -a hg19 -o input-coverage.wig

In this example, I have used the hg19 assembly and artificially extended the reads 200bp from their 5' end based on the fragment length distribution after sonication. You can adjust it to match your experiment. There are many common assemblies built-in, but if you are in a non-standard organism, then you will need to provide a chromosome lengths file (see the website for more info).

  1. Normalize the coverage files to read depth:

$ ./toolRunner.sh wigmath.Scale -i chip-coverage.wig -o chip-coverage.scaled.wig $ ./toolRunner.sh wigmath.Scale -i input-coverage.wig -o input-coverage.scaled.wig

(You can optionally specify a scale factor with -m)

  1. Subtract the input coverage from the chip coverage:

$ ./toolRunner.sh wigmath.Subtract -m chip-coverage.scaled.wig -s input-coverage.scaled.wig -o normalized-chip.wig

Hopefully that is what you are looking for. Each of these commands have other options that you can explore by running them without any arguments, i.e.

$ ./toolRunner.sh wigmath.Subtract

-------------------------------

If out wig file is too large, then ways to handle:

  1. One option is to split chromosomes into individual files. You could do this manually with the output you already have, or I just added an option to split the output of the BaseAlignCounts tool into individual files for each chromosome (--split). Unfortunately, that makes them harder to work with downstream because you have to process and upload each individual chromosome file. But they are smaller to upload while retaining full resolution.

  2. You can do windowing with the wigmath.Downsample tool to convert any of the single base pair wig files into larger windows (e.g. span=50). This will dramatically decrease the size. For example, to decrease the size by ~50x:

./toolRunner wigmath.Downsample --input chip.wig --window 50 --metric mean --output chip.50bpwindows.wig

This is probably the best option for upload to UCSC, and should make it comparable with the MACS sizes. The normalized files will still probably be larger because you have to write decimal places after normalization.

  1. If your files are still too big, you can convert them to BigWig and host them on ftp on your computer so that you do not have to upload them. See the BigWig documentation at UCSC for more info.
ADD COMMENT

Login before adding your answer.

Traffic: 2130 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6