Question: ChIP-seq common control for two treatments
cg10 (United States) wrote, 4.5 years ago:

Hi.

I have a ChIP-seq dataset with two control replicates, sequenced alongside two replicates each of two transcription factor (TF) treatments. That gives 6 FASTQ files in total, all from the same lane.

My question is how to normalize the data so that each treatment can be compared with the control, given that the two replicates of the first treatment have about 70 million and 8 million reads, and the two controls about 2 million and 5 million. I am not sure about the total number of reads in the second treatment. I have pasted the bowtie2 alignment stats below.

T1:

8077966 reads; of these:
  8077966 (100.00%) were unpaired; of these:
    2270927 (28.11%) aligned 0 times
    2605491 (32.25%) aligned exactly 1 time
    3201548 (39.63%) aligned >1 times
71.89% overall alignment rate

T2:

70910425 reads; of these:
  70910425 (100.00%) were unpaired; of these:
    32129717 (45.31%) aligned 0 times
    18056752 (25.46%) aligned exactly 1 time
    20723956 (29.23%) aligned >1 times
54.69% overall alignment rate

C1:

5435992 reads; of these:
  5435992 (100.00%) were unpaired; of these:
    1252404 (23.04%) aligned 0 times
    1898388 (34.92%) aligned exactly 1 time
    2285200 (42.04%) aligned >1 times
76.96% overall alignment rate

C2:

2755776 reads; of these:
  2755776 (100.00%) were unpaired; of these:
    2129160 (77.26%) aligned 0 times
    277810 (10.08%) aligned exactly 1 time
    348806 (12.66%) aligned >1 times
22.74% overall alignment rate  
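To put the four bowtie2 summaries side by side, here is a minimal Python sketch that parses them and computes simple linear scale factors. It assumes that uniquely aligned reads ("aligned exactly 1 time") are a reasonable measure of usable depth, and `scale_to_smallest` is a hypothetical helper. Scaling every library down to the smallest one is similar in spirit to MACS2's default of scaling the larger of treatment/control toward the smaller, but this is not MACS code.

```python
import re

# bowtie2 alignment summaries copied from the question (single-end reads).
LOGS = {
    "T1": """8077966 reads; of these:
  8077966 (100.00%) were unpaired; of these:
    2270927 (28.11%) aligned 0 times
    2605491 (32.25%) aligned exactly 1 time
    3201548 (39.63%) aligned >1 times
71.89% overall alignment rate""",
    "T2": """70910425 reads; of these:
  70910425 (100.00%) were unpaired; of these:
    32129717 (45.31%) aligned 0 times
    18056752 (25.46%) aligned exactly 1 time
    20723956 (29.23%) aligned >1 times
54.69% overall alignment rate""",
    "C1": """5435992 reads; of these:
  5435992 (100.00%) were unpaired; of these:
    1252404 (23.04%) aligned 0 times
    1898388 (34.92%) aligned exactly 1 time
    2285200 (42.04%) aligned >1 times
76.96% overall alignment rate""",
    "C2": """2755776 reads; of these:
  2755776 (100.00%) were unpaired; of these:
    2129160 (77.26%) aligned 0 times
    277810 (10.08%) aligned exactly 1 time
    348806 (12.66%) aligned >1 times
22.74% overall alignment rate""",
}


def parse_bowtie2_summary(text):
    """Extract read counts from a single-end bowtie2 alignment summary."""
    def grab(pattern):
        return int(re.search(pattern, text).group(1))

    stats = {
        "total": grab(r"(\d+) reads; of these:"),
        "unaligned": grab(r"(\d+) \([\d.]+%\) aligned 0 times"),
        "unique": grab(r"(\d+) \([\d.]+%\) aligned exactly 1 time"),
        "multi": grab(r"(\d+) \([\d.]+%\) aligned >1 times"),
    }
    # The three categories should add up to the total number of reads.
    if stats["unaligned"] + stats["unique"] + stats["multi"] != stats["total"]:
        raise ValueError("inconsistent bowtie2 summary")
    stats["aligned"] = stats["unique"] + stats["multi"]
    return stats


def scale_to_smallest(depths):
    """Linear factors that scale every library down to the smallest one."""
    smallest = min(depths.values())
    return {name: smallest / depth for name, depth in depths.items()}


if __name__ == "__main__":
    stats = {name: parse_bowtie2_summary(log) for name, log in LOGS.items()}
    factors = scale_to_smallest({name: s["unique"] for name, s in stats.items()})
    for name, s in stats.items():
        print(f"{name}: {s['total']:>11,} reads, {s['aligned'] / s['total']:6.1%} aligned, "
              f"{s['unique']:>11,} unique, scale factor {factors[name]:.4f}")
```

Printed this way, the mismatch discussed in this thread is easy to see: T2 has roughly nine times as many reads as T1, and C2 has by far the fewest uniquely aligned reads.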

Should I merge the sorted BAM files of each replicate and use the merged file as MACS input, or analyze each replicate individually? I did the latter and got about 50 peaks for one replicate and 400 for the other. I am not sure whether I should trust these analyses.
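One quick way to judge whether the 50-peak and 400-peak calls can be trusted is to check how many peaks from each set are recovered in the other. A minimal sketch, assuming the peaks are in BED-like files (MACS narrowPeak files have chrom/start/end in the first three columns); the helper names are hypothetical, and the all-pairs check per chromosome is only meant for a few hundred peaks.

```python
from collections import defaultdict


def read_bed(path):
    """Read (chrom, start, end) from a BED-like file such as a narrowPeak file."""
    peaks = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            peaks.append((chrom, int(start), int(end)))
    return peaks


def overlap_fraction(a, b):
    """Fraction of intervals in `a` that overlap at least one interval in `b`.

    BED intervals are half-open, so [s1, e1) and [s2, e2) overlap
    only if s1 < e2 and s2 < e1 (touching intervals do not count).
    """
    if not a:
        return 0.0
    by_chrom = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))
    hits = sum(
        any(start < e and s < end for s, e in by_chrom.get(chrom, ()))
        for chrom, start, end in a
    )
    return hits / len(a)
```

For example, `overlap_fraction(read_bed("rep1_peaks.narrowPeak"), read_bed("rep2_peaks.narrowPeak"))` with hypothetical file names. If most of the 50 peaks are found among the 400, the smaller set may mainly reflect lower depth; if they are not, the replicates genuinely disagree.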

Another option is to normalize all three samples (C, T1, T2) together and perhaps look for coordinated regulation between T1 and T2 relative to C. But my main concern is normalizing in a way that lets each treatment be compared with the control to find direct targets.
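For comparing each treatment against the control, the simplest normalization is by library size. A minimal sketch, assuming read counts have already been tallied in the same fixed genomic bins for every sample; the binning, the pseudocount of 1 CPM and all function names are illustrative assumptions. Control replicates are pooled by summing counts, which is one reading of "merging" the BAM files. This is generic counts-per-million scaling, not what MACS does internally.

```python
import math


def cpm(counts, library_size):
    """Counts per million: scale bin counts by library size in millions."""
    per_million = library_size / 1e6
    return [c / per_million for c in counts]


def pool(*samples):
    """Merge replicates given as (bin_counts, library_size) pairs by summing."""
    counts = [sum(bins) for bins in zip(*(c for c, _ in samples))]
    library_size = sum(size for _, size in samples)
    return counts, library_size


def log2_enrichment(treat, control, pseudo=1.0):
    """Per-bin log2((treatment CPM + pseudo) / (control CPM + pseudo)).

    treat, control: (bin_counts, library_size) pairs over the same bins.
    """
    t = cpm(*treat)
    c = cpm(*control)
    return [math.log2((ti + pseudo) / (ci + pseudo)) for ti, ci in zip(t, c)]
```

Usage would look like `log2_enrichment((t1_counts, t1_reads), pool((c1_counts, c1_reads), (c2_counts, c2_reads)))`, where the count lists and read totals are hypothetical inputs. Note that pooling lets the deeper control replicate dominate the pooled profile.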

Thanks for suggestions and ideas. :) 

P.S.: I played no role in designing this experiment ;P. The biologists have no clue why they did it this way.


Try looking at your samples with CHANCE. It shows how similar the replicates are, as well as general QC.

(comment by marina.v.yurieva, 4.5 years ago)

If you have replicates, look at these two tools:

https://github.com/troublezhang/PePr

http://mahonylab.org/software/multigps/

(comment by Ming Tang, 4.5 years ago)
Answer by Ying W (South San Francisco, CA), 4.5 years ago:

Holy crap, you have some serious technical issues that you need to think about first.

T2 has almost 10x as many reads as T1, and C2 has far fewer reads, very few of which aligned. Could you give some more info about your controls? We are talking about the same antibody (Ab) used for the pulldown in T1/T2, but in a different condition, right? (i.e., not a reverse-crosslinked input or an IgG control.) Does the control condition have a different amount of TF accessible to the Ab?

Personally, I would not use the ChIP-seq methods that take advantage of replicates for your experiment, because of the huge difference in read counts between T1 and T2. What I would do first is figure out how similar T1 and T2 are (see the suggestions from @Ian). If T1 seems to be a subset of T2, you would probably end up just using T2, since combining the two might actually cause more headaches downstream. I would also look into what C2 is hitting, since most of its reads did not align; double-check that chrM was included in the reference you aligned against (if you are aligning against human). I think there are two possibilities: a high amount of contamination (you can check this by running blastn/BLAT against nr/nt), or a ton of duplicates (low library complexity due to very little DNA pulled down and/or lots of PCR amplification).
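The duplicate/low-complexity hypothesis can be checked directly. This sketch computes ENCODE-style library complexity metrics for single-end reads, keyed on chromosome, 5' position and strand: NRF (distinct positions / total reads), PBC1 (positions with exactly one read / distinct positions) and PBC2 (positions with one read / positions with two reads). The `positions_from_bam` helper is hypothetical; it uses pysam and a MAPQ >= 30 cutoff as a stand-in for "uniquely mapped", and both choices are assumptions.

```python
from collections import Counter


def library_complexity(read_positions):
    """NRF, PBC1 and PBC2 from (chrom, five_prime_pos, strand) tuples."""
    per_location = Counter(read_positions)
    total = sum(per_location.values())
    distinct = len(per_location)
    m1 = sum(1 for n in per_location.values() if n == 1)
    m2 = sum(1 for n in per_location.values() if n == 2)
    return {
        "NRF": distinct / total if total else float("nan"),
        "PBC1": m1 / distinct if distinct else float("nan"),
        # PBC2 is undefined when no location has exactly two reads.
        "PBC2": m1 / m2 if m2 else None,
    }


def positions_from_bam(path, min_mapq=30):
    """Yield (chrom, 5' position, strand) for primary reads above a MAPQ cutoff."""
    import pysam  # optional dependency, only needed for real BAM input

    with pysam.AlignmentFile(path, "rb") as bam:
        for read in bam:
            if (read.is_unmapped or read.is_secondary or read.is_supplementary
                    or read.mapping_quality < min_mapq):
                continue
            if read.is_reverse:
                # The 5' end of a reverse-strand read is its rightmost aligned base.
                yield read.reference_name, read.reference_end, "-"
            else:
                yield read.reference_name, read.reference_start, "+"
```

For example, `library_complexity(positions_from_bam("C2.sorted.bam"))` with a hypothetical file name. Low NRF and PBC1 values for C2 relative to C1 would point toward heavy PCR duplication rather than contamination.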

I think the kind of issues you have are on the biology side and not something that can be fixed on the computational end. Keep in mind: "garbage in, garbage out". You won't want to spend a lot of time trying to rescue data that ends up being unusable.

Answer by Ian (University of Manchester, UK), 4.5 years ago:

Given how irreproducible the replicates are, combining them does not seem to be the best approach. It might be worth using something like htseqtools to generate a PCA plot to see whether the replicates look similar. Have the samples been adapter/quality trimmed? What did the QC report look like (e.g. FastQC)? Handle these samples with care :)
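htseqtools is an R/Bioconductor package; as a swapped-in alternative, the same check (do the replicates cluster together?) can be sketched in Python with NumPy. It assumes a bins x samples matrix of raw read counts, normalizes each sample to counts per million, applies log2(CPM + 1) and runs PCA via SVD. The function name and the transform are illustrative choices, not htseqtools' method.

```python
import numpy as np


def pca_of_samples(counts, n_components=2):
    """PCA of samples from a bins x samples matrix of raw read counts.

    Returns (scores, explained_variance_ratio); scores has one row per sample.
    Every sample must have a nonzero total count.
    """
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6  # depth-normalize each sample
    logcpm = np.log2(cpm + 1.0)                              # damp very high-count bins
    x = logcpm.T                                             # samples as rows
    x = x - x.mean(axis=0, keepdims=True)                    # center each bin across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    var = s ** 2
    scores = u[:, :n_components] * s[:n_components]
    return scores, var[:n_components] / var.sum()
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the PCA plot; replicates of the same condition would be expected to sit close together, and a replicate far from its partner is a warning sign before merging.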
