Is it possible that the control data will be bigger than the treatment samples' data?
5.6 years ago
jimmy_zeng ▴ 90

I'd like to learn how to process ChIP-seq data by myself, so I chose a paper and am trying to mimic its data analysis steps. Details of the data: cell line: MCF7 // Illumina HiSeq 2000 // 50bp // single-end // phred+33

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52964

I've checked the metadata one by one, and it really confuses me.

> GSM1278641    Xu_MUT_rep1_BAF155_MUT  SRR1042593
> GSM1278642    Xu_MUT_rep1_Input   SRR1042594
> GSM1278643    Xu_MUT_rep2_BAF155_MUT  SRR1042595
> GSM1278644    Xu_MUT_rep2_Input   SRR1042596
> GSM1278645    Xu_WT_rep1_BAF155   SRR1042597
> GSM1278646    Xu_WT_rep1_Input    SRR1042598
> GSM1278647    Xu_WT_rep2_BAF155   SRR1042599
> GSM1278648    Xu_WT_rep2_Input    SRR1042600


# Is it possible that the control data will be bigger than the treatment samples' data?

621M Jun 27 14:03 SRR1042593.sra (16.9M reads)
2.2G Jun 27 15:58 SRR1042594.sra (60.6M reads)
541M Jun 27 16:26 SRR1042595.sra (14.6M reads)
2.4G Jun 27 18:24 SRR1042596.sra (65.9M reads)
814M Jun 27 18:59 SRR1042597.sra (22.2M reads)
2.1G Jun 27 20:30 SRR1042598.sra (58.1M reads)
883M Jun 27 21:08 SRR1042599.sra (24.0M reads)
2.8G Jun 28 11:53 SRR1042600.sra (76.4M reads)

ChIP-Seq MASC HOMER • 1.3k views

Wow, that's a huge font. I have no idea what you want to do, but it's still quite impressive.

It might be useful if you could tell us some things like what your experiment entails, how the libraries were created, where your data comes from, and so forth. Please be as complete as possible if you want useful responses. It's certainly possible for random people on the internet to research the sra files in your post, but I think most people are not interested in doing that. So if you want answers, make things as easy as possible for your potential answerers. It looks like you are trying to redo an existing analysis, but unless you state exactly what you are trying to accomplish, it's difficult to give advice.


Oh, sorry about that. I gave the link to NCBI, so anyone interested can look through the details on the GEO page.

That's why I didn't repeat the description of the experiment.

It's just a ChIP-seq data analysis question.

BAF155 is an important TF that can be methylated by CARM1.

So they did two types of ChIP-seq experiments:

One to check where BAF155 binds across the genome.

The other to check how the function of BAF155 changes when a mutation prevents BAF155 from being methylated.

5.6 years ago

Short answer: yes it's absolutely possible that the control samples really are sequenced significantly deeper than the ChIP samples. The degree of difference in that dataset is a bit...extreme, but perhaps they had other reasons (e.g., they were planning to use the same controls for more deeply sequenced samples, or they expected high ChIP efficiency with few actual binding events). It's best not to skimp on the control sample depth, normally doing at least as much control as ChIP. Why? Because once you do peak calling you need to scale your samples, and in order to not scale noise you scale the more deeply sequenced sample down. If you have to scale down your input then it'll still be a bit better for determining things like local variance levels than the other way around (I'm thinking in terms of peak calling with MACS2, so YMMV with other methods).
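To put numbers on the scaling argument, here is a minimal sketch using the read counts listed in the question. It assumes simple linear scaling of the deeper library down to the shallower one, as described above; `scale_factor` is a hypothetical helper name, not a MACS2 command, and the exact scaling MACS2 applies internally may differ.

```shell
#!/bin/sh
# scale_factor is a hypothetical helper (not part of MACS2).
# $1 = ChIP reads, $2 = input reads (same units, here millions).
# Prints the factor by which the deeper library would be scaled down.
scale_factor() {
    awk -v t="$1" -v c="$2" 'BEGIN {
        if (t < c) printf "%.3f\n", t / c   # input is deeper: scale input down
        else       printf "%.3f\n", c / t   # ChIP is deeper: scale ChIP down
    }'
}

# Read counts (millions) copied from the question: ChIP, then its input.
scale_factor 16.9 60.6   # Xu_MUT_rep1
scale_factor 14.6 65.9   # Xu_MUT_rep2
scale_factor 22.2 58.1   # Xu_WT_rep1
scale_factor 24.0 76.4   # Xu_WT_rep2
```

In all four pairs the input is the deeper library, so it is the input that would be scaled down, by a factor of roughly 0.2 to 0.4.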


Yes, I'd like to use MACS2, but the results are not as good as I expected.

With the code below I got fewer than 100 peaks, and I don't know what's wrong with the parameters.

nohup time ~/.local/bin/macs2 callpeak -c SRR1042594.sorted.bam -t SRR1042593.sorted.bam -f BAM -B -g hs -n Xu_MUT_rep1 2>Xu_MUT_rep1.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042596.sorted.bam -t SRR1042595.sorted.bam -f BAM -B -g hs -n Xu_MUT_rep2 2>Xu_MUT_rep2.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042598.sorted.bam -t SRR1042597.sorted.bam -f BAM -B -g hs -n Xu_WT_rep1  2>Xu_WT_rep1.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -c SRR1042600.sorted.bam -t SRR1042599.sorted.bam -f BAM -B -g hs -n Xu_WT_rep2  2>Xu_WT_rep2.masc2.log &


Then I swapped the control and treatment because I guessed they had mislabeled the uploaded data. But I got nothing!

nohup time ~/.local/bin/macs2 callpeak -t SRR1042594.sorted.bam -c SRR1042593.sorted.bam -f BAM -B -g hs -n Xu_MUT_rep1 2>Xu_MUT_rep1.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -t SRR1042596.sorted.bam -c SRR1042595.sorted.bam -f BAM -B -g hs -n Xu_MUT_rep2 2>Xu_MUT_rep2.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -t SRR1042598.sorted.bam -c SRR1042597.sorted.bam -f BAM -B -g hs -n Xu_WT_rep1  2>Xu_WT_rep1.masc2.log &
nohup time ~/.local/bin/macs2 callpeak -t SRR1042600.sorted.bam -c SRR1042599.sorted.bam -f BAM -B -g hs -n Xu_WT_rep2  2>Xu_WT_rep2.masc2.log &


You'd need to look through the data in IGV (always do this). It's quite likely that you'll need to play around with some of the parameters.
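As one possible starting point for playing with parameters, the sketch below only prints candidate MACS2 commands (a dry run) so they can be reviewed before anything is launched. The flags `--nomodel`, `--extsize`, `-q`, and `--outdir` are existing `macs2 callpeak` options, but the values chosen here (200 bp extension, q-value 0.05) are illustrative assumptions, not recommendations for this dataset. `build_cmd` and the `_nomodel` naming are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints one macs2 command per ChIP/input pair; nothing is executed.
# Pairs (ChIP:input:name) are taken from the metadata in the question.
PAIRS="SRR1042593:SRR1042594:Xu_MUT_rep1
SRR1042595:SRR1042596:Xu_MUT_rep2
SRR1042597:SRR1042598:Xu_WT_rep1
SRR1042599:SRR1042600:Xu_WT_rep2"

# build_cmd is a hypothetical helper; --extsize 200 and -q 0.05 are assumed values.
# A separate name suffix and output directory keep earlier results from being overwritten.
build_cmd() {
    chip=$1; input=$2; name=$3
    echo "macs2 callpeak -t ${chip}.sorted.bam -c ${input}.sorted.bam -f BAM -g hs -B" \
         "--nomodel --extsize 200 -q 0.05 -n ${name}_nomodel --outdir macs2_nomodel"
}

echo "$PAIRS" | while IFS=: read -r chip input name; do
    build_cmd "$chip" "$input" "$name"
done
```

Writing each parameter set to its own `-n` name and `--outdir` makes it easier to load the competing peak files side by side in IGV and judge them against the raw signal.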


I will try IGV, but I am not familiar with how to recognise better parameters or figures, which makes me feel helpless.