Question: Analyzing genome-wide sequencing data for control and experimental groups with biological replicates
sisiliuiuc (United States) wrote 4.0 years ago:

Hello, I am very new to sequence data analysis and have some structural questions. I am trying to analyze the difference in the presence of an enriched mark between a control group and an experimental group. The animal model we use is the mouse. I have 3 biological replicates from each of the two groups, and I am trying to find the difference in the presence of this mark. I'm good up to the point of alignment, but I'm stuck on how to call peaks. I know people usually use a control/input sample for ChIP-seq, where they skip the IP and just sequence to account for background noise, but we didn't do a non-IP control. Here are my thoughts on how to approach this and the options I came across. Please let me know which one is the most reasonable approach; it would be wonderful if there were references to papers.

Option 1: Randomly pair each of the 3 control mice with one of the 3 experimental treatment mice and call peaks using the control mouse as input. I would end up with 3 peak files, and then I would keep the peaks present in more than 50% of the replicates.

Option 2: Pair each of the 3 control mice with each of the 3 experimental mice, covering all possible combinations. I would end up with 9 files, and then keep the peaks present in more than 50% of the replicates.

Option 3: Call peaks on all the samples individually without an input (MACS allows this), and then find the peaks that are in common in more than 50% of the animals in each of the control and experimental groups. I would end up with two files of peaks, one for the control group and one for the experimental treatment group. Then I would find the difference between the two files.
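The "present in more than 50% of the replicates" step, and the final comparison in option 3, could be sketched roughly as below. This is only an illustration of the set logic, not a method from any particular tool: it assumes peaks are already loaded as `(chrom, start, end)` tuples with BED-style half-open coordinates, and the function names `consensus_regions` and `unique_to` and all example coordinates are made up for the sketch.

```python
from collections import defaultdict


def merge_intervals(peaks):
    """Merge overlapping peaks within one replicate so each base counts once."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def consensus_regions(replicates, min_fraction=0.5):
    """Regions covered by a peak in MORE than min_fraction of the replicates."""
    # deltas[chrom][pos] = net change in the number of replicates with a peak at pos
    deltas = defaultdict(lambda: defaultdict(int))
    for peaks in replicates:
        for chrom, start, end in merge_intervals(peaks):
            deltas[chrom][start] += 1
            deltas[chrom][end] -= 1
    needed = min_fraction * len(replicates)
    regions = []
    for chrom in sorted(deltas):
        depth = 0
        region_start = None
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth > needed and region_start is None:
                region_start = pos
            elif depth <= needed and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions


def unique_to(a, b):
    """Regions in a that overlap no region in b (simple quadratic scan)."""
    return [
        (chrom, start, end)
        for chrom, start, end in a
        if not any(chrom == c and start < e and s < end for c, s, e in b)
    ]


# Made-up peaks for three replicates of one group.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 250)]
rep3 = [("chr1", 180, 300), ("chr1", 550, 650)]
treated_consensus = consensus_regions([rep1, rep2, rep3])
# -> [("chr1", 150, 250), ("chr1", 550, 600)]

# Made-up consensus for the other group, then the group difference.
control_consensus = [("chr1", 140, 260)]
treated_only = unique_to(treated_consensus, control_consensus)
# -> [("chr1", 550, 600)]
```

With 3 replicates, "more than 50%" means a region must be covered in at least 2 of them. The comparison here is purely presence/absence of overlap; it does not take signal strength into account.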

Thank you. I realize this is a very long post.

Are these broad peaks or narrow peaks (like histone modifications or transcription factors)?

written 4.0 years ago by matted

They are 5hmC enriched regions with an average length of 1.5 kb.

written 4.0 years ago by sisiliuiuc
Istvan Albert (University Park, USA) wrote 4.0 years ago:

I am not sure I got every detail right, but it seems that since you are looking at the differences between samples, and you already have a control and a treatment group, you can use those. You would not need yet another control.

Usually a non-IP sample is used as input to differentiate between signal and noise. I'm using the MACS tool on Galaxy to call peaks, and it will let me call peaks without an input. However, since I have biological controls, it might be fine to use the biological control as input and the biological treatment as treatment in MACS. My problem is that since I have 3 biological replicates for each of the control and treatment groups, I'm not sure whether I should concatenate the replicates and then call peaks, or call peaks first and then find the common differences across all 3 pairs.
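To make the choice concrete, here is a rough sketch of how the runs would be laid out under each strategy. It assumes the MACS2 command line (`macs2 callpeak` with `-t`, `-c`, `-f`, `-g`, `-n`) rather than the Galaxy wrapper, and every file name and run name is hypothetical. The code only builds the argument lists and does not execute anything.

```python
import random
from itertools import product

# Hypothetical file names; substitute the aligned reads for each animal.
controls = ["control_1.bam", "control_2.bam", "control_3.bam"]
treated = ["treated_1.bam", "treated_2.bam", "treated_3.bam"]


def macs2_command(treatment_files, control_files, name):
    """Build (but do not run) a `macs2 callpeak` argument list for mouse data."""
    return [
        "macs2", "callpeak",
        "-t", *treatment_files,   # several files given to -t are pooled by MACS2
        "-c", *control_files,     # same for -c
        "-f", "BAM",
        "-g", "mm",               # mouse effective genome size shortcut
        "-n", name,
    ]


# Strategy A: pool the replicates, then call peaks once.
pooled = macs2_command(treated, controls, "pooled")

# Strategy B (option 1): random one-to-one pairing, 3 runs.
rng = random.Random(0)  # fixed seed so the pairing is reproducible
shuffled_controls = rng.sample(controls, k=len(controls))
one_to_one = [
    macs2_command([t], [c], f"pair_{i}")
    for i, (t, c) in enumerate(zip(treated, shuffled_controls), start=1)
]

# Strategy C (option 2): every treated/control combination, 9 runs.
all_pairs = [
    macs2_command([t], [c], f"t{i}_c{j}")
    for (i, t), (j, c) in product(enumerate(treated, 1), enumerate(controls, 1))
]

for cmd in [pooled, *one_to_one]:
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.run` once the file names are real. Pooling gives one peak file, while the pairwise strategies give 3 or 9 files that then need a consensus step.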

written 4.0 years ago by sisiliuiuc