Question

ChIP-seq control library prep?

6.6 years ago

Boboboe ▴ 40

Hello everyone,

I recently started learning more about sequencing, and I'm a little confused about data formats and about how you prepare your input. From my understanding, the input control tells you what the reads would look like if you were not enriching for DNA bound by your target protein. So for each sample, you'd have an input that is also made into a library. After a library is made and sequenced, you get all your reads in an .sra file, which you can then process with a ChIP-seq pipeline (the one I'm looking into is the AQUAS pipeline: https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#)
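
To make the role of the input concrete, here is a minimal sketch with made-up numbers (all counts and library sizes below are hypothetical). The input reads in each bin are scaled to the ChIP library's sequencing depth, which gives the number of ChIP reads you would expect there if there were no enrichment at all. Real peak callers refine this (MACS2, for example, also looks at local background over larger windows), so treat this as an illustration of the idea only.

```python
# Minimal sketch: what an input control contributes, using hypothetical counts.

def expected_background(input_counts, chip_total, input_total):
    """Scale per-bin input read counts to the ChIP library size.

    The result is the number of ChIP reads expected in each bin
    if the ChIP sample had no enrichment at all.
    """
    scale = chip_total / input_total
    return [count * scale for count in input_counts]


chip_counts = [12, 15, 90, 14]                  # hypothetical ChIP reads per bin
input_counts = [20, 22, 25, 21]                 # hypothetical input reads per bin
chip_total, input_total = 1_000_000, 2_000_000  # hypothetical library sizes

expected = expected_background(input_counts, chip_total, input_total)
for observed, exp in zip(chip_counts, expected):
    print(f"observed={observed:3d} expected={exp:5.1f} ratio={observed / exp:.2f}")
```

Only the third bin stands out (observed/expected of 7.2); the others sit near 1, which is what "no enrichment over input" looks like.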

I assume that, as with other ChIP-seq pipelines, you would need the following: 2 replicates of reads from your experimental library and 2 replicates of reads from your control library.

Then I was looking at this data set: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59316 and wanted to convert it into the format used by Roadmap Epigenomics (what I thought of as a fold-change signal track in a BED file). However, what is provided appears to be 2 separate experimental libraries made with different antibodies, with no control. So I'm confused about not finding everything I need in the GEO entry. Can someone please help me with processing the data, and explain if there's something I'm not understanding correctly?
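
A fold-change signal track is essentially ChIP signal divided by control signal along the genome. The sketch below, under simplifying assumptions (fixed-size bins, counts already computed, a pseudocount of 1 in counts-per-million units, all values hypothetical), writes that ratio as bedGraph lines. To my understanding, the Roadmap tracks were produced with MACS2 (its bdgcmp subcommand has a fold-enrichment mode, -m FE) and distributed as bigWig rather than BED, so this is only an illustration of the arithmetic. Note that it needs an input column: without a control, as in GSE59316, there is no denominator.

```python
def fold_enrichment_bedgraph(chrom, bin_size, chip_counts, input_counts,
                             chip_total, input_total, pseudocount=1.0):
    """Return bedGraph lines of ChIP/input fold enrichment per fixed-size bin.

    Counts are converted to counts-per-million so libraries of different
    depth are comparable; the pseudocount avoids division by zero in empty bins.
    """
    lines = []
    for i, (chip, ctrl) in enumerate(zip(chip_counts, input_counts)):
        chip_cpm = chip / chip_total * 1e6
        ctrl_cpm = ctrl / input_total * 1e6
        fe = (chip_cpm + pseudocount) / (ctrl_cpm + pseudocount)
        start = i * bin_size  # bedGraph starts are 0-based, intervals half-open
        lines.append(f"{chrom}\t{start}\t{start + bin_size}\t{fe:.3f}")
    return lines


# Hypothetical counts in two 200 bp bins from equally deep libraries.
for line in fold_enrichment_bedgraph("chr1", 200, [9, 0], [4, 0],
                                     1_000_000, 1_000_000):
    print(line)
```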

Thanks everyone!

ChIP-Seq sequencing alignment AQUAS

updated 6.6 years ago by Kevin Blighe 87k • written 6.6 years ago by Boboboe ▴ 40

score 0 · Answer 1 · 2017-09-14

Hey,

Firstly, you may want to take a look at this thread to see what's important to consider when processing ChIP-seq data: A: ChIP-Seq identification of peaks

There are different ways to design a ChIP-seq experiment. In one that I did recently, we had 2 test samples, 1 isotype control, and 1 empty-vector control. From the 2 test samples, we subtracted the peaks found in the isotype control. The empty-vector sample was essentially a negative control, i.e., it was expected to show no peaks. In a different design, we again had two test samples, this time with two matched input controls, which were subtracted from the test samples.
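
Subtracting control peaks from test peaks amounts to dropping every test peak that overlaps any control peak. Here is a minimal sketch, assuming peaks are plain (chrom, start, end) tuples in BED-style coordinates; the peak lists are hypothetical. On real BED files, bedtools does the same job efficiently with bedtools intersect -v -a test.bed -b control.bed.

```python
# Peaks are (chrom, start, end) tuples in BED-style 0-based, half-open coordinates.

def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def subtract_control_peaks(test_peaks, control_peaks):
    """Keep test peaks that overlap no control peak (quadratic; fine for small sets)."""
    return [p for p in test_peaks
            if not any(overlaps(p, c) for c in control_peaks)]


test_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
control_peaks = [("chr1", 150, 300)]  # hypothetical isotype-control peak
print(subtract_control_peaks(test_peaks, control_peaks))
```

The first test peak is removed because it shares bases 150 to 199 with the control peak; the other two survive.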

Looking at the study that you've referenced (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59316), the only way I can see to analyse it would be to process the data, identify peaks, and then perform the following comparisons:

Ab1 proliferating cells (PD32) vs. Ab1 replicative senescent IMR90 cells (PD86)
Ab2 proliferating cells (PD32) vs. Ab2 replicative senescent IMR90 cells (PD86)
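
For one antibody, a first-pass comparison can simply split the peaks into those shared between PD32 and PD86 and those unique to each condition. The sketch below does that by interval overlap; the peak lists are hypothetical and only named after the two conditions. A formal differential-binding analysis would instead compare read counts across replicates, e.g. with the Bioconductor packages DiffBind or csaw.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def compare_conditions(peaks_a, peaks_b):
    """Split peaks into shared, A-only and B-only lists by simple overlap."""
    shared = [p for p in peaks_a if any(overlaps(p, q) for q in peaks_b)]
    only_a = [p for p in peaks_a if not any(overlaps(p, q) for q in peaks_b)]
    only_b = [q for q in peaks_b if not any(overlaps(q, p) for p in peaks_a)]
    return shared, only_a, only_b


# Hypothetical peak calls for the same antibody in both conditions.
pd32 = [("chr1", 100, 200), ("chr1", 1000, 1100)]
pd86 = [("chr1", 150, 250), ("chr3", 0, 50)]
shared, only_pd32, only_pd86 = compare_conditions(pd32, pd86)
print("shared:", shared)
print("PD32 only:", only_pd32)
print("PD86 only:", only_pd86)
```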

It looks like they did not sequence any input controls or empty-vector controls.
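
Without a control, peaks have to be judged against a background estimated from the ChIP sample itself. MACS2, for example, can be run without a control and then does exactly that. The sketch below shows the simplest version of the idea, assuming binned read counts (hypothetical here): the genome-wide mean count per bin is taken as a Poisson rate, and bins whose counts would be very unlikely under it are flagged. It ignores local biases (GC content, open chromatin, copy number), which is precisely what an input control normally corrects for.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def call_bins_without_control(counts, p_threshold=1e-5):
    """Flag bins whose read count is unlikely under a genome-wide Poisson background."""
    lam = sum(counts) / len(counts)  # mean reads per bin across all bins
    return [i for i, c in enumerate(counts) if poisson_sf(c, lam) < p_threshold]


# Hypothetical bin counts: a flat background of 5 reads with one enriched bin.
counts = [5] * 99 + [60]
print(call_bins_without_control(counts))
```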

Regarding the SRA files: SRA is a standard archive format, but it is not what aligners and ChIP-seq pipelines take as input. Data uploaded to the Sequence Read Archive is stored in SRA format, and you need to convert it to FASTQ (for example, with fastq-dump or fasterq-dump from the SRA Toolkit) before you can process the data yourself. Please take a look here: https://www.ncbi.nlm.nih.gov/books/NBK158900/
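
FASTQ, unlike FASTA, stores a per-base quality string alongside each read, and aligners and QC tools use it. As a sketch of what the converted files contain, here is a small reader for the standard 4-line FASTQ record, plus a mean-quality helper that assumes the common Phred+33 encoding. The example record and its accession-style ID are hypothetical. For real work, an established parser (e.g. Biopython's SeqIO) is the safer choice.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


# Hypothetical record of the kind an SRA-to-FASTQ conversion produces.
example = io.StringIO("@SRR0000001.1\nACGT\n+\nIIII\n")
for read_id, seq, qual in read_fastq(example):
    print(read_id, seq, mean_phred(qual))
```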

Kevin