Hello everyone,
I recently started learning more about sequencing, and I'm a little confused about data formats and about how you prepare your input. From my understanding, the input shows you what the reads would look like if you were not enriching for reads that interact with your target protein. So for each sample, you'll have an input that's made into a library. After a library is made and sequenced, you get all your reads in an .sra file, which you can then process with a ChIP-seq pipeline (the one I'm looking into using is the AQUAS pipeline: https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#).
I assume other ChIP-seq pipelines would also need the following: 2 replicates of reads from your experimental library and 2 replicates of reads from your control library.
Then I was looking at this data set: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59316 and wanted to convert it into the format used by Roadmap Epigenomics (which I understand to be a fold change signal track in a bed file). However, what is provided appears to be 2 separate experimental libraries created using different antibodies, with no control. So here I am, confused about not finding everything I need in the GEO entry. Can someone please help me with processing the data, and explain whether there's something I'm not understanding correctly?
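To make clearer what I mean by a fold change signal track, here is a rough Python sketch of my understanding. It assumes the genome has already been split into bins and the reads counted per bin for both the ChIP library and the input. The function name, the pseudocount, and the simple depth scaling are my own assumptions for illustration, and this is not necessarily how Roadmap Epigenomics or the AQUAS pipeline actually compute their tracks.

```python
def fold_change_track(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin fold change of ChIP signal over input (hypothetical helper).

    The input counts are scaled to the ChIP library's total depth so that
    the two libraries are comparable before taking the ratio.
    """
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input must cover the same bins")

    chip_total = sum(chip_counts)
    input_total = sum(input_counts)
    if chip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")

    # Scale factor that brings the input to the same depth as the ChIP library.
    scale = chip_total / input_total

    # The pseudocount (an assumption) avoids dividing by zero in empty bins.
    return [
        (chip + pseudocount) / (inp * scale + pseudocount)
        for chip, inp in zip(chip_counts, input_counts)
    ]
```

If this is roughly right, then without an input library there is nothing to put in the denominator, which is why the missing control is what confuses me.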
Thanks everyone!