Question: What are short reads in ChIP-seq, and why are there so many?
4.9 years ago by
Affan290 wrote:

Hi all,

This may be a very basic question, but I am having a lot of trouble wrapping my head around it. I've begun my research in bioinformatics (ChIP-seq, TF stuff) and am reading material to understand what ChIP-seq is before I look at the computational part of it. (For what it's worth, I am an Applied Math Master's candidate.)

My question can easily be explained with one picture. I am really just stuck at step 1.

Okay, my question is: how come there are so many overlapping reads? Suppose I have a DNA sequence and I shear it into fragments; how can the fragments possibly overlap?

My "understanding" is that:

1) They take a DNA sequence and crosslink the protein of interest to it. Then they get rid of the DNA surrounding this area of interest, so now we have a "small" sequence of DNA, and somewhere in this small DNA is where our TF binds. Now we make copies of this small DNA sequence and run it through a sequencing machine. Is this correct?

2) They take a bunch of cells, which means a bunch of DNA sequences. Then they do the same procedure as above (crosslinking and getting a "smaller" DNA sequence of interest). Since they had many cells to begin with, they had many DNA sequences to begin with. Now they shear the DNA and we have fragments. Then we align these against the reference genome.

Am I almost getting there in my understanding?

A secondary question: what is the significance of the red/blue alignments?

chip-seq • 2.5k views
modified 4.9 years ago by Istvan Albert ♦♦ 79k • written 4.9 years ago by Affan290
4.9 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

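You're quite close; you just have to remember that (1) and (2) are done together, so you will start with a LOT of cells in any case and then perform the cross-linking, fragmentation, purification, reverse cross-linking, etc. Because each cell's DNA is sheared at different, essentially random positions, the fragments recovered from different cells cover the same bound region with different start and end points, which is why the aligned reads overlap. Yes, there is often an additional amplification step prior to sequencing.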
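To see why fragments from many cells pile up on top of each other, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the region length, the hypothetical binding position `site`, the number of cells, uniform random shearing, and a perfect pull-down that keeps exactly the fragment containing the site); a real experiment is much noisier.

```python
import random

def simulate_fragments(genome_len=10_000, site=5_000, n_cells=200,
                       mean_frag=200, seed=0):
    """Shear each cell's copy of the region at random positions and keep
    only the fragment that contains the binding site (an idealized ChIP)."""
    rng = random.Random(seed)
    n_cuts = genome_len // mean_frag
    kept = []
    for _ in range(n_cells):
        # Every cell gets its own random set of cut positions.
        cuts = sorted(rng.sample(range(1, genome_len), n_cuts))
        bounds = [0] + cuts + [genome_len]
        for start, end in zip(bounds, bounds[1:]):
            if start <= site < end:          # half-open interval [start, end)
                kept.append((start, end))
                break
    return kept

def coverage(fragments, genome_len):
    """Count how many fragments cover each position."""
    cov = [0] * genome_len
    for start, end in fragments:
        for pos in range(start, end):
            cov[pos] += 1
    return cov

frags = simulate_fragments()
cov = coverage(frags, 10_000)
print("fragments kept:", len(frags))
print("coverage at the site:", cov[5_000])
print("distinct fragment starts:", len({start for start, _ in frags}))
```

Each cell contributes one fragment, but since the cuts differ from cell to cell the fragments start and end at different places, and they all stack up over the bound position.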

I would guess that the red/blue coloring is for read#1 and read#2 in a pair (or it's from a stranded/directional experiment and denotes the orientation of each read). In either case you would expect two peaks and that your protein(s) of interest bind somewhere between them.
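As a rough illustration of the "two peaks" idea, the sketch below places fragments around a hypothetical binding position `SITE`, records only the 5' end of a read taken from one end of each fragment, and estimates the site as the midpoint between the forward-strand and reverse-strand pile-ups. The fragment length range, the 50/50 strand choice, and the use of a simple mean as the "peak" are all assumptions for the example, not what any particular peak caller does.

```python
import random
from statistics import mean

rng = random.Random(1)
SITE = 5_000  # hypothetical binding position

forward_5p, reverse_5p = [], []
for _ in range(1_000):
    frag_len = rng.randint(150, 250)
    # Place the fragment so that it contains SITE at a random offset.
    start = SITE - rng.randint(0, frag_len - 1)
    end = start + frag_len                 # half-open interval [start, end)
    # Assume each fragment is read from one of its two ends at random.
    if rng.random() < 0.5:
        forward_5p.append(start)           # forward read begins at the left end
    else:
        reverse_5p.append(end - 1)         # reverse read begins at the right end

fwd_peak = mean(forward_5p)
rev_peak = mean(reverse_5p)
estimate = (fwd_peak + rev_peak) / 2
print("forward-strand peak:", round(fwd_peak))
print("reverse-strand peak:", round(rev_peak))
print("estimated binding position:", round(estimate))
```

The forward-strand 5' ends land to the left of the site and the reverse-strand 5' ends to the right, so the bound position sits between the two peaks.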

written 4.9 years ago by Devon Ryan88k

Technically the expectation is that the fragment represents the actual bound DNA. 

It is true that the method/library preparation used to be so inaccurate that the fragment was often much larger than the footprint of the protein. But this keeps improving.


written 4.9 years ago by Istvan Albert ♦♦ 79k

Yeah, some of the newer methods give very exact locations, but this is completely method dependent.

written 4.9 years ago by Devon Ryan88k
4.9 years ago by
Istvan Albert ♦♦ 79k
University Park, USA
Istvan Albert ♦♦ 79k wrote:

One important detail to remember for all sequencing experiments is that sequencing always proceeds in the 5' to 3' direction. That is left-to-right on the forward strand and right-to-left on the reverse strand.

Moreover, depending on the length of the fragment and the length of the reads, you can end up with partially overlapping reads in various configurations. The reads can be shorter than the fragment, or longer, but almost never exactly the same length. So there will be all kinds of partial overlaps.

The configurations below can be observed when the fragment is more than twice as long as the read, less than twice as long but still longer than the read, and shorter than the read.

----------->         <-----------
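The three cases can be worked out with a little arithmetic. This is a minimal sketch assuming paired-end sequencing, where one read starts at each end of the fragment and runs inward; the function names are made up for the example.

```python
def paired_read_layout(fragment_len, read_len):
    """Return (overlap, overhang) in bases for two reads sequenced inward
    from the two ends of a single fragment."""
    # Part of each read that actually lies on the fragment.
    on_fragment = min(read_len, fragment_len)
    # Bases of the fragment covered by both reads.
    overlap = max(0, 2 * on_fragment - fragment_len)
    # Bases each read runs past the far end of the fragment.
    overhang = max(0, read_len - fragment_len)
    return overlap, overhang

def describe(fragment_len, read_len):
    overlap, overhang = paired_read_layout(fragment_len, read_len)
    if overhang > 0:
        return "reads run past the fragment ends"
    if overlap > 0:
        return "reads overlap in the middle"
    return "reads do not overlap"

for frag in (300, 150, 80):
    print(frag, paired_read_layout(frag, 100), describe(frag, 100))
```

With 100-base reads, a 300-base fragment leaves a gap between the two reads, a 150-base fragment gives a 50-base overlap, and an 80-base fragment is shorter than the read, so each read extends 20 bases beyond it.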



ChIP-seq data is best visualized using only the 5' ends (the borders) of the reads. The actual overlap and read coverage can be quite misleading.


modified 4.9 years ago • written 4.9 years ago by Istvan Albert ♦♦ 79k