Question

Chip/Rna/Dnase/Etc...-Seq: Search For Similar Tracks

11.2 years ago

9606 ▴ 330

Hi all, I have a curiosity.

Suppose you have a track, let's say one generated from a ChIP-seq experiment. Do you think it could be useful to search, among a set of tracks (all generated by ChIP-seq experiments), for tracks that have a similar shape to the given one? And if so, could you please list some very simple biological questions that could be answered by such "similarity searching", as they come to your mind?

Thank you, best regards.

ngs similarity • 2.2k views


Your question is overly generic: "similar tracks" could mean just about anything, and "listing some application" is also too broad.

11.2 years ago by Istvan Albert 100k

Dear Istvan, I edited my question. Do you think it is better now?

11.2 years ago by 9606 ▴ 330

score 2 · Answer 1 · 2013-02-15

That's why we call peaks (significant binding sites) and overlap them across the different proteins/TFs of interest to check their similarity.

You can subset them to a specific region of interest or use the whole genome. Tools like BEDTools (see the thread "Compare Multiple Bed Files?"), MACS and, in general, R/Bioconductor are helpful for moving toward what you specifically want.
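To make the "overlap the peaks" idea concrete, here is a minimal sketch in plain Python. It assumes the peaks have already been called (for example with MACS) and loaded as BED-style `(chrom, start, end)` tuples with half-open coordinates. The function names are made up for illustration, and the Jaccard index on covered base pairs is just one possible similarity score, similar in spirit to what `bedtools jaccard` reports.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Total number of base pairs covered by merged intervals."""
    return sum(end - start for start, end in intervals)


def intersect_bp(a, b):
    """Base pairs shared by two sorted, merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def group_by_chrom(peaks):
    """Group (chrom, start, end) peaks into {chrom: [(start, end), ...]}."""
    grouped = defaultdict(list)
    for chrom, start, end in peaks:
        grouped[chrom].append((start, end))
    return grouped


def jaccard(peaks_a, peaks_b):
    """Jaccard index (shared bp / union bp) between two peak sets."""
    by_a, by_b = group_by_chrom(peaks_a), group_by_chrom(peaks_b)
    shared = union = 0
    for chrom in set(by_a) | set(by_b):
        a = merge_intervals(by_a.get(chrom, []))
        b = merge_intervals(by_b.get(chrom, []))
        s = intersect_bp(a, b)
        shared += s
        union += covered_bp(a) + covered_bp(b) - s
    return shared / union if union else 0.0


# Toy peak sets (hypothetical coordinates, not real data).
protein_a = [("chr1", 100, 200), ("chr1", 300, 400)]
protein_b = [("chr1", 150, 250), ("chr2", 0, 50)]
print(jaccard(protein_a, protein_b))
```

A score near 1 means the two proteins occupy almost the same bases, and a score near 0 means they barely overlap. For real data you would restrict both sets to the same region of interest first.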

You can also look at a few papers on this:

A co-localization model of paired ChIP-seq data using a large ENCODE data set enables comparison of multiple samples

An effective statistical evaluation of ChIPseq dataset similarity

Edit: Now that you have edited the question:

If you think the proteins are part of, or potential subunits of, the same complex, their binding sites might mostly overlap.
Regarding shape, you can never be completely sure because of noise in the data, but you can infer cases like this: proteinA has a normal distribution, while proteinB shows a normal distribution plus a small right skew. That might mean it extends into the gene body (if you are looking at the TSS, and depending on the genomic locus you take as reference).
If you make a composite profile (abundance profile) of a protein, it can tell you where it is most abundant (TSS, gene body, TTS, etc.).
For deeper analysis, you have to work with the data itself, beyond visual inspection of the peak shapes.
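The composite-profile and shape points above can be sketched roughly as follows. This is only a toy illustration in plain Python: it assumes per-base coverage for a single chromosome held in a list, anchor positions such as TSSs given as indices into that list, and it ignores strand. The function names and the "center of mass" measure of skew are my own choices for the example, not an established method.

```python
import math


def composite_profile(signal, anchors, flank):
    """Average signal in a +/- flank window around each anchor position.

    Anchors whose window would run off either end of the signal are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos in anchors:
        if pos - flank < 0 or pos + flank >= len(signal):
            continue
        for k, value in enumerate(signal[pos - flank:pos + flank + 1]):
            totals[k] += value
        used += 1
    if used == 0:
        raise ValueError("no anchor has a complete window")
    return [t / used for t in totals]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def center_of_mass(profile, flank):
    """Signal-weighted mean offset from the anchor.

    A positive value means the signal leans to the right of the anchor.
    """
    total = sum(profile)
    if total == 0:
        return 0.0
    return sum((k - flank) * v for k, v in enumerate(profile)) / total


# Toy coverage tracks (hypothetical values): A is symmetric around
# position 10, B is shifted and skewed to the right of it.
signal_a = [0] * 21
signal_a[9:12] = [1, 2, 1]
signal_b = [0] * 21
signal_b[10:13] = [2, 2, 1]

profile_a = composite_profile(signal_a, anchors=[10], flank=3)
profile_b = composite_profile(signal_b, anchors=[10], flank=3)
print(center_of_mass(profile_a, 3), center_of_mass(profile_b, 3))
print(pearson(profile_a, profile_b))
```

With real data you would average over many anchors, flip windows for minus-strand genes, and normalize for sequencing depth before comparing shapes. A clearly positive offset for proteinB but not proteinA would be the kind of hint about gene-body extension mentioned above.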

Cheers