unique list of promoters
8.9 years ago
R ▴ 10

I have a very basic question. I would like to have a unique list of promoters.

Let's say we have Refseq genes downloaded from Table browser (~54k).

If we extend the TSS by some number of kb upstream and downstream, how should we make the list unique: by gene name or by position?

e.g.

chr1    6052357 6161253 NM_001199861    KCNAB2  +
chr1    6086072 6161253 NM_003636       KCNAB2  +
chr1    6094347 6161253 NM_001199860    KCNAB2  +
chr1    6105980 6161253 NM_001199862    KCNAB2  +
chr1    6106173 6161253 NM_001199863    KCNAB2  +

If I deduplicate by $5 (official gene name), I end up with ~26k entries, but by chr, start, end, strand I end up with ~36k.

Deduplicating by start or end alone could also be an option, but sometimes the start is shared and sometimes the end.

I would prefer to deduplicate by official gene name.
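
To make the two options concrete, here is a minimal Python sketch that uses the five KCNAB2 rows above. It assumes the TSS is the start coordinate on the + strand and the end coordinate on the - strand, ignores 0-based/1-based off-by-one conventions, and uses a hypothetical 1 kb window on each side; the function names `promoter` and `unique_promoters` are illustrative only.

```python
# Example rows from the question: chrom, start, end, RefSeq ID, gene name, strand
ROWS = """\
chr1 6052357 6161253 NM_001199861 KCNAB2 +
chr1 6086072 6161253 NM_003636 KCNAB2 +
chr1 6094347 6161253 NM_001199860 KCNAB2 +
chr1 6105980 6161253 NM_001199862 KCNAB2 +
chr1 6106173 6161253 NM_001199863 KCNAB2 +
"""


def promoter(chrom, start, end, strand, upstream=1000, downstream=1000):
    """Return a strand-aware window around the TSS (window sizes are assumptions)."""
    # Assumption: the TSS is the start on '+' and the end on '-'.
    tss = start if strand == "+" else end
    if strand == "+":
        lo, hi = tss - upstream, tss + downstream
    else:
        lo, hi = tss - downstream, tss + upstream
    return chrom, max(0, lo), hi, strand


def unique_promoters(rows, key="tss"):
    """Keep the first transcript seen per promoter window or per gene name."""
    seen = {}
    for line in rows.strip().splitlines():
        chrom, start, end, tx, gene, strand = line.split()
        prom = promoter(chrom, int(start), int(end), strand)
        k = gene if key == "gene" else prom
        seen.setdefault(k, (prom, gene, tx))
    return list(seen.values())


by_tss = unique_promoters(ROWS, key="tss")    # one entry per distinct TSS window
by_gene = unique_promoters(ROWS, key="gene")  # one entry per gene name
print(len(by_tss), len(by_gene))  # 5 1
```

Keying on the TSS window rather than on (start, end) means transcripts that share a TSS but differ at the 3' end collapse into one promoter, while alternative promoters of the same gene are kept. Keying on gene name keeps only whichever transcript happens to come first in the file.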

I would like to know your suggestions.

Thanks

ChIP-Seq gene • 2.5k views

Have a look at the list of promoters defined by the FANTOM project (“CAGE peaks”, http://fantom.gsc.riken.jp/5/data/). Each has a unique name indicating whether it belongs to a known gene and, if so, its rank by expression level.

8.9 years ago
Ram 43k

A couple of thoughts here:

If you wish for unique names, go for unique names. If not, please explain what you're looking for.

Promoter sequences can often be flexible at certain base positions. They would then retain the same name but match different sequences. E-boxes, for example, can be as nonspecific as CANNTG. That consensus matches 16 different sequences (4 × 4 choices for the two N positions), yet the promoter motif itself is identified by a single official name.
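
As a quick check of that count, the following Python sketch expands the CANNTG consensus into every concrete sequence it matches. It assumes N stands for any of A, C, G, T and does not handle other IUPAC ambiguity codes; `expand_motif` is an illustrative helper name.

```python
from itertools import product


def expand_motif(motif):
    """Expand a consensus motif, treating 'N' as any of A, C, G, T."""
    choices = [("ACGT" if base == "N" else base) for base in motif]
    return ["".join(p) for p in product(*choices)]


ebox = expand_motif("CANNTG")
print(len(ebox))  # 16
```

The list includes the canonical E-box CACGTG along with the 15 other variants, all of which share the one motif name.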


Currently, for some types of analysis, like histone mark enrichment analysis, ...
