Question: Why Does the HOMER Tool Find TSS Sites for So Many (41,478) Genes?
kanwarjag (United States) wrote, 6.8 years ago:

I am using HOMER to make a tag directory from ChIP-seq data, and then I used 20 bp bins to annotate tag densities in a 10 kb window around the TSS for hg18. The output has 41,478 rows, which I took to mean 41,478 unique RefSeq genes. However, other databases list far fewer RefSeq genes. Any suggestion as to why HOMER has so many RefSeq genes with mapped TSSs? Thanks
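To make the shape of such a TSS-centered density table concrete, here is a minimal Python sketch, not HOMER's implementation. It assumes the 10 kb window is centered on the TSS (±5 kb) and uses hypothetical input formats: a list of (transcript ID, chromosome, TSS position, strand) tuples and a dict of read start positions per chromosome. Each TSS entry yields its own row of 10,000 / 20 = 500 bin counts, so the row count follows the number of TSS entries in the annotation, and two isoforms that share a TSS still produce two rows.

```python
WINDOW = 10_000          # total window width in bp (assumed centered on the TSS, i.e. +/-5 kb)
BIN = 20                 # bin width in bp
N_BINS = WINDOW // BIN   # 500 bins per TSS entry


def tss_density_matrix(tss_list, read_starts):
    """Return one (transcript_id, counts) row per TSS entry.

    tss_list: list of (transcript_id, chrom, tss_pos, strand) tuples (hypothetical format)
    read_starts: dict mapping chrom -> list of read start positions (hypothetical format)
    """
    half = WINDOW // 2
    rows = []
    for tx_id, chrom, tss, strand in tss_list:
        counts = [0] * N_BINS
        for pos in read_starts.get(chrom, []):
            offset = pos - tss
            if strand == "-":
                offset = -offset  # orient so that upstream is always negative
            if -half <= offset < half:
                counts[(offset + half) // BIN] += 1
        rows.append((tx_id, counts))
    return rows


# Toy input: two isoforms sharing one TSS, plus one minus-strand transcript.
tss = [
    ("NM_A1", "chr1", 10_000, "+"),
    ("NM_A2", "chr1", 10_000, "+"),  # second isoform, same TSS -> its own row
    ("NM_B1", "chr2", 50_000, "-"),
]
reads = {"chr1": [9_990, 10_005, 14_999], "chr2": [50_010]}
matrix = tss_density_matrix(tss, reads)
print(len(matrix), len(matrix[0][1]))  # 3 500
```

The per-entry loop is deliberately simple; a real tool would sort reads and use binary search, but the one-row-per-TSS-entry structure is the point here.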

Ido Tamir (Austria) wrote, 6.8 years ago:

Because there are that many entries in RefSeq:

>mysql --user=genome --host=genome-mysql.cse.ucsc.edu -D hg18 -A
mysql> select count(*) from refGene;
+----------+
| count(*) |
+----------+
|    43165 |
+----------+
1 row in set (0.20 sec)

mysql> select count(distinct(name)) from refGene;
+-----------------------+
| count(distinct(name)) |
+-----------------------+
|                 41125 |
+-----------------------+
1 row in set (0.30 sec)

mysql> select count(distinct(name2)) from refGene;
+------------------------+
| count(distinct(name2)) |
+------------------------+
|                  23770 |
+------------------------+
1 row in set (0.28 sec)

Each entry is a transcript isoform, and your analysis looks at the promoter of every transcript at every genomic position where that transcript is mapped. The second query shows that about 2,000 rows (43,165 − 41,125 = 2,040) are additional placements of transcripts that map to more than one location under the same RefSeq ID. The third query shows that RefSeq groups these roughly 41,000 transcripts into about 24,000 "genes" (distinct name2 values, i.e. gene symbols).
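The relationship between the three counts can be reproduced on a small invented table. The Python sketch below uses the standard-library sqlite3 module with a toy refGene table (hypothetical rows, keeping only the name, name2, chrom and txStart columns) instead of the live UCSC server. The final GROUP BY ... HAVING query lists accessions placed more than once; the same SQL should also work against the real table in MySQL, though that is untested here.

```python
import sqlite3

# Toy stand-in for UCSC's refGene table. Rows are invented for illustration,
# not real RefSeq records: name = transcript accession, name2 = gene symbol.
rows = [
    ("NM_000001", "GENEA", "chr1", 1000),
    ("NM_000002", "GENEA", "chr1", 1500),  # second isoform of GENEA
    ("NM_000003", "GENEB", "chrX", 5000),
    ("NM_000003", "GENEB", "chrY", 5000),  # same accession placed at a second locus
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refGene (name TEXT, name2 TEXT, chrom TEXT, txStart INTEGER)")
conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


n_rows = scalar("SELECT count(*) FROM refGene")                     # one TSS entry per row
n_transcripts = scalar("SELECT count(DISTINCT name) FROM refGene")  # distinct accessions
n_genes = scalar("SELECT count(DISTINCT name2) FROM refGene")       # distinct gene symbols
extra_placements = n_rows - n_transcripts                           # rows beyond one per accession

print(n_rows, n_transcripts, n_genes, extra_placements)  # 4 3 2 1

# Which accessions account for the extra rows?
multi = conn.execute(
    "SELECT name, count(*) FROM refGene GROUP BY name HAVING count(*) > 1"
).fetchall()
print(multi)  # [('NM_000003', 2)]
```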

Powered by Biostar version 2.3.0