Question

Why Does The HOMER Tool Find TSS Sites For So Many (41,478) Genes?

11.0 years ago

kanwarjag ★ 1.2k

I am using HOMER to make a tag directory from ChIP-seq data. I then used 20 bp bins to annotate tag densities for hg18 in a 10 kb window across the TSS. The output has 41,478 rows, which I take to mean unique RefSeq genes. However, other databases do not list that many RefSeq genes. Any suggestions as to how HOMER ends up with so many RefSeq genes with their TSSs mapped? Thanks

updated 11.0 years ago by Ido Tamir 5.2k • written 11.0 years ago by kanwarjag ★ 1.2k

score 3 · Answer 1 · 2013-05-03

Because there really are that many entries in RefSeq:

>mysql --user=genome --host=genome-mysql.cse.ucsc.edu -D hg18 -A
mysql> select count(*) from refGene;
+----------+
| count(*) |
+----------+
|    43165 |
+----------+
1 row in set (0.20 sec)

mysql> select count(distinct(name)) from refGene;
+-----------------------+
| count(distinct(name)) |
+-----------------------+
|                 41125 |
+-----------------------+
1 row in set (0.30 sec)

mysql> select count(distinct(name2)) from refGene;
+------------------------+
| count(distinct(name2)) |
+------------------------+
|                  23770 |
+------------------------+
1 row in set (0.28 sec)

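Each entry is a transcript isoform, not a gene, and for a TSS analysis you want to look at the promoter of every transcript at every position it maps to. The first query counts all rows in refGene (43,165). The second shows only 41,125 distinct accessions, so roughly 2,000 rows are extra genomic locations for transcripts that map more than once under the same RefSeq ID. The third tells you that RefSeq groups these ~41,000 transcripts into about 24,000 "genes" (23,770 distinct gene names).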
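To make the difference between the three counts concrete, here is a minimal sketch that runs the same kind of queries on a tiny in-memory table. The rows, accessions and gene names are made up for illustration (they are not real annotations), and SQLite stands in for the UCSC MySQL server; only the column names `name` and `name2` follow the refGene table queried above.

```python
import sqlite3

# Toy stand-in for UCSC's refGene table (hypothetical rows, not real annotations).
ROWS = [
    # name (accession), name2 (gene symbol), chrom, strand, txStart, txEnd
    ("NM_A1", "GENE_A", "chr1", "+", 1000, 5000),
    ("NM_A2", "GENE_A", "chr1", "+", 1200, 5000),  # second isoform of GENE_A
    ("NM_B1", "GENE_B", "chr2", "-", 8000, 9500),
    ("NM_C1", "GENE_C", "chrX", "+", 300, 900),
    ("NM_C1", "GENE_C", "chrY", "+", 310, 910),    # same accession, second location
]


def refgene_counts(rows):
    """Return the three counts from the answer plus accessions mapped more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refGene (name TEXT, name2 TEXT, chrom TEXT, "
        "strand TEXT, txStart INTEGER, txEnd INTEGER)"
    )
    conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?, ?, ?)", rows)

    queries = {
        "rows": "SELECT COUNT(*) FROM refGene",
        "transcripts": "SELECT COUNT(DISTINCT name) FROM refGene",
        "genes": "SELECT COUNT(DISTINCT name2) FROM refGene",
    }
    counts = {label: conn.execute(sql).fetchone()[0] for label, sql in queries.items()}

    # Accessions that appear on more than one row, i.e. map to several locations.
    multi = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM refGene GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
        )
    ]
    conn.close()
    return counts, multi


if __name__ == "__main__":
    counts, multi = refgene_counts(ROWS)
    print(counts)
    print(multi)
```

In this toy table, five rows collapse to four accessions and three gene names. The `GROUP BY ... HAVING` query is the piece you could adapt to list which accessions account for the gap between the first two counts.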