Why Does The HOMER Tool Find TSS Sites For So Many (41,478) Genes?
11.0 years ago
kanwarjag ★ 1.2k

I am using HOMER to make a tag directory from my ChIP-seq data, and then I used 20 bp bins to annotate tag densities for hg18 across a 10 kb window around each TSS. The output has 41,478 rows, which I took to mean 41,478 unique RefSeq genes. However, other databases list far fewer RefSeq genes than that. Any suggestions as to why HOMER has so many RefSeq genes with a mapped TSS? Thanks
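For scale: a 10 kb window cut into 20 bp bins gives 10,000 / 20 = 500 bins per TSS, so, assuming one output row per TSS, each row holds 500 values. The sketch below lays out those bins, assuming a window centred on the TSS and oriented by strand; `tss_bins` and `bin_counts` are hypothetical helpers for illustration, not HOMER's own code.

```python
def tss_bins(tss, strand, window=10_000, bin_size=20):
    """Return half-open (start, end) bins covering `window` bp centred on `tss`.

    Hypothetical helper. Bins are ordered upstream -> downstream relative to the
    transcript; on the '-' strand each position p is mirrored to 2*tss - p.
    """
    half = window // 2
    n_bins = window // bin_size  # 10,000 / 20 = 500 bins per TSS
    bins = []
    for i in range(n_bins):
        offset = -half + i * bin_size  # bin start relative to the TSS on '+'
        if strand == "+":
            start = tss + offset
        else:
            # mirror [tss+offset, tss+offset+bin_size) around the TSS base
            start = tss - offset - bin_size + 1
        bins.append((start, start + bin_size))
    return bins


def bin_counts(tag_positions, bins):
    """Count tags whose position falls inside each half-open bin [start, end)."""
    return [sum(start <= p < end for p in tag_positions) for start, end in bins]


bins = tss_bins(10_000, "+")
print(len(bins), bins[0], bins[250])  # 500 (5000, 5020) (10000, 10020)
```

Under this layout the number of rows follows the number of TSS entries in the annotation, so isoforms with different start sites, or transcripts placed at more than one locus, each add a row of their own.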

11.0 years ago
Ido Tamir 5.2k

Because there are that many entries in RefSeq:

>mysql --user=genome --host=genome-mysql.cse.ucsc.edu -D hg18 -A
mysql> select count(*) from refGene;
+----------+
| count(*) |
+----------+
|    43165 |
+----------+
1 row in set (0.20 sec)

mysql> select count(distinct(name)) from refGene;
+-----------------------+
| count(distinct(name)) |
+-----------------------+
|                 41125 |
+-----------------------+
1 row in set (0.30 sec)

mysql> select count(distinct(name2)) from refGene;
+------------------------+
| count(distinct(name2)) |
+------------------------+
|                  23770 |
+------------------------+
1 row in set (0.28 sec)

Each entry is a transcript isoform, and you are looking at the promoter of every transcript at every position where it maps. The second query shows that about 2,000 of the entries (43,165 − 41,125 = 2,040) are extra placements of transcripts that map to multiple locations under the same RefSeq ID. The third query shows that RefSeq groups these ~41,000 transcripts into about 24,000 "genes" (23,770 distinct name2 values).
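The three counts above can be reproduced on any local copy of refGene. The sketch below assumes each row has been reduced to a tuple `(name, name2, chrom, strand, txStart, txEnd)`, using the UCSC refGene column names (`name` is the RefSeq transcript ID, `name2` the gene symbol); the sample rows and the helpers `refgene_counts` and `tss_by_gene` are made up for illustration. It shows how rows, distinct transcript IDs and distinct gene symbols diverge, and how TSSs can be grouped per gene.

```python
from collections import defaultdict

# Hypothetical refGene-style rows: (name, name2, chrom, strand, txStart, txEnd)
rows = [
    ("NM_A", "GENEA", "chr1", "+", 1000, 5000),
    ("NM_B", "GENEA", "chr1", "+", 1500, 5000),  # second isoform, different TSS
    ("NM_C", "GENEB", "chrX", "-", 8000, 9000),
    ("NM_C", "GENEB", "chrY", "-", 8000, 9000),  # same ID placed at a second locus
]


def refgene_counts(rows):
    """Mirror the three SQL queries: all rows, distinct name, distinct name2."""
    return {
        "rows": len(rows),
        "distinct_name": len({r[0] for r in rows}),
        "distinct_name2": len({r[1] for r in rows}),
    }


def tss_by_gene(rows):
    """Group TSS coordinates by gene symbol (name2).

    UCSC coordinates are 0-based and end-exclusive, so the first transcribed
    base is txStart on '+' and txEnd - 1 on '-'.
    """
    groups = defaultdict(set)
    for name, name2, chrom, strand, tx_start, tx_end in rows:
        tss = tx_start if strand == "+" else tx_end - 1
        groups[name2].add((chrom, strand, tss))
    return dict(groups)


print(refgene_counts(rows))  # {'rows': 4, 'distinct_name': 3, 'distinct_name2': 2}
for gene, sites in sorted(tss_by_gene(rows).items()):
    print(gene, len(sites))  # GENEA 2, then GENEB 2
```

Two genes yield four TSS rows here. Which TSS to keep per gene (all of them, the most upstream, or one picked by expression) is an analysis decision; the table itself does not make it.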
