Get all RefSeq transcripts from hg18
9.8 years ago
lilla.davim ▴ 160

Hello,

Sorry for the newbie question, but I am having a hard time trying to figure out how to retrieve the set of all curated RefSeq transcripts (with accession prefix NM_) from hg18. Could you help me build the SQL query for this?

Thanks.

RefSeq RNA-Seq • 3.0k views
9.8 years ago
poisonAlien ★ 3.2k
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -D hg18 -e 'SELECT chrom,txStart,txEnd,strand,cdsStart,cdsEnd,exonCount,name from refGene LIMIT 10'

+-------+-----------+-----------+--------+-----------+-----------+-----------+-----------+
| chrom | txStart   | txEnd     | strand | cdsStart  | cdsEnd    | exonCount | name      |
+-------+-----------+-----------+--------+-----------+-----------+-----------+-----------+
| chr20 |    764710 |    774922 | +      |    773447 |    774335 |         2 | NM_207121 |
| chr9  | 139406074 | 139437535 | -      | 139437535 | 139437535 |         3 | NR_104599 |
| chr21 |  42315181 |  42318098 | +      |  42318098 |  42318098 |         2 | NR_119385 |
| chr21 |  42302371 |  42318098 | +      |  42318098 |  42318098 |         2 | NR_119384 |
| chr17 |  45993427 |  46059831 | +      |  46059831 |  46059831 |        35 | NR_046057 |
| chr7  |  44120803 |  44129694 | -      |  44120908 |  44129683 |        11 | NM_006230 |
| chr5  | 134209268 | 134223324 | +      | 134218489 | 134219056 |         2 | NM_152409 |
| chr5  |  75005779 |  75044036 | -      |  75006015 |  75043965 |        11 | NM_152408 |
| chr12 |  54882554 |  54901982 | -      |  54886497 |  54890496 |         8 | NM_194358 |
| chrX  | 154140712 | 154147068 | -      | 154143281 | 154146767 |         2 | NM_171998 |
+-------+-----------+-----------+--------+-----------+-----------+-----------+-----------+

For the remaining columns, check the refGene table schema in the UCSC Table Browser.

9.8 years ago
Ann ★ 2.4k

You can also get this from the UCSC Table Browser.

9.8 years ago
lilla.davim ▴ 160

Hello,

Thanks. Actually I wanted to check the following statement from a recent paper:

The set of transcripts used in this experiment were the curated RefSeq transcripts (accession prefix NM) from hg18 (31,148 transcripts).

However, I don't get the same number when I query refGene in hg18:

mysql> select distinct count(*) as total from refGene where name like "NM%";
+-------+
| total |
+-------+
| 38938 |
+-------+
1 row in set (0,31 sec)

Is something wrong in my interpretation or in my query?

Thanks for your help.


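The NM IDs in refGene are not unique. One transcript can map to several loci, and each mapping gets its own row, so counting rows counts some transcripts more than once. That might explain at least part of the difference.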
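
To illustrate the point about repeated IDs, here is a minimal sketch that compares a row count with a distinct-ID count. It uses an in-memory SQLite table as a stand-in for refGene, and the rows are invented for illustration (they are not real hg18 annotations). It also escapes the underscore, since `_` is a single-character wildcard in `LIKE`; the `!` escape character is just a choice made for this example.

```python
import sqlite3

# Toy stand-in for the UCSC refGene table. These rows are made up for
# illustration and are NOT real hg18 annotations.
rows = [
    ("NM_000001", "chr1", 100, 200),
    ("NM_000001", "chr1_random", 100, 200),  # same accession, second locus
    ("NM_000002", "chr2", 300, 400),
    ("NR_000003", "chr3", 500, 600),         # non-coding, excluded by the filter
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER)"
)
conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?)", rows)

# '_' matches any single character in LIKE, so escape it to require a
# literal underscore after 'NM'.
nm_filter = "name LIKE 'NM!_%' ESCAPE '!'"

# Counts every row, so a transcript mapped to two loci is counted twice.
total_rows = conn.execute(
    f"SELECT COUNT(*) FROM refGene WHERE {nm_filter}"
).fetchone()[0]

# Counts each accession once, however many loci it maps to.
unique_ids = conn.execute(
    f"SELECT COUNT(DISTINCT name) FROM refGene WHERE {nm_filter}"
).fetchone()[0]

print(total_rows, unique_ids)
```

Note that `select distinct count(*)` in the query above applies `distinct` to the single result row, not to the IDs, so it still counts rows. Against the real hg18 table, the analogous check would be `COUNT(DISTINCT name)` with the same `WHERE` clause. Whether that gets you to the paper's figure is a separate question, since the table may have been updated since the paper's snapshot.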
