Question

Get all RefSeq transcripts from hg18

10.4 years ago

lilla.davim ▴ 160

Hello,

Sorry for the newbie question, but I am having a hard time trying to figure out how to retrieve the set of all curated RefSeq transcripts (with accession prefix NM_) from hg18. Could you help me build the SQL query for this?

Thanks.

RefSeq RNA-Seq • 3.1k views

Updated 3.0 years ago by Ram 44k • written 10.4 years ago by lilla.davim ▴ 160

Ram · Answer 1 · 2014-06-21

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -D hg18 -e 'SELECT chrom,txStart,txEnd,strand,cdsStart,cdsEnd,exonCount,name from refGene LIMIT 10'

+-------+-----------+-----------+--------+-----------+-----------+-----------+-----------+
| chrom | txStart   | txEnd     | strand | cdsStart  | cdsEnd    | exonCount | name      |
+-------+-----------+-----------+--------+-----------+-----------+-----------+-----------+
| chr20 |    764710 |    774922 | +      |    773447 |    774335 |         2 | NM_207121 |
| chr9  | 139406074 | 139437535 | -      | 139437535 | 139437535 |         3 | NR_104599 |
| chr21 |  42315181 |  42318098 | +      |  42318098 |  42318098 |         2 | NR_119385 |
| chr21 |  42302371 |  42318098 | +      |  42318098 |  42318098 |         2 | NR_119384 |
| chr17 |  45993427 |  46059831 | +      |  46059831 |  46059831 |        35 | NR_046057 |
| chr7  |  44120803 |  44129694 | -      |  44120908 |  44129683 |        11 | NM_006230 |
| chr5  | 134209268 | 134223324 | +      | 134218489 | 134219056 |         2 | NM_152409 |
| chr5  |  75005779 |  75044036 | -      |  75006015 |  75043965 |        11 | NM_152408 |
| chr12 |  54882554 |  54901982 | -      |  54886497 |  54890496 |         8 | NM_194358 |
| chrX  | 154140712 | 154147068 | -      | 154143281 | 154146767 |         2 | NM_171998 |
+-------+-----------+-----------+--------+-----------+-----------+-----------+-----------+

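You can check the schema of the refGene table in the UCSC Table Browser. Note that the sample above also includes NR_ (non-coding) accessions, so filter on the name column to keep only NM_ entries.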
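As a minimal sketch of the NM_ filter, the snippet below uses an in-memory SQLite table as a stand-in for refGene. The SQLite setup and the reduced column set are assumptions for illustration only; the rows are copied from the sample output above. Because `_` is a single-character wildcard in SQL LIKE patterns, it is escaped here so that only a literal `NM_` prefix matches.

```python
import sqlite3

# Toy stand-in for the hg18 refGene table (assumption: a reduced column set;
# rows copied from the sample output above).
rows = [
    ("chr20", 764710, 774922, "+", "NM_207121"),
    ("chr9", 139406074, 139437535, "-", "NR_104599"),
    ("chr21", 42315181, 42318098, "+", "NR_119385"),
    ("chr7", 44120803, 44129694, "-", "NM_006230"),
    ("chr5", 134209268, 134223324, "+", "NM_152409"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refGene "
    "(chrom TEXT, txStart INTEGER, txEnd INTEGER, strand TEXT, name TEXT)"
)
conn.executemany("INSERT INTO refGene VALUES (?, ?, ?, ?, ?)", rows)

# '_' matches any single character in LIKE, so escape it to require a
# literal underscore after 'NM'.
query = (
    "SELECT name FROM refGene "
    "WHERE name LIKE 'NM!_%' ESCAPE '!' "
    "ORDER BY name"
)
nm_names = [row[0] for row in conn.execute(query)]
print(nm_names)
```

On the UCSC MySQL server the same idea would be written as `WHERE name LIKE 'NM\_%'`, since backslash is MySQL's default LIKE escape character.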

score 0 · Answer 2 · 2014-06-21

10.4 years ago

Ann ★ 2.4k

You can also get this from the UCSC Table Browser.

Ram · Answer 3 · 2014-06-22

10.4 years ago

lilla.davim ▴ 160

Hello,

Thanks. Actually I wanted to check the following statement from a recent paper:

The set of transcripts used in this experiment were the curated RefSeq transcripts (accession prefix NM) from hg18 (31,148 transcripts).

However, I don't get the same number when querying refGene in hg18:

mysql> select distinct count(*) as total from refGene where name like "NM%";
+-------+
| total |
+-------+
| 38938 |
+-------+
1 row in set (0,31 sec)

Is something wrong in my interpretation or in my query?

Thanks for your help.

Updated 3.0 years ago by Ram 44k • written 10.4 years ago by lilla.davim ▴ 160

10.4 years ago

David Langenberger 11k

NM accessions are not unique in the refGene table: one transcript can map to several loci, so it can appear in several rows. That might explain the difference.
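To illustrate the point about repeated accessions, here is a small sketch using an in-memory SQLite table as a stand-in for refGene. The table contents and accession names are hypothetical, chosen only so that one NM_ accession appears at two loci. It shows that `SELECT DISTINCT COUNT(*)` still counts rows (DISTINCT is applied to the single result row), whereas `COUNT(DISTINCT name)` counts unique accessions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refGene (chrom TEXT, txStart INTEGER, name TEXT)")

# Hypothetical rows: NM_000001 is placed at two loci, mimicking a transcript
# that maps to more than one location.
conn.executemany(
    "INSERT INTO refGene VALUES (?, ?, ?)",
    [
        ("chr1", 100, "NM_000001"),
        ("chr1_random", 500, "NM_000001"),
        ("chr2", 200, "NM_000002"),
        ("chr3", 300, "NR_000003"),
    ],
)

# Match a literal 'NM_' prefix ('_' is otherwise a wildcard in LIKE).
nm_filter = "name LIKE 'NM!_%' ESCAPE '!'"

# DISTINCT here applies to the one-row result, so this is just a row count.
total_rows = conn.execute(
    f"SELECT DISTINCT COUNT(*) FROM refGene WHERE {nm_filter}"
).fetchone()[0]

# This counts each accession once, however many loci it maps to.
unique_ids = conn.execute(
    f"SELECT COUNT(DISTINCT name) FROM refGene WHERE {nm_filter}"
).fetchone()[0]

print(total_rows, unique_ids)
```

Running the analogous `COUNT(DISTINCT name)` query against hg18 would show how much of the gap is due to repeated accessions. It would not by itself confirm the paper's figure, since the table contents may also have changed since the paper was written.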