Good day!
Where can I find human RefSeq transcripts grouped by transcription locus (gene) for hg18, so that RefSeq entries with overlapping coordinates are clustered into one gene? Is there a dedicated database that can be downloaded?
Thank you in advance!
The RefSeq transcripts are stored in the refGene table on the UCSC MySQL server. The name2 field holds the gene symbol, so it can be used to get all the transcripts for the same gene.
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18
mysql> select * from refGene as G where chrom="chr1" and txStart > 1000000 and txEnd < 2000000 limit 2\G
*************************** 1. row ***************************
bin: 592
name: NM_017891
chrom: chr1
strand: -
txStart: 1007060
txEnd: 1041599
cdsStart: 1008135
cdsEnd: 1016786
exonCount: 10
exonStarts: 1007060,1009595,1009723,1011120,1012381,1012744,1015595,1016714,1017233,1041302,
exonEnds: 1008230,1009626,1009749,1011255,1012447,1012840,1015671,1016808,1017346,1041599,
id: 0
name2: C1orf159
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: 1,0,1,1,1,1,0,0,-1,-1,
*************************** 2. row ***************************
bin: 593
name: NR_029639
chrom: chr1
strand: +
txStart: 1092346
txEnd: 1092441
cdsStart: 1092441
cdsEnd: 1092441
exonCount: 1
exonStarts: 1092346,
exonEnds: 1092441,
id: 0
name2: MIR200B
cdsStartStat: unk
cdsEndStat: unk
exonFrames: -1,
2 rows in set (0.20 sec)
It can be downloaded here
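As a rough sketch of how name2 could be used on a downloaded copy of the table, the Python below groups transcripts by gene symbol. It assumes a tab-delimited dump with the same column order as the query output above (so name2 is the 13th column); the function name group_by_gene is illustrative, and the sample rows are the two rows shown above with their exon lists shortened.

```python
import csv
import io
from collections import defaultdict

# 0-based column positions, assuming the same order as the query output above.
NAME, CHROM, STRAND, TX_START, TX_END, NAME2 = 1, 2, 3, 4, 5, 12


def group_by_gene(handle):
    """Map each gene symbol (name2) to the list of its RefSeq transcripts."""
    genes = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and an optional header line
        genes[row[NAME2]].append(
            (row[NAME], row[CHROM], row[STRAND], int(row[TX_START]), int(row[TX_END]))
        )
    return dict(genes)


# Sample input: the two rows from the query above, exon lists shortened.
SAMPLE = (
    "592\tNM_017891\tchr1\t-\t1007060\t1041599\t1008135\t1016786\t10\t"
    "1007060,\t1008230,\t0\tC1orf159\tcmpl\tcmpl\t1,\n"
    "593\tNR_029639\tchr1\t+\t1092346\t1092441\t1092441\t1092441\t1\t"
    "1092346,\t1092441,\t0\tMIR200B\tunk\tunk\t-1,\n"
)

genes = group_by_gene(io.StringIO(SAMPLE))
for symbol, transcripts in sorted(genes.items()):
    print(symbol, [t[0] for t in transcripts])
```

Grouping on the symbol alone may merge transcripts that are mapped to more than one location; keying on (name2, chrom, strand) instead would be a stricter variant.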
For a small number of genes, the most straightforward method, to me, is simply to check the Entrez Gene page (e.g., CDK2).
To do it programmatically, you can use the MySQL method that Pierre describes, or you can download the gene2refseq file from NCBI's FTP site. The columns are fairly self-explanatory, but they are also described in the README file.
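A minimal sketch of reading gene2refseq in Python is given below. It assumes the file is tab-delimited with tax_id, GeneID and the RNA accession in the 1st, 2nd and 4th columns, that "-" marks a missing value, and that header lines start with "#"; please check these against the README before relying on them. The function name refseqs_by_gene is illustrative, and the sample rows are made up (truncated to four columns), not real records.

```python
import io
from collections import defaultdict

# Assumed 0-based column positions; verify against the README.
TAX_ID, GENE_ID, RNA_ACC = 0, 1, 3


def refseqs_by_gene(handle, tax_id="9606"):
    """Map GeneID -> sorted RNA RefSeq accessions for one taxon (9606 = human)."""
    genes = defaultdict(set)
    for line in handle:
        if line.startswith("#"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= RNA_ACC or fields[TAX_ID] != tax_id:
            continue
        accession = fields[RNA_ACC]
        if accession != "-":  # "-" is assumed to mean "no RNA accession"
            genes[fields[GENE_ID]].add(accession)
    return {gene: sorted(accs) for gene, accs in genes.items()}


# Made-up rows for illustration only; the real file has more columns.
SAMPLE = (
    "#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version\n"
    "9606\t111\tREVIEWED\tNM_900001.1\n"
    "9606\t111\tREVIEWED\tNM_900002.1\n"
    "9606\t222\tVALIDATED\t-\n"
    "10090\t333\tREVIEWED\tNM_900003.1\n"
)

result = refseqs_by_gene(io.StringIO(SAMPLE))
print(result)
```

For the real file, the same function could be given a handle from gzip.open(path, "rt") if the download is compressed.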
The file suggested by Pierre could be uploaded to Galaxy. A second file containing your genes/loci of interest can then be joined against it to pull out all the corresponding RefSeq transcripts, using 'Join, Subtract and Group' > 'Join two Queries'.
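For anyone who prefers to do the same join outside Galaxy, here is a small Python sketch of the idea. It is not how Galaxy implements its tool, just an inner join on the gene symbol; the function name join_transcripts is illustrative, and the (transcript, gene) pairs are taken from the refGene rows shown earlier in this thread.

```python
from collections import defaultdict


def join_transcripts(transcript_gene_pairs, genes_of_interest):
    """Inner join: return (gene, transcript) for every gene of interest."""
    by_gene = defaultdict(list)
    for transcript, gene in transcript_gene_pairs:
        by_gene[gene].append(transcript)
    # Keep the order of the genes-of-interest list; genes with no match are dropped.
    return [(gene, t) for gene in genes_of_interest for t in by_gene.get(gene, [])]


# (name, name2) pairs from the refGene rows shown earlier in the thread.
pairs = [("NM_017891", "C1orf159"), ("NR_029639", "MIR200B")]
joined = join_transcripts(pairs, ["MIR200B"])
print(joined)
```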