Question: Format RefSeq Output Obtained From UCSC Table Browser?
1
7.9 years ago by
User 303510
User 303510 wrote:

I want to get the list of RefSeq genes for human from the UCSC Table Browser. As you know, the RefSeq file that we get from the UCSC Table Browser contains the mRNA RefSeq accession number for every gene (e.g. NR_028269).

Is there a way to modify this output to get the gene symbols instead of those mRNA RefSeq IDs?

gene refseq identifiers • 5.3k views
written 7.9 years ago by User 303510
7
7.9 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

As @Pierre notes, the name you want is in name2. If you want to get BED format from the SQL, you can use something like:

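ORG=$1
# Fetch the refGene rows, using the gene symbol (name2) as the BED name.
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D "$ORG" -P 3306   -e "select chrom,txStart,txEnd,name2 as name,strand,exonStarts,exonEnds from refGene;" > "$ORG.notbed"
# Convert to BED12; NR != 1 skips the header row printed by mysql.
awk '
        BEGIN { OFS = "\t"; FS = "\t"} ;
        (NR != 1){
                # exonStarts/exonEnds are comma-terminated lists;
                # split() clears each array before filling it.
                n = split($6, astarts, ",");
                split($7, aends, ",");

                starts=""; sizes=""
                exonCount=0
                for(i=1; i <= n; i++){
                    # skip the empty field after the trailing comma
                    # (a string test, so an exon starting at 0 is kept)
                    if (astarts[i] == "") continue
                    sizes=sizes""(aends[i] - astarts[i])","
                    starts=starts""(astarts[i] - $2)","
                    exonCount=exonCount + 1
                }
                # chrom, start, end, name, score, strand, thickStart, thickEnd,
                # itemRgb, blockCount, blockSizes, blockStarts
                print $1,$2,$3,$4,1,$5,$2,$3,"0",exonCount,sizes,starts
}' "$ORG.notbed" | sort -k1,1 -k2,2n > "refGene.$ORG.bed"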

which you can save as refGene.sh and use as

sh refGene.sh hg19

or

sh refGene.sh mm9
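
A quick sanity check on the resulting file can catch a mangled row early. The sketch below is not part of the script above: `check_bed12` is a hypothetical helper name, and it only assumes the 12-column layout printed by the awk code (blockCount in column 10, comma-terminated blockSizes and blockStarts in columns 11 and 12).

```shell
# Hypothetical helper: check that every row has 12 tab-separated fields and
# that blockCount (column 10) matches the number of entries in
# blockSizes (column 11) and blockStarts (column 12).
# Reads the named files, or stdin if none are given.
# Exit status is 0 if all rows pass, 1 otherwise.
check_bed12() {
    awk 'BEGIN { FS = "\t"; bad = 0 }
         {
             n = split($11, sizes, ",")
             if (n > 0 && sizes[n] == "") n--      # ignore trailing comma
             m = split($12, starts, ",")
             if (m > 0 && starts[m] == "") m--     # ignore trailing comma
             if (NF != 12 || n != $10 || m != $10) {
                 print "line " NR ": malformed BED12 row"
                 bad = 1
             }
         }
         END { exit bad }' "$@"
}
```

For example, `check_bed12 refGene.hg19.bed && echo ok` should print `ok` if every row is consistent.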
written 7.9 years ago by brentp23k

this is very helpful. Thanks

written 7.9 years ago by Gjain5.3k

thank you so much !

written 7.8 years ago by User 303510
5
7.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum120k wrote:

Using the mysql server of the UCSC:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 \
     -e 'select name2 from refGene where name="NR_028269"'
+--------------+
| name2        |
+--------------+
| LOC100288778 |
+--------------+
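
To look up several accessions in one go, the same query can be written with an `in (...)` list. This is only a sketch: `refseq_query` is a hypothetical helper, it does no escaping, and it assumes the arguments are plain RefSeq accessions as stored in the `name` column.

```shell
# Hypothetical helper: build a query that returns name and name2
# for every accession given as an argument.
# No escaping is done, so only pass trusted accession strings.
refseq_query() {
    list=""
    for acc in "$@"; do
        # append "acc", adding a comma separator once the list is non-empty
        list="${list:+$list,}\"$acc\""
    done
    printf 'select name, name2 from refGene where name in (%s)\n' "$list"
}
```

The result can be passed to the same command as above, e.g. `mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e "$(refseq_query NR_028269 NM_032291)"`.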
written 7.9 years ago by Pierre Lindenbaum120k
4
7.9 years ago by
Gjain5.3k
Göttingen, Germany
Gjain5.3k wrote:

Another way is to do this in the Table Browser itself. In the "output format" field:

  1. choose "selected fields from primary and related tables"
  2. click "get output"
  3. if you are using hg19, you will see "Select Fields from hg19.refGene"
  4. check name, chrom, strand, txStart, txEnd and name2
  5. click "get output"

The output should look like this:

+--------------+-------+--------+----------+----------+--------------+
|    #name     | chrom | strand | txStart  |  txEnd   |    name2     |
+--------------+-------+--------+----------+----------+--------------+
| NM_032291    | chr1  | +      | 66999824 | 67210768 | SGIP1        |
| NM_001080397 | chr1  | +      |  8384389 |  8404227 | SLC45A1      |
| NM_001145277 | chr1  | +      | 16767166 | 16786584 | NECAP2       |
| NR_028269    | chr12 | +      |    87983 |    91263 | LOC100288778 |
| NR_026823    | chr12 | -      |   147945 |   149412 | FAM138D      |
| NR_033859    | chr12 | -      |   246576 |   258332 | LOC574538    |
+--------------+-------+--------+----------+----------+--------------+
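
If you then want the gene symbol in place of the accession, a small awk filter is enough. This sketch assumes the downloaded file is plain tab-separated text (not the boxed table drawn above) with the six columns in the order name, chrom, strand, txStart, txEnd, name2, and a header line starting with `#`; `symbols_from_table` is a hypothetical name.

```shell
# Hypothetical helper: read Table Browser output on stdin
# (name, chrom, strand, txStart, txEnd, name2; tab-separated)
# and print symbol, chrom, strand, txStart, txEnd.
symbols_from_table() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                  # skip the "#name ..." header line
         { print $6, $2, $3, $4, $5 }'
}
```

Usage would be something like `symbols_from_table < refGene_fields.txt > refGene_symbols.txt`, where both file names are placeholders.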

Hope this helps.

modified 10 months ago • written 7.9 years ago by Gjain5.3k
+----------------+------+---+----------+----------+----------+----------+-------+
| NM_001293562.1 | chr1 | + | 33546713 | 33586132 | 33547850 | 33585783 | AZIN2 |
| NM_052998.3    | chr1 | + | 33546713 | 33586132 | 33547850 | 33585783 | AZIN2 |
| NM_001301824.1 | chr1 | + | 33546729 | 33586132 | 33557656 | 33585783 | AZIN2 |
| NM_001301823.1 | chr1 | + | 33546729 | 33586132 | 33557656 | 33585783 | AZIN2 |
| NM_001301826.1 | chr1 | + | 33547778 | 33567493 | 33547850 | 33567493 | AZIN2 |
| NR_126031.1    | chr1 | + | 33547778 | 33567493 | 33567493 | 33567493 | AZIN2 |
| NM_001301825.1 | chr1 | + | 33547778 | 33586132 | 33547850 | 33585783 | AZIN2 |
+----------------+------+---+----------+----------+----------+----------+-------+

How can I choose the coordinates for the gene "AZIN2" from these repeated entries?

modified 10 months ago by Gjain5.3k • written 10 months ago by sujasubramanian50

It will depend on the biological question you are interested in. You can choose the longest transcript, or use another criterion that fits your hypothesis.
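
As one possible reading of "longest transcript", the sketch below keeps, for each gene symbol, the row with the largest txEnd - txStart span. That definition is an assumption (it measures genomic span, not mRNA length). It also assumes tab-separated input with the eight columns in the order shown in the table above (accession, chrom, strand, txStart, txEnd, two further coordinates, symbol); `longest_per_gene` is a hypothetical name. On a tie, the first row seen is kept.

```shell
# Hypothetical helper: keep one row per gene symbol (column 8),
# the one with the largest txEnd - txStart (columns 5 and 4).
# Reads tab-separated rows on stdin; lines starting with "#" are skipped.
longest_per_gene() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }
         {
             len = $5 - $4
             # strict ">" keeps the first row when spans are equal
             if (!($8 in best) || len > bestlen[$8]) {
                 best[$8] = $0
                 bestlen[$8] = len
             }
         }
         END { for (g in best) print best[g] }' |
    sort -t "$(printf '\t')" -k8,8      # stable output order by symbol
}
```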

written 10 months ago by Gjain5.3k