Question: How to download mm10 GTF file with the gene id and gene name using UCSC table browser?
John wrote:

Hi, which parameters should I use to download an mm10 GTF file in the same format as the GTF line below?

chr1    unknown exon    3214482 3216968 .   -   .   gene_id "Xkr4"; gene_name "Xkr4"; p_id "P14345"; transcript_id "NM_001011874"; tss_id "TSS25485";

I can download this format for mm9 with the following parameters, but not for mm10:

Assembly: mm9
Group: Gene and Gene prediction tracks; 
Track: RefSeq genes; 
Table: refFlat
Output format: GTF

Thanks

written 6 months ago by John
Luis Nassar (UCSC Genome Browser) wrote:

Hello,

Short answer: http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.refGene.gtf.gz

Long answer:

Because of the way the Table Browser forms queries, its GTF output repeats the same value in the gene_id and transcript_id fields, like this:

chr1    mm9_refFlat stop_codon  3206103 3206105 0.000000    -   .   gene_id "Xkr4"; transcript_id "Xkr4"; 

This is why we label that output "GTF (limited)". We have a wiki page on how to do this properly (http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format), which comes down to using a separate utility for the conversion. Another possible source of confusion is that you did not see the same refFlat table available in the Table Browser. This is because in mm10/hg19/hg38, NCBI started releasing coordinates along with their annotation sequences. To get the equivalent of your selection for mm10, you would use the following:

Assembly: mm10
Group: Gene and Gene prediction tracks; 
Track: NCBI RefSeq; 
Table: UCSC RefSeq (refGene)
Output format: GTF (limited)

Like refFlat, these are our own alignments of the NCBI sequences. However, due to the limited output you will not have the gene name (included in refFlat) unless you follow the wiki conversion.
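
To make the difference between the two kinds of output concrete, here is a minimal Python sketch that parses the attributes column of a GTF record and flags "limited" records, meaning gene_id merely repeats transcript_id and there is no gene_name. The function names are hypothetical, and the two sample records are the lines quoted in this thread, rebuilt with tab separators on the assumption that the original files are tab-delimited.

```python
import re

# Matches one `key "value";` pair in the ninth (attributes) GTF column.
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_attributes(line):
    """Return the attributes of one tab-separated GTF record as a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    return dict(ATTRIBUTE_PATTERN.findall(fields[8]))


def looks_limited(line):
    """True when gene_id just repeats transcript_id and gene_name is absent."""
    attrs = parse_gtf_attributes(line)
    gene_id = attrs.get("gene_id")
    return (
        gene_id is not None
        and gene_id == attrs.get("transcript_id")
        and "gene_name" not in attrs
    )


# The Table Browser record quoted in the answer above.
limited_line = "\t".join([
    "chr1", "mm9_refFlat", "stop_codon", "3206103", "3206105",
    "0.000000", "-", ".",
    'gene_id "Xkr4"; transcript_id "Xkr4";',
])

# The record from the original question.
full_line = "\t".join([
    "chr1", "unknown", "exon", "3214482", "3216968", ".", "-", ".",
    'gene_id "Xkr4"; gene_name "Xkr4"; p_id "P14345"; '
    'transcript_id "NM_001011874"; tss_id "TSS25485";',
])

print(looks_limited(limited_line))
print(looks_limited(full_line))
```

Running a check like this on the first few lines of a file is a quick way to tell whether it came from the "GTF (limited)" output or from a proper conversion.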

We have also begun to offer these proper GTF files in our downloads directory. Here it is for mm10: http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/

The equivalent you will want to use will be http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.refGene.gtf.gz
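
For anyone who wants to script this, the following Python sketch downloads the file with the standard library and builds a transcript-to-gene-name table. The helper names and the local file name are hypothetical. It also assumes the file carries a gene_name attribute on each record; if it does not, the code falls back to gene_id.

```python
import gzip
import re
import urllib.request

GTF_URL = (
    "http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/"
    "mm10.refGene.gtf.gz"
)

# Matches one `key "value";` pair in the ninth (attributes) GTF column.
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def transcript_to_gene(gtf_lines):
    """Map transcript_id -> gene_name (or gene_id if gene_name is missing)."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip anything that is not a 9-column GTF record
        attrs = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        transcript = attrs.get("transcript_id")
        if transcript:
            mapping[transcript] = attrs.get("gene_name", attrs.get("gene_id"))
    return mapping


def download_gtf(url=GTF_URL, destination="mm10.refGene.gtf.gz"):
    """Download the gzipped GTF to a local file (hypothetical file name)."""
    urllib.request.urlretrieve(url, destination)
    return destination


def read_gtf_gz(path):
    """Read a gzipped GTF file and return the transcript -> gene mapping."""
    with gzip.open(path, "rt") as handle:
        return transcript_to_gene(handle)


if __name__ == "__main__":
    table = read_gtf_gz(download_gtf())
    print(len(table), "transcripts")
```

The download only happens when the script is run directly, so `transcript_to_gene` can be reused on any iterable of GTF lines.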

If you have further questions, you can reach us at genome@soe.ucsc.edu. It may take us a little longer to answer questions on biostars.

written 6 months ago by Luis Nassar

Hi Luis, what about human? Can you share the GTF links for hg19 and hg38?

written 6 months ago by Shicheng Guo

Yes, we are still in the process of making them available for all of our assemblies.

hg38 GTFs: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/

hg19 GTFs: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/

written 6 months ago by Luis Nassar

And what's the difference between refGene and ncbiRefSeq gtf?

written 5 months ago by Ömer An

The difference is the dataset they were sourced from. You can read about these different tracks in the description page (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite).

ncbiRefSeq - RefSeq All – all curated and predicted annotations provided by RefSeq.
refGene - UCSC RefSeq – annotations generated from UCSC's realignment of RNAs with NM and NR accessions to the human genome. This track was previously known as the "RefSeq Genes" track.

Essentially, ncbiRefSeq contains all transcripts, including predicted ones. For refGene we pull out only the NM_* and NR_* sequences (mRNA and non-coding RNA) and align them to the genome ourselves using BLAT. See this table for the NCBI accession prefixes (https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly). Removing the computationally predicted transcripts cuts the table nearly in half: hg38 refGene has 82,864 items and ncbiRefSeq has 166,923 items. You may also find this similar question helpful: A: RefGene: how to find the starts and ends of genes?
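
As a rough illustration of the prefix distinction, this Python sketch sorts transcript accessions into curated (NM_, NR_) and predicted (XM_, XR_) groups. It only mimics the prefix filter described above and is not UCSC's actual pipeline, which also realigns the sequences with BLAT. The function names are hypothetical, and apart from NM_001011874 (taken from the question) the accessions are made-up placeholders.

```python
from collections import Counter

# Curated RefSeq transcript prefixes kept in refGene, per the answer above.
CURATED_PREFIXES = ("NM_", "NR_")
# Model (computationally predicted) RefSeq transcript prefixes.
PREDICTED_PREFIXES = ("XM_", "XR_")


def classify_accession(accession):
    """Label an accession as 'curated', 'predicted', or 'other' by prefix."""
    if accession.startswith(CURATED_PREFIXES):
        return "curated"
    if accession.startswith(PREDICTED_PREFIXES):
        return "predicted"
    return "other"


def count_by_class(accessions):
    """Count how many accessions fall into each class."""
    return Counter(classify_accession(acc) for acc in accessions)


# NM_001011874 is from the question; the rest are placeholder accessions.
example = ["NM_001011874", "NR_000001", "XM_000001", "XR_000001"]
counts = count_by_class(example)
print(counts["curated"], counts["predicted"])
```

Applied to the transcript_id values of an ncbiRefSeq GTF, a count like this would show roughly how much of the table is made up of predicted transcripts.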

written 5 months ago by Luis Nassar

badribio wrote:

Like this?

written 6 months ago by badribio

I can't see anything! Thanks.

written 6 months ago by John