How do I query the Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) to get a GTF file from UCSC that includes gene biotype? The biotypes can come from Ensembl; I just want an annotation file with biotypes, if possible. (:
Since you have tagged the post with 'refseq', I am assuming you are interested in RefSeq annotation. If that is the case, I suggest you download the relevant files directly from the NCBI FTP site. The GTF and GFF3 files for RefSeq annotation include gene_biotype information in column 9.
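For anyone who wants to pull the biotype out of column 9 once the file is downloaded, here is a minimal sketch in base R. The two GTF lines are made-up examples (the gene IDs and coordinates are placeholders, not real records), and `get_attr` is a hypothetical helper written for this illustration, not a function from any package.

```r
# Two illustrative GTF lines (tab-separated, 9 columns). All values are made up.
gtf_lines <- c(
  paste("chr1", "BestRefSeq", "gene", "100", "900", ".", "+", ".",
        'gene_id "GENE1"; gene_biotype "protein_coding";', sep = "\t"),
  paste("chr1", "BestRefSeq", "gene", "2000", "2500", ".", "-", ".",
        'gene_id "GENE2"; gene_biotype "lncRNA";', sep = "\t")
)

# Hypothetical helper: return the quoted value of one key from each
# attribute string, or NA when the key is absent.
get_attr <- function(attr_strings, key) {
  pattern <- paste0(key, ' "([^"]*)"')
  hits <- regmatches(attr_strings, regexec(pattern, attr_strings))
  vapply(hits,
         function(m) if (length(m) >= 2) m[2] else NA_character_,
         character(1))
}

# Split each line on tabs and take column 9 (the attribute column).
fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
attr_col <- vapply(fields, `[`, character(1), 9)

biotypes <- data.frame(
  gene_id = get_attr(attr_col, "gene_id"),
  gene_biotype = get_attr(attr_col, "gene_biotype"),
  stringsAsFactors = FALSE
)
print(biotypes)
```

For a real file you would read the lines with `readLines()` and skip the `#` header lines first. Note that the key name depends on the source: GENCODE GTFs store the biotype under `gene_type` rather than `gene_biotype`, so the key passed to the helper would need to change. A dedicated GTF reader such as `rtracklayer::import()` is probably the more robust route for whole-genome files.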
Just adding for other users who land on this page.
Another solution is to simply generate a 'master' table in biomaRt:
library(biomaRt)
# Connect to the Ensembl genes mart, then select the human dataset
mart <- useMart('ENSEMBL_MART_ENSEMBL')
mart <- useDataset('hsapiens_gene_ensembl', mart)
Check that it is indeed GRCh38:
searchDatasets(mart = mart, pattern = 'hsapiens')
                 dataset              description    version
78 hsapiens_gene_ensembl Human genes (GRCh38.p13) GRCh38.p13
Now generate the table:
annotLookup <- getBM(
  mart = mart,
  attributes = c(
    'hgnc_symbol',
    'ensembl_gene_id',
    'refseq_mrna',
    'refseq_ncrna',
    'gene_biotype'),
  uniqueRows = TRUE)
head(annotLookup)
  hgnc_symbol ensembl_gene_id refseq_mrna refseq_ncrna   gene_biotype
1       MT-TF ENSG00000210049                                 Mt_tRNA
2     MT-RNR1 ENSG00000211459                NR_137294        Mt_rRNA
3       MT-TV ENSG00000210077                                 Mt_tRNA
4     MT-RNR2 ENSG00000210082                NR_137295        Mt_rRNA
5      MT-TL1 ENSG00000209082                                 Mt_tRNA
6      MT-ND1 ENSG00000198888                          protein_coding
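If the aim is a RefSeq-to-biotype lookup, one possible follow-up is to collapse the two RefSeq columns into one and drop genes without an accession. This is only a sketch: it uses a small stand-in data frame built from three rows of the head() output so that it runs without querying the mart, and it assumes missing accessions come back as empty strings, which is worth checking on the real table.

```r
# Toy stand-in for annotLookup, copied from three rows of the head() output.
# Assumption: missing RefSeq accessions are empty strings, not NA.
annotLookup <- data.frame(
  hgnc_symbol = c("MT-TF", "MT-RNR1", "MT-ND1"),
  ensembl_gene_id = c("ENSG00000210049", "ENSG00000211459", "ENSG00000198888"),
  refseq_mrna = c("", "", ""),
  refseq_ncrna = c("", "NR_137294", ""),
  gene_biotype = c("Mt_tRNA", "Mt_rRNA", "protein_coding"),
  stringsAsFactors = FALSE
)

# Use the mRNA accession when present, otherwise the ncRNA accession.
refseq_id <- ifelse(annotLookup$refseq_mrna != "",
                    annotLookup$refseq_mrna,
                    annotLookup$refseq_ncrna)

# Keep only the rows that have some RefSeq accession.
has_refseq <- refseq_id != ""
refseq_biotype <- data.frame(
  refseq_id = refseq_id[has_refseq],
  gene_biotype = annotLookup$gene_biotype[has_refseq],
  stringsAsFactors = FALSE
)
print(refseq_biotype)

# Count how many rows fall into each biotype.
print(table(annotLookup$gene_biotype))
```

The resulting two-column table can then be joined onto any RefSeq-based annotation by accession. RefSeq accessions in GTF files usually carry a version suffix, so that would need to be stripped before matching.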
Kevin
Hi vkkodali,
I have the same question. I followed your answer and downloaded the hg19 GTF file, but column 9 contains only the gene ID, not gene_biotype (screen capture link: https://drive.google.com/file/d/1OkpcDF_u2-yzAKlg8s46vIVOi8pc4AZ1/view?usp=sharing ).
Please download the GTF from GENCODE: https://www.gencodegenes.org/
Please download data from the NCBI RefSeq FTP site, not UCSC. For hg19, you can search for GRCh37 in the NCBI Assembly portal to get to this page. Once you are there, click on the 'Download Assembly' button, choose 'RefSeq' as the source database and GTF as your file type. You will end up downloading a tarball with the GTF file. Alternatively, you can go to the FTP path directly by clicking on the 'FTP directory for RefSeq assembly' link on the right-hand bar and choosing the file you are interested in.
^^ this can work, too.