Question: get gene biotype for hg38 refseq
11 months ago, idaliu613 wrote:

How do I query the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) to get a GTF file that includes gene biotype? The biotypes can come from Ensembl, but I want an annotation file with biotypes, if possible. (:

Tags: gene biotype hg38 refseq
7 months ago, vkkodali (United States) wrote:

Since you have tagged the post with 'refseq', I assume you are interested in the RefSeq annotation. If so, I suggest you download the relevant files directly from the NCBI FTP site. The GTF and GFF3 files for the RefSeq annotation include gene_biotype information in column 9.
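For anyone who wants to pull that value out of column 9, here is a minimal sketch in base R. It assumes the usual GTF attribute layout of `key "value";` pairs. The function name `extract_gtf_attribute`, the file name in the comment, and the two attribute strings are made up for illustration and are not taken from a real NCBI file.

```r
# Pull one key (for example gene_biotype) out of GTF column 9 strings.
# Returns NA for rows where the key is absent.
extract_gtf_attribute <- function(attributes, key) {
  pattern <- paste0('(^|;)[[:space:]]*', key, ' "([^"]*)"')
  hits <- regmatches(attributes, regexec(pattern, attributes))
  vapply(hits,
         function(m) if (length(m) == 3) m[3] else NA_character_,
         character(1))
}

# Two illustrative attribute strings (hypothetical values)
attrs <- c(
  'gene_id "GENE_A"; transcript_id ""; gene_biotype "protein_coding";',
  'gene_id "GENE_B"; transcript_id "";'
)
biotypes <- extract_gtf_attribute(attrs, 'gene_biotype')
print(biotypes)

# With a real file (hypothetical path), column 9 is V9 after reading:
# gtf <- read.delim('annotation.gtf', header = FALSE, comment.char = '#',
#                   quote = '', stringsAsFactors = FALSE)
# gtf$gene_biotype <- extract_gtf_attribute(gtf$V9, 'gene_biotype')
```

A row that comes back as NA means the file simply does not carry that key, which is a quick way to tell whether you downloaded an annotation that has biotypes at all.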


Hi vkkodali,

I have the same question. I followed your answer and downloaded the hg19 GTF file, but column 9 contains only the gene ID, not gene_biotype (screen capture link: https://drive.google.com/file/d/1OkpcDF_u2-yzAKlg8s46vIVOi8pc4AZ1/view?usp=sharing )

written 7 weeks ago by xiaoleiusc

Please download the GTF from GENCODE: https://www.gencodegenes.org/

written 7 weeks ago by Kevin Blighe

Please download the data from the NCBI RefSeq FTP site, not UCSC. For hg19, you can search for GRCh37 in the NCBI Assembly portal to get to this page. Once you are there, click the 'Download Assembly' button, then choose 'RefSeq' as the source database and GTF as the file type. You will end up downloading a tarball containing the GTF file. Alternatively, you can go to the FTP path directly by clicking the 'FTP directory for RefSeq assembly' link in the right-hand bar and choosing the file of interest.

written 7 weeks ago by vkkodali

^^ this can work, too.

written 7 weeks ago by Kevin Blighe
7 weeks ago, Kevin Blighe wrote:

Just adding for other users who land on this page.

Another solution is to simply generate a 'master' table in biomaRt:

library(biomaRt)

# Connect to the Ensembl gene mart and select the human gene dataset
mart <- useMart('ENSEMBL_MART_ENSEMBL')
mart <- useDataset('hsapiens_gene_ensembl', mart)

Check that it is indeed GRCh38:

searchDatasets(mart = mart, pattern = 'hsapiens')
                 dataset              description    version
78 hsapiens_gene_ensembl Human genes (GRCh38.p13) GRCh38.p13
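If you would rather have a script stop when the assembly is not the one you expect, the visual check can be turned into a small guard. This is only a sketch: `check_assembly` is a hypothetical helper, and the data frame below is an offline stand-in that copies the row printed above instead of calling searchDatasets().

```r
# Stop with an error unless the named dataset reports the expected assembly.
# `datasets` is assumed to have the columns printed above: dataset, version.
check_assembly <- function(datasets, dataset_name, expected = 'GRCh38') {
  version <- datasets$version[datasets$dataset == dataset_name]
  if (length(version) != 1 || !startsWith(version, expected)) {
    stop('Dataset ', dataset_name, ' does not report assembly ', expected)
  }
  invisible(version)
}

# Offline stand-in for the searchDatasets() result printed above
datasets <- data.frame(
  dataset = 'hsapiens_gene_ensembl',
  description = 'Human genes (GRCh38.p13)',
  version = 'GRCh38.p13',
  stringsAsFactors = FALSE
)
found <- check_assembly(datasets, 'hsapiens_gene_ensembl')
```

In a live session you would pass the result of searchDatasets(mart = mart, pattern = 'hsapiens') in place of the stand-in.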

Now generate the table:

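# No filters are set, so this retrieves every gene in the dataset
annotLookup <- getBM(
  mart = mart,
  attributes = c(
    'hgnc_symbol',
    'ensembl_gene_id',
    'refseq_mrna',
    'refseq_ncrna',
    'gene_biotype'),
  # Return one row per unique combination of the requested attributes
  uniqueRows = TRUE)
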
head(annotLookup)
  hgnc_symbol ensembl_gene_id refseq_mrna refseq_ncrna   gene_biotype
1       MT-TF ENSG00000210049                                 Mt_tRNA
2     MT-RNR1 ENSG00000211459                NR_137294        Mt_rRNA
3       MT-TV ENSG00000210077                                 Mt_tRNA
4     MT-RNR2 ENSG00000210082                NR_137295        Mt_rRNA
5      MT-TL1 ENSG00000209082                                 Mt_tRNA
6      MT-ND1 ENSG00000198888                          protein_coding
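To go from a RefSeq accession to a biotype, one possible approach is to stack the two RefSeq columns and build a named vector. The sketch below runs offline on a stand-in for three of the rows shown above. The names `refseq_biotype` and `biotype_by_refseq` are made up here, and the code assumes each accession maps to a single biotype, which is worth checking on the full table.

```r
# Stand-in for three of the annotLookup rows shown above
annotLookup <- data.frame(
  hgnc_symbol = c('MT-TF', 'MT-RNR1', 'MT-ND1'),
  ensembl_gene_id = c('ENSG00000210049', 'ENSG00000211459',
                      'ENSG00000198888'),
  refseq_mrna = c('', '', ''),
  refseq_ncrna = c('', 'NR_137294', ''),
  gene_biotype = c('Mt_tRNA', 'Mt_rRNA', 'protein_coding'),
  stringsAsFactors = FALSE
)

# Stack the mRNA and ncRNA accession columns into one refseq_id column
refseq_biotype <- rbind(
  data.frame(refseq_id = annotLookup$refseq_mrna,
             gene_biotype = annotLookup$gene_biotype,
             stringsAsFactors = FALSE),
  data.frame(refseq_id = annotLookup$refseq_ncrna,
             gene_biotype = annotLookup$gene_biotype,
             stringsAsFactors = FALSE)
)

# Genes without a RefSeq accession come back as empty strings; drop them
refseq_biotype <- unique(refseq_biotype[refseq_biotype$refseq_id != '', ])

# Named vector: accession -> biotype
biotype_by_refseq <- setNames(refseq_biotype$gene_biotype,
                              refseq_biotype$refseq_id)
print(biotype_by_refseq[['NR_137294']])
```

The same lookup can then be used to annotate a RefSeq-based GTF that lacks a biotype field, after stripping any version suffix from the accessions in the file.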

Kevin
