Question: get gene biotype for hg38 refseq
idaliu613 wrote (17 months ago):

How do I query the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) to get a GTF file that includes gene biotype? The biotypes can come from Ensembl, but I want an annotation file with biotypes, if possible. (:

vkkodali wrote (13 months ago):

Since you have tagged the post with 'refseq', I am assuming you are interested in the RefSeq annotation. If that is the case, I suggest you download the relevant files directly from the NCBI FTP site. The GTF and GFF3 files for the RefSeq annotation include gene_biotype information in column 9.
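As a rough illustration of reading gene_biotype out of column 9, here is a minimal base-R sketch. The helper name get_gtf_attribute and the two GTF records (GENE_A, GENE_B) are made up for the example and are not taken from any NCBI file; the sketch only assumes the usual GTF attribute layout of key "value"; pairs.

```r
# Extract one attribute (default: gene_biotype) from GTF column 9 strings.
# Returns NA where the attribute is absent from a record.
get_gtf_attribute <- function(attr_col, key = "gene_biotype") {
  pattern <- paste0('(^|;[[:space:]]*)', key, ' "([^"]*)"')
  m <- regmatches(attr_col, regexec(pattern, attr_col))
  vapply(m, function(x) if (length(x) == 3) x[3] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

# Two hypothetical GTF records (made-up IDs), tab-separated, 9 columns each.
gtf_lines <- c(
  'chr1\tRefSeq\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";',
  'chr1\tRefSeq\tgene\t2000\t2500\t.\t-\t.\tgene_id "GENE_B";'
)

# quote = "" keeps the double quotes inside column 9 intact.
gtf <- read.delim(text = gtf_lines, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)
gtf$gene_biotype <- get_gtf_attribute(gtf$V9)
print(gtf[, c("V1", "V3", "gene_biotype")])
```

For a real file, replacing text = gtf_lines with the path to the downloaded GTF should work the same way. Counting the NA values in the new column is a quick way to see whether a given file carries biotypes at all.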


Hi vkkodali,

I have the same question. I followed your answer and downloaded the hg19 GTF file, but column 9 contains only the gene ID, not gene_biotype (screen capture link: https://drive.google.com/file/d/1OkpcDF_u2-yzAKlg8s46vIVOi8pc4AZ1/view?usp=sharing )

Reply written 7 months ago by xiaoleiusc

Please download the GTF from GENCODE: https://www.gencodegenes.org/

Reply written 7 months ago by Kevin Blighe

Please download the data from the NCBI RefSeq FTP site, not UCSC. For hg19, search for GRCh37 in the NCBI Assembly portal to reach the assembly page. Once there, click the 'Download Assembly' button, choose 'RefSeq' as the source database and GTF as the file type. This downloads a tarball containing the GTF file. Alternatively, go to the FTP path directly by clicking the 'FTP directory for RefSeq assembly' link in the right-hand bar and choose the file you need.
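For scripted downloads, the FTP path can be assembled from the assembly accession and name. The sketch below is only an illustration: the helper refseq_gtf_url is hypothetical, and it assumes NCBI's genomes/all directory layout, the accession GCF_000001405.25 for GRCh37.p13, and a file suffix of _genomic.gtf.gz. It is worth confirming the resulting path in a browser before relying on it.

```r
# Build the expected URL of an assembly's GTF file.
# ASSUMPTION: the genomes/all layout, where the nine accession digits are
# split into three sub-directories and the file ends in _genomic.gtf.gz.
refseq_gtf_url <- function(accession, assembly_name) {
  stopifnot(grepl("^GC[AF]_[0-9]{9}\\.[0-9]+$", accession))
  digits <- sub("^GC[AF]_([0-9]{9})\\.[0-9]+$", "\\1", accession)
  parts <- substring(digits, c(1, 4, 7), c(3, 6, 9))
  prefix <- substr(accession, 1, 3)
  stem <- paste(accession, assembly_name, sep = "_")
  paste0("https://ftp.ncbi.nlm.nih.gov/genomes/all/", prefix, "/",
         paste(parts, collapse = "/"), "/", stem, "/", stem, "_genomic.gtf.gz")
}

# Assumed accession and name for the GRCh37 (hg19) RefSeq assembly.
url <- refseq_gtf_url("GCF_000001405.25", "GRCh37.p13")
print(url)

# To fetch the file (requires network access):
# download.file(url, destfile = basename(url), mode = "wb")
```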

Reply written 7 months ago by vkkodali

^^ this can work, too.

Reply written 7 months ago by Kevin Blighe

Kevin Blighe wrote (7 months ago):

Just adding for other users who land on this page.

Another solution is to simply generate a 'master' table in biomaRt:

# library() stops with an error if biomaRt is not installed
library(biomaRt)

mart <- useMart('ENSEMBL_MART_ENSEMBL')
mart <- useDataset('hsapiens_gene_ensembl', mart)

Check that it is indeed GRCh38:

searchDatasets(mart = mart, pattern = 'hsapiens')
                 dataset              description    version
78 hsapiens_gene_ensembl Human genes (GRCh38.p13) GRCh38.p13

Now generate the table:

annotLookup <- getBM(
  mart = mart,
  attributes = c(
    'hgnc_symbol',
    'ensembl_gene_id',
    'refseq_mrna',
    'refseq_ncrna',
    'gene_biotype'),
  uniqueRows = TRUE)


head(annotLookup)
  hgnc_symbol ensembl_gene_id refseq_mrna refseq_ncrna   gene_biotype
1       MT-TF ENSG00000210049                                 Mt_tRNA
2     MT-RNR1 ENSG00000211459                NR_137294        Mt_rRNA
3       MT-TV ENSG00000210077                                 Mt_tRNA
4     MT-RNR2 ENSG00000210082                NR_137295        Mt_rRNA
5      MT-TL1 ENSG00000209082                                 Mt_tRNA
6      MT-ND1 ENSG00000198888                          protein_coding

Kevin
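One possible way to use such a table is to look up the biotype for a list of RefSeq accessions. The sketch below is a hedged illustration, not part of the answer above: refseq_to_biotype is a hypothetical helper, the small data frame is a stand-in copied from the head(annotLookup) output shown earlier, and NM_000000 is a made-up accession used to show the no-match case.

```r
# Small stand-in for annotLookup, copied from the head() output above.
annotLookup <- data.frame(
  hgnc_symbol = c("MT-TF", "MT-RNR1", "MT-TV", "MT-RNR2", "MT-TL1", "MT-ND1"),
  ensembl_gene_id = c("ENSG00000210049", "ENSG00000211459", "ENSG00000210077",
                      "ENSG00000210082", "ENSG00000209082", "ENSG00000198888"),
  refseq_mrna = "",
  refseq_ncrna = c("", "NR_137294", "", "NR_137295", "", ""),
  gene_biotype = c("Mt_tRNA", "Mt_rRNA", "Mt_tRNA", "Mt_rRNA", "Mt_tRNA",
                   "protein_coding"),
  stringsAsFactors = FALSE
)

# Map RefSeq accessions (mRNA or ncRNA) to a gene biotype.
# Version suffixes such as ".1" are stripped before matching;
# accessions not present in the lookup table give NA.
refseq_to_biotype <- function(ids, lookup) {
  ids <- sub("\\.[0-9]+$", "", ids)
  long <- data.frame(
    refseq = c(lookup$refseq_mrna, lookup$refseq_ncrna),
    gene_biotype = rep(lookup$gene_biotype, 2),
    stringsAsFactors = FALSE
  )
  long <- unique(long[long$refseq != "", ])
  long$gene_biotype[match(ids, long$refseq)]
}

result <- refseq_to_biotype(c("NR_137294.1", "NR_137295", "NM_000000"),
                            annotLookup)
print(result)
```

With the full table from getBM(), the same function could annotate the transcript IDs found in a RefSeq GTF, assuming those IDs are RefSeq accessions.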
