Question: Best way to get gene IDs for Salmon transcript output
dk0319 wrote, 7 weeks ago:

I generated TPM values from FASTQ data using Salmon. This leaves me with NM_ transcript IDs. I would like to get the gene symbols for these transcript IDs. BioMart does not recognize the transcripts, and NCBI Datasets produces an error when I submit the entire transcriptome. I have been exploring tximport and tximeta, but I have run into numerous issues, particularly with tximeta not detecting my reference file. Any advice would be greatly appreciated.

Update: I now have tximport and tximeta running; however, they create S4 objects, and I am unsure how to make these readable.

Tags: rna-seq, R
dk0319 wrote, 19 days ago:

tximeta was able to compile all my quant.sf files and summarize them to the gene level.
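
For readers who want to see what gene-level summarization of TPM amounts to, here is a minimal Python sketch. It is not tximeta's implementation, only an illustration of the core idea: each gene's TPM is the sum of the TPMs of its transcripts. The function name `gene_level_tpm`, the toy transcript IDs, and the toy mapping are hypothetical; the `Name` and `TPM` column names are those of Salmon's quant.sf header.

```python
import csv
import io
from collections import defaultdict


def gene_level_tpm(quant_lines, tx2gene):
    """Sum transcript TPMs per gene.

    quant_lines: iterable of lines from a Salmon quant.sf file (tab-separated,
                 with a header containing at least 'Name' and 'TPM').
    tx2gene:     dict mapping transcript ID -> gene identifier or symbol.
    Returns (totals, unmapped): a dict of gene -> summed TPM, and a list of
    transcript IDs that had no entry in tx2gene.
    """
    reader = csv.DictReader(quant_lines, delimiter="\t")
    totals = defaultdict(float)
    unmapped = []
    for row in reader:
        gene = tx2gene.get(row["Name"])
        if gene is None:
            # Keep track of transcripts we could not assign to a gene.
            unmapped.append(row["Name"])
            continue
        totals[gene] += float(row["TPM"])
    return dict(totals), unmapped


# Toy data (hypothetical IDs and values), laid out like a quant.sf file.
TOY_QUANT = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "NM_A.1\t1000\t800.0\t1.5\t10\n"
    "NM_B.1\t1200\t1000.0\t2.5\t20\n"
    "NM_C.1\t900\t700.0\t4.0\t5\n"
)
TOY_TX2GENE = {"NM_A.1": "G1", "NM_B.1": "G1"}  # NM_C.1 deliberately missing

totals, unmapped = gene_level_tpm(io.StringIO(TOY_QUANT), TOY_TX2GENE)
print(totals)
print(unmapped)
```

For a real file, pass an open handle instead of the toy string, for example `open("quant.sf")`. The list of unmapped transcripts is worth inspecting, since it shows which IDs the mapping did not cover.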

vkkodali (United States) wrote, 7 weeks ago:

> I would like to generate the gene symbols from the transcript IDs

If you need just the gene symbols, and not the sequence, you can parse the gene2refseq.gz file. Depending on the age of your input set of accessions, you may not find information for all of them, because gene2refseq.gz is updated regularly and any NM_ accessions that are no longer the latest will be absent. You can download gene2refseq.gz from here: https://ftp.ncbi.nlm.nih.gov/gene/DATA/

As a first pass, you can extract information for as many NM_ accessions as you can from this file and then use Entrez Direct or NCBI Datasets to get the information for the remaining ones.
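
The first pass described above could be sketched in Python roughly as follows. This is a sketch under assumptions, not a definitive parser: the column names (`#tax_id`, `GeneID`, `RNA_nucleotide_accession.version`, `Symbol`) are assumed to match the header line of the downloaded file and should be checked against it, the function names are hypothetical, and the toy rows only mimic the layout with a subset of columns. Falling back to the accession without its version suffix is an added convenience, not part of the advice above.

```python
import csv
import io


def load_tx2gene(lines, tax_id="9606"):
    """Build a map from NM_ accession to (GeneID, Symbol).

    lines:  iterable of text lines in gene2refseq layout (tab-separated,
            first line is the header).
    tax_id: NCBI taxonomy ID to keep, as a string ("9606" is human).
    Each accession is stored both with and without its version suffix.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    tx2gene = {}
    for row in reader:
        if row["#tax_id"] != tax_id:
            continue
        acc = row["RNA_nucleotide_accession.version"]
        if not acc.startswith("NM_"):
            continue  # skips "-" placeholders and non-mRNA accessions
        gene = (row["GeneID"], row["Symbol"])
        tx2gene[acc] = gene
        tx2gene.setdefault(acc.split(".")[0], gene)  # version-less fallback
    return tx2gene


def symbol_for(accession, tx2gene):
    """Return the gene symbol for an accession, or None if it is absent."""
    hit = tx2gene.get(accession) or tx2gene.get(accession.split(".")[0])
    return hit[1] if hit else None


# Toy excerpt (illustrative rows, subset of columns) in the assumed layout.
TOY_GENE2REFSEQ = (
    "#tax_id\tGeneID\tRNA_nucleotide_accession.version\tSymbol\n"
    "9606\t7157\tNM_000546.6\tTP53\n"
    "9606\t7157\t-\tTP53\n"
    "10090\t22059\tNM_011640.3\tTrp53\n"
)

tx2gene = load_tx2gene(io.StringIO(TOY_GENE2REFSEQ))
missing = [a for a in ["NM_000546.5", "NM_999999.1"] if symbol_for(a, tx2gene) is None]
print(symbol_for("NM_000546.5", tx2gene))
print(missing)
```

For the real file, the toy string can be replaced with `gzip.open("gene2refseq.gz", "rt")` from the standard library. The accessions collected in `missing` are the ones to look up afterwards with Entrez Direct or NCBI Datasets.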
