Question: How to add gene symbol to RNA-Seq data using R
Asked 9 months ago by williamsbrian5064:

Hi,

I am trying to add gene symbols to my RNA-Seq data. I quantified my reads with Salmon, but instead of doing differential expression analysis I want to use another program to do a little machine learning. So I really just need to add the gene symbol next to each transcript ID, so I can identify the gene without having to look up every transcript ID in Ensembl.

I know there are packages that can help with this, like org.Cf.eg.db, but I can't figure out how to make that package work with transcript IDs. I am inexperienced with it, so that is most likely my issue. Here is an example of what my data looks like after quantification:

> head(test)
                  Name Length EffectiveLength      TPM   NumReads
1 ENSCAFT00000034820.1    957         736.829 1309.272  43423.000
2 ENSCAFT00000034824.1   1044         823.630 1001.516  37129.000
3 ENSCAFT00000034830.1   1545        1324.630 3796.436 226357.000
4 ENSCAFT00000034833.1    684         464.046 8086.686 168910.000
5 ENSCAFT00000034835.1    204          50.476 4033.303   9163.596
6 ENSCAFT00000034836.1    681         461.059 9748.035 202300.391

I just want to add a Symbol column so the data looks something like this (the symbols are placeholders):

> head(test)
                  Name Length EffectiveLength      TPM   NumReads Symbol
1 ENSCAFT00000034820.1    957         736.829 1309.272  43423.000 ABC
2 ENSCAFT00000034824.1   1044         823.630 1001.516  37129.000 ABD
3 ENSCAFT00000034830.1   1545        1324.630 3796.436 226357.000 ABE
4 ENSCAFT00000034833.1    684         464.046 8086.686 168910.000 ABF
5 ENSCAFT00000034835.1    204          50.476 4033.303   9163.596 ABG
6 ENSCAFT00000034836.1    681         461.059 9748.035 202300.391 ABH
Answer by Prakash (India), 9 months ago:

You can use the R package biomaRt to map your transcript IDs to gene names. See if the code below works:

library("biomaRt")
mart <- useMart("ensembl")
# List the Ensembl datasets for all available organisms
listDatasets(mart)
# Choose the dataset of interest; for dog it is probably "cfamiliaris_gene_ensembl"
ensembl <- useMart("ensembl", dataset = "cfamiliaris_gene_ensembl")
# List the attributes you can retrieve
listAttributes(ensembl)
# Salmon reports versioned IDs (e.g. ENSCAFT00000034820.1), but the
# ensembl_transcript_id attribute has no version suffix, so strip it first
tx_ids <- sub("\\.[0-9]+$", "", test$Name)
# Query only the transcripts present in your data
gene <- getBM(attributes = c("ensembl_transcript_id", "external_gene_name"),
              filters = "ensembl_transcript_id",
              values = tx_ids,
              mart = ensembl)
# Match your transcript IDs to ensembl_transcript_id
id <- match(tx_ids, gene$ensembl_transcript_id)
# Add a gene symbol column to your data frame (NA where there is no match)
test$Symbol <- gene$external_gene_name[id]
head(test)
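If you want to check the ID-matching step without querying Ensembl, here is a minimal sketch that runs on a small made-up table. The `lookup` data frame, the symbols `GENE_A`/`GENE_B`, and the helper `add_symbols()` are hypothetical placeholders; in practice the lookup would be the data frame returned by `getBM()`. It shows why the version suffix matters: Salmon's IDs end in `.1`, while Ensembl's `ensembl_transcript_id` values do not, so a direct `match()` finds nothing.

```r
# Minimal sketch: attach gene symbols to Salmon transcript IDs via a lookup table.
# The lookup table, its symbols, and add_symbols() are HYPOTHETICAL placeholders
# standing in for the data frame returned by biomaRt::getBM().

# A few rows shaped like Salmon's output (IDs carry a ".1" version suffix)
test <- data.frame(
  Name = c("ENSCAFT00000034820.1", "ENSCAFT00000034824.1", "ENSCAFT00000034830.1"),
  TPM = c(1309.272, 1001.516, 3796.436),
  stringsAsFactors = FALSE
)

# Hypothetical transcript-to-symbol table with unversioned IDs;
# one transcript is deliberately missing to show how unmatched IDs behave
lookup <- data.frame(
  ensembl_transcript_id = c("ENSCAFT00000034830", "ENSCAFT00000034820"),
  external_gene_name = c("GENE_B", "GENE_A"),
  stringsAsFactors = FALSE
)

add_symbols <- function(df, lookup) {
  # Drop the trailing ".<digits>" version so IDs match the unversioned keys
  tx_ids <- sub("\\.[0-9]+$", "", df$Name)
  # match() gives the lookup row for each transcript, or NA if it is absent
  idx <- match(tx_ids, lookup$ensembl_transcript_id)
  df$Symbol <- lookup$external_gene_name[idx]
  df
}

# Without stripping the version, nothing matches
print(match(test$Name, lookup$ensembl_transcript_id))

test <- add_symbols(test, lookup)
print(test)
```

Transcripts with no entry in the lookup get `NA` in the Symbol column, so after the real biomaRt query it is worth checking `sum(is.na(test$Symbol))` to see how many transcripts were left unannotated.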