Hi everyone,
I am trying to convert gene IDs to gene names for RNA-seq data from six samples. I have already completed alignment and mapping in Galaxy, and calculated counts with featureCounts in Galaxy.
Now I am trying to convert the gene IDs to names with the org.Hs.eg.db package in Bioconductor.
I have merged the data from the six samples into one file ('count file'), where one column holds the gene_id and each of the other columns holds one of the 6 samples (i.e. 7 columns in total).
SRR1554534 SRR1554536 SRR1554539 SRR1554537 SRR1554566 SRR1554568
NA. 211 416 381 2460 2721 2213
NA..1 22 6 6 2 2 6
NA..2 733 325 1240 1687 2235 1741
NA..3 333 104 723 1390 1807 1165
NA..4 40 17 112 757 1005 872
NA..5 453 219 198 110 143 82
Now I try to convert to gene names with this:
for (i in 1:nrow(count_file)){count_file[i,1] = lookUp(toString(count_file[i,1]), 'org.Hs.eg', 'SYMBOL')}
rownames(count_file) = make.names(count_file[,1], unique=TRUE)
count_file[,1] = NULL
count_file
I receive the following result:
SRR1554536 SRR1554539 SRR1554537 SRR1554566 SRR1554568
ALAS1 416 381 2460 2721 2213
ABCB7 6 6 2 2 6
C8G 325 1240 1687 2235 1741
APLP1 104 723 1390 1807 1165
ASIC2 17 112 757 1005 872
ASS1P8 219 198 110 143 82
...
My question: I am wondering why every time my first sample is missing - there are always only 5 instead of 6 samples.
Do you have any idea where the mistake is?
Thanks a lot!
You say you have 7 columns, but your initial data frame has only 6. Where is the gene_id column? Your current gene symbols are obtained by treating SRR1554534's counts as Entrez IDs.
Thank you for your help!
I have corrected the tabular list now to the following:
Using the same code as above...
... I receive the following result when running 'org.Hs.eg': Now with 6 sample columns, but without a proper gene name:
Can you spot an error in my code? Or is org.Hs.eg maybe not suitable for this?
Many thanks!
What is lookUp doing? The fact that the code worked when you were using Entrez IDs but stopped working now that you're using ENSEMBL Gene IDs should tell you where the problem lies.
Thanks!
I was able to get around this problem by using an annotation file with gene names.
I haven't changed the code from the initial version. It seems to me that 'org.Hs.eg' expects Entrez IDs in the first column rather than ENSEMBL Gene IDs. Is it possible that I need different code for ENSEMBL IDs?
You can use select or mapIds to get from one ID to another. Use the appropriate keytype argument.
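For anyone landing here later, a minimal sketch of what the mapIds suggestion could look like for ENSEMBL Gene IDs. The toy data frame is made up for illustration (the IDs are real Ensembl gene IDs, the layout with gene_id in the first column is an assumption about the corrected count file), and the step that strips a version suffix only matters if your IDs look like ENSG00000141510.16. This is one way to do it, not necessarily what the original poster ended up with.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Hypothetical toy table: gene_id in column 1, one column per sample.
count_file <- data.frame(
  gene_id    = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618"),
  SRR1554534 = c(211, 22, 733),
  SRR1554536 = c(416, 6, 325),
  stringsAsFactors = FALSE
)

# Assumption: IDs may carry an Ensembl version suffix (".16"); drop it if present.
ids <- sub("\\.[0-9]+$", "", count_file$gene_id)

# One vectorised lookup instead of a row-by-row loop. keytype tells the
# annotation database which kind of ID is being passed in.
symbols <- mapIds(org.Hs.eg.db,
                  keys      = ids,
                  column    = "SYMBOL",
                  keytype   = "ENSEMBL",
                  multiVals = "first")

# Keep the original ID wherever no symbol was found, make the labels unique,
# use them as row names, and drop the ID column.
labels <- ifelse(is.na(symbols), ids, symbols)
rownames(count_file) <- make.unique(labels)
count_file$gene_id <- NULL
count_file
```

Falling back to the original ID for unmapped genes avoids row names like NA., NA..1 that make.names produces from missing values. If the first column holds Entrez IDs instead, the same call should work with keytype = "ENTREZID".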