Question

GENCODE GTF-derived gene IDs can't be mapped to gene symbols following the DESeq2 manual

0

4.3 years ago

Kai_Qi ▴ 130

Hi:

I used the GENCODE GRCm38 GTF for annotation when counting reads with Rsubread.

I can't get the gene symbols by following the DESeq2 manual:

resTC$symbol <- mapIds(org.Mm.eg.db, keys = row.names(resTC), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")

I looked over it again and found that the geneID I got is in this format:

ENSMUSG00000029848.11

So I manually entered this gene ID into NCBI Gene, and it did not match anything. However, if I use ENSMUSG00000029848, it tells me the gene is "Stra8".

Can anyone tell me how to solve this problem?

Thank you very much,

R RNA-Seq gene • 1.4k views

ADD COMMENT • link 4.3 years ago by Kai_Qi ▴ 130

0

I found an answer in a previous post (https://www.biostars.org/p/301116/#496172).

But when I tried to get rid of the trailing numbers using:

row.names(res) <- gsub(".*$" , "", row.names(res))

it turns out everything was replaced with "".
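A small sketch of why that happens, using base R only. The second ID below is a hypothetical example added for illustration. In a regular expression an unescaped period matches any character, so the pattern ".*$" matches each string in full:

```r
ids <- c("ENSMUSG00000029848.11", "ENSMUSG00000051951.5")  # second ID is illustrative

# Extract what ".*$" actually matches: an unescaped "." is "any character",
# so the match starts at the first character and runs to the end.
whole_match <- regmatches(ids, regexpr(".*$", ids))
print(whole_match)  # the complete, unchanged IDs

# Replacing that match with "" therefore leaves nothing behind.
emptied <- gsub(".*$", "", ids)
print(emptied)      # "" ""
```

Because the match is the entire string, there is nothing left once it is replaced.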

ADD REPLY • link 4.3 years ago by Kai_Qi ▴ 130

score 2 · Answer 1 · 2021-03-12

2

4.3 years ago

ATpoint 88k

gsub("\\..*", "", rownames(res))

The double backslash escapes the first dot so that it matches a literal period rather than any character. The .* that follows matches everything after that period, so the whole pattern means "remove the first period and everything after it".
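As a minimal sketch of applying this to a results table: res_demo below is a made-up stand-in for res, with invented fold-change values and one illustrative ID. Note that gsub() returns a new vector and does not modify its input, so the result has to be assigned back to the row names:

```r
# Toy stand-in for a results table with versioned GENCODE IDs as row names.
# The numeric values are invented purely for illustration.
res_demo <- data.frame(
  log2FoldChange = c(1.2, -0.7),
  row.names = c("ENSMUSG00000029848.11", "ENSMUSG00000051951.5")
)

# "\\." matches a literal period; ".*" matches whatever follows it.
# Assign the result back, otherwise the row names stay unchanged.
rownames(res_demo) <- gsub("\\..*", "", rownames(res_demo))

print(rownames(res_demo))  # "ENSMUSG00000029848" "ENSMUSG00000051951"
```

The stripped IDs are then in the unversioned form that the question reports as resolving to a gene.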

ADD COMMENT • link 4.3 years ago by ATpoint 88k

0

Thank you very much. It indeed removed the trailing numbers:

> head(row.names(res))
[1] "ENSMUSG000000519515" "ENSMUSG000001028511" "ENSMUSG000001033771" "ENSMUSG000001031471" "ENSMUSG000001023311"
[6] "ENSMUSG000001023481"

But when I typed:

> res$symbol <- mapIds(org.Mm.eg.db, keys = row.names(res), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")

I got

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.
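One possible cause, inferred only from the printed row names and not confirmed in this thread: unversioned mouse Ensembl gene IDs are "ENSMUSG" followed by 11 digits, while the IDs printed above carry 12 digits. That would be consistent with the period having been removed while the version digits were kept. A quick format check, where is_plain_id is a hypothetical helper written for this sketch:

```r
ids_seen <- c("ENSMUSG000000519515", "ENSMUSG000001028511")  # copied from the output above
id_expected <- "ENSMUSG00000029848"                          # unversioned ID from the question

# Hypothetical helper: TRUE when x is "ENSMUSG" plus exactly 11 digits.
is_plain_id <- function(x) grepl("^ENSMUSG[0-9]{11}$", x)

print(is_plain_id(ids_seen))     # FALSE FALSE
print(is_plain_id(id_expected))  # TRUE
print(nchar(c(ids_seen, id_expected)))  # 19 19 18
```

If the check returns FALSE for the row names, redoing the substitution from the original versioned IDs with the escaped pattern would be worth trying before calling mapIds again.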

My second question is about the gsub command.

Earlier, I used it this way:

head(colnames(countdata))
[1] "DIV0.1.bam" "DIV0.2.bam" "DIV0.3.bam" "DIV7.1.bam" "DIV7.2.bam" "DIV7.3.bam"
colnames(countdata) <- gsub(".bam" , "", colnames(countdata))
head(colnames(countdata))
[1] "DIV0.1" "DIV0.2" "DIV0.3" "DIV7.1" "DIV7.2" "DIV7.3"

It worked well. These differences got me a little bit confused.
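A sketch of why the ".bam" call happened to work. As a regular expression, ".bam" means "any one character followed by bam", and in these column names the only such match is the extension itself. The second file name below is a made-up edge case showing where the unescaped version goes wrong:

```r
files <- c("DIV0.1.bam", "Abam1.bam")  # second name is a hypothetical edge case

# Unescaped: "." matches any character, so "Abam" is removed as well.
as_regex <- gsub(".bam", "", files)

# Escaped period plus "$" anchor: only a trailing ".bam" is removed.
as_suffix <- gsub("\\.bam$", "", files)

print(as_regex)   # "DIV0.1" "1"
print(as_suffix)  # "DIV0.1" "Abam1"
```

Escaping the period, or passing fixed = TRUE to treat the pattern as a literal string, avoids relying on the input happening to be safe.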

Thank you very much.

ADD REPLY • link 4.3 years ago by Kai_Qi ▴ 130