Question

How do I convert FBgn (FlyBase IDs) to gene symbols in R/Bioconductor?


11.0 years ago

enricoferrero ▴ 900

I have a series of text files with FlyBase gene IDs (FBgn) that I need to convert to gene symbols. I'd like to do it within R/Bioconductor, but I'd also consider other command-line methods.

So far I've tried two approaches:

The 'biomaRt' package: it works, but a number of FBgn IDs don't get assigned to gene symbols.
The 'org.Dm.eg.db' and 'annotate' packages: I can't get this to work, as the FBgn IDs are not recognized.

In both cases I may be doing something wrong; I'm not sure.

Does anybody know of a method that works? Thanks,

Tags: r, bioconductor, conversion, id



It may simply be that there is no mapping from FBgn to gene symbol in some cases. This can be for a variety of reasons. Perhaps you could post some examples of FBgn for which biomaRt does not return a gene symbol?

Reply by Neilfws 49k · 11.0 years ago
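To collect the IDs that came back without a symbol (for example, to post them here as suggested above), a minimal base-R sketch follows. It assumes the lookup result is a named character vector with the FBgn IDs as names and `NA` where nothing was found, which is what `AnnotationDbi::mapIds(..., multiVals = "first")` returns; `unmapped_ids` is a hypothetical helper name, and the toy IDs and symbols are placeholders, not real FlyBase annotations.

```r
# Hypothetical helper: return the names (IDs) whose lookup came back NA.
# `symbols` is assumed to be a named character vector with the FBgn IDs as
# names, like the one AnnotationDbi::mapIds(..., multiVals = "first") returns.
unmapped_ids <- function(symbols) {
  names(symbols)[is.na(symbols)]
}

# Toy input: placeholder IDs and symbols, not real FlyBase annotations
symbols <- c(id1 = "geneA", id2 = NA, id3 = "geneC")
missing_ids <- unmapped_ids(symbols)
print(missing_ids)
```

Applied to the vector returned by `mapIds()`, the result can be saved with `writeLines()` and pasted into the thread.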

score 2 · Answer 1 · 2017-09-18


6.6 years ago

aprezvykh ▴ 20

I did it like this:

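    library("AnnotationDbi")
    library("org.Dm.eg.db")

    # DESeq2 results (padj < 0.05); row names are FlyBase IDs (FBgn...)
    res <- as.data.frame(resadj)

    # Gene symbol for each ID (first match if there are several).
    # keytype = "FLYBASE" also accepts FBgn IDs (see the comments below).
    res$symbol <- mapIds(org.Dm.eg.db,
                         keys = row.names(res),
                         column = "SYMBOL",
                         keytype = "ENSEMBL",
                         multiVals = "first")

    # Entrez Gene ID
    res$entrez <- mapIds(org.Dm.eg.db,
                         keys = row.names(res),
                         column = "ENTREZID",
                         keytype = "ENSEMBL",
                         multiVals = "first")

    # Full gene name
    res$name <- mapIds(org.Dm.eg.db,
                       keys = row.names(res),
                       column = "GENENAME",
                       keytype = "ENSEMBL",
                       multiVals = "first")

    write.csv(res, file = "res_GO.csv")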

resadj is the DESeq2 differential-expression result filtered to padj < 0.05. Its row names are FlyBase IDs such as "FBgn0031701" and "FBgn0038074", and its columns are "baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue" and "padj".

It works :)

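For anyone starting from an unfiltered DESeq2 result, the padj < 0.05 filtering step can be sketched in base R as below. One caution: DESeq2 reports `padj` as `NA` for genes removed by independent filtering, so indexing with a bare `res$padj < 0.05` would add rows full of NAs; wrapping the condition in `which()` drops them. The toy data frame stands in for `as.data.frame(results(dds))`, and its IDs and values are made up.

```r
# Toy stand-in for as.data.frame(results(dds)); IDs and values are made up.
res_all <- data.frame(
  baseMean       = c(100, 50, 10),
  log2FoldChange = c(1.5, -0.3, 2.0),
  padj           = c(0.01, 0.40, NA),
  row.names      = c("id1", "id2", "id3")
)

# which() drops NA padj values, so filtered-out genes do not become NA rows
resadj <- res_all[which(res_all$padj < 0.05), ]
print(rownames(resadj))
```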


Thanks @aprezvykh! It works fine, but I noticed two things in the code and results. First, the code should use "res" instead of "resadj" (or, equivalently, assign to "resadj$symbol <- ..." instead of "res$symbol <- ..."), though this is minor. Second, I used keytype "FLYBASE": with "ENSEMBL" I got more NAs. I think the "FLYBASE" mapping is more up to date.

Reply by true202 · 4.7 years ago


For me, 'FLYBASE' and 'ENSEMBL' gave the same result.

Reply by Soumitra Pal ▴ 10 · 4.0 years ago


Thanks for the answer, it helped me as well! For anyone else: in my case (annotating 1430 ChIP-seq peaks), 'ENSEMBL' gave 138 NAs while 'FLYBASE' gave only 2. 'FLYBASE' seems better. Curiously, the 2 NAs are not among the 138 NAs, so the remaining NAs could be filled in from the 'ENSEMBL' mapping.

Reply by marijn_verhaeg · 3.0 years ago
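The fallback idea from the last comment (keep the 'FLYBASE' symbol where there is one and fill the remaining NAs from the 'ENSEMBL' lookup) can be sketched in base R as follows. It assumes both vectors come from `mapIds()` calls on the same keys in the same order, differing only in `keytype`; `fill_missing` is a hypothetical helper, and the toy vectors are placeholders rather than real annotations.

```r
# Hypothetical helper: take `primary` where available, else `fallback`.
# Both are assumed to be parallel vectors for the same keys (same order),
# e.g. two mapIds() calls that differ only in keytype.
fill_missing <- function(primary, fallback) {
  stopifnot(length(primary) == length(fallback))
  out <- primary
  gaps <- is.na(out)
  out[gaps] <- fallback[gaps]  # names of `primary` are kept
  out
}

# Placeholder results standing in for keytype = "FLYBASE" and "ENSEMBL"
sym_flybase <- c(id1 = "geneA", id2 = NA, id3 = NA)
sym_ensembl <- c(id1 = "geneA", id2 = "geneB", id3 = NA)

combined <- fill_missing(sym_flybase, sym_ensembl)
print(combined)
print(sum(is.na(combined)))  # IDs still unmapped after the fallback
```

Readers who already use dplyr can get the same element-wise behaviour from `dplyr::coalesce()`.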