How Do I Convert FBgn (FlyBase IDs) to Gene Symbols in R/Bioconductor?
Asked 11.0 years ago by enricoferrero

I have a series of text files with FlyBase gene IDs (FBgn) that I need to convert to gene symbols. I'd like to do this within R/Bioconductor, but I'd also consider other command-line methods.

So far I've tried two approaches:

  • the 'biomaRt' package: it works, but some FBgn IDs are not mapped to gene symbols.
  • the 'org.Dm.eg.db' and 'annotate' packages: I can't get this to work because the FBgn IDs are not recognized.

In both cases I may be doing something wrong; I'm not sure.

Does anybody know of a method that works? Thanks,


It may simply be that there is no mapping from FBgn to gene symbol in some cases. This can be for a variety of reasons. Perhaps you could post some examples of FBgn IDs for which biomaRt does not return a gene symbol?
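
To collect such examples with org.Dm.eg.db instead of biomaRt, here is a minimal sketch, assuming the Bioconductor packages AnnotationDbi and org.Dm.eg.db are installed. `find_unmapped_fbgn` is a hypothetical helper, not part of any package. It separates IDs that are not FLYBASE keys in the installed database at all (which may point to IDs from a different FlyBase release) from IDs that are known but have no SYMBOL.

```r
# Minimal sketch; assumes AnnotationDbi and org.Dm.eg.db are installed.
# find_unmapped_fbgn() is a hypothetical helper, not a package function.
find_unmapped_fbgn <- function(db, ids) {
  ids <- unique(ids)
  known_keys <- AnnotationDbi::keys(db, keytype = "FLYBASE")

  # IDs that are not FLYBASE keys in this database at all
  not_in_db <- setdiff(ids, known_keys)

  # IDs that are known keys but map to no gene symbol
  known <- intersect(ids, known_keys)
  no_symbol <- character(0)
  if (length(known) > 0) {
    sym <- AnnotationDbi::mapIds(db, keys = known, column = "SYMBOL",
                                 keytype = "FLYBASE", multiVals = "first")
    no_symbol <- names(sym)[is.na(sym)]
  }

  list(not_in_db = not_in_db, no_symbol = no_symbol)
}

if (requireNamespace("org.Dm.eg.db", quietly = TRUE)) {
  # Two IDs quoted in this thread plus a deliberately invalid placeholder
  ids <- c("FBgn0031701", "FBgn0038074", "FBgn_not_a_real_id")
  str(find_unmapped_fbgn(org.Dm.eg.db::org.Dm.eg.db, ids))
}
```

Which IDs land in each group depends on the org.Dm.eg.db version you have installed.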

Answered 6.6 years ago by aprezvykh

I did it like this:

    library("AnnotationDbi")
    library("org.Dm.eg.db")

    # resadj: DESeq2 results filtered to padj < 0.05; row names are FBgn IDs
    res <- as.data.frame(resadj)

    # Gene symbol for each FBgn ID
    res$symbol <- mapIds(org.Dm.eg.db,
                         keys = row.names(res),
                         column = "SYMBOL",
                         keytype = "ENSEMBL",
                         multiVals = "first")

    # Entrez Gene ID
    res$entrez <- mapIds(org.Dm.eg.db,
                         keys = row.names(res),
                         column = "ENTREZID",
                         keytype = "ENSEMBL",
                         multiVals = "first")

    # Full gene name
    res$name <- mapIds(org.Dm.eg.db,
                       keys = row.names(res),
                       column = "GENENAME",
                       keytype = "ENSEMBL",
                       multiVals = "first")

    write.csv(res, file = "res_GO.csv")

resadj is the DESeq2 differential expression result, filtered to padj < 0.05. Its row names are FlyBase gene IDs such as "FBgn0031701" and "FBgn0038074", and its columns are "baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue" and "padj".

It works :)


Thanks @aprezvykh! It works fine, but I noticed a couple of issues in the code and results. First, the code should use "res" instead of "resadj" (or, equivalently, assign to "resadj$symbol <- ..." instead of "res$symbol <- ..."); this is minor. Second, I used keytype "FLYBASE": with "ENSEMBL" I got more NA values. I think the "FLYBASE" mapping is more up to date.


For me, 'FLYBASE' and 'ENSEMBL' gave the same result.
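
Since results differ between users, it may be worth checking on your own IDs. A minimal sketch, assuming the same packages as the answer; `count_na_by_keytype` is a hypothetical helper that counts, for each keytype, how many IDs end up without a SYMBOL. IDs that are not valid keys for a keytype are counted as missing too. The numbers will depend on the org.Dm.eg.db version installed.

```r
# Minimal sketch; assumes AnnotationDbi and org.Dm.eg.db are installed.
# count_na_by_keytype() is a hypothetical helper, not a package function.
count_na_by_keytype <- function(db, ids, keytypes = c("FLYBASE", "ENSEMBL")) {
  ids <- unique(ids)
  vapply(keytypes, function(kt) {
    # mapIds() can stop with an error when none of the keys are valid,
    # so look up only the IDs that exist for this keytype
    valid <- intersect(ids, AnnotationDbi::keys(db, keytype = kt))
    if (length(valid) == 0) return(length(ids))
    sym <- AnnotationDbi::mapIds(db, keys = valid, column = "SYMBOL",
                                 keytype = kt, multiVals = "first")
    # invalid IDs and IDs mapping to NA both count as missing
    length(ids) - sum(!is.na(sym))
  }, integer(1))
}

if (requireNamespace("org.Dm.eg.db", quietly = TRUE)) {
  # e.g. ids <- row.names(res) with the DESeq2 results from the answer
  ids <- c("FBgn0031701", "FBgn0038074")
  print(count_na_by_keytype(org.Dm.eg.db::org.Dm.eg.db, ids))
}
```

The result is a named integer vector with one count per keytype.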


Thanks for the answer, it helped me as well! For anyone else: in my case (annotating 1430 ChIP-seq peaks), 'ENSEMBL' gave 138 NAs while 'FLYBASE' gave only 2, so 'FLYBASE' seems better. Curiously, the 2 NAs are not among the 138, so the 'FLYBASE' NAs could be filled in from the 'ENSEMBL' results.
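
A minimal base-R sketch of that fallback. `fill_missing_symbols` is a hypothetical helper; it assumes both inputs are named character vectors keyed by FBgn ID, such as the results of `mapIds(..., multiVals = "first")` with keytype "FLYBASE" (primary) and "ENSEMBL" (fallback) on the same keys. The toy vectors below are made-up placeholders, not real FlyBase mappings.

```r
# fill_missing_symbols() is a hypothetical helper.
# primary, fallback: named character vectors (names = FBgn IDs).
fill_missing_symbols <- function(primary, fallback) {
  out <- primary
  missing <- is.na(out)
  # For each missing entry, take the fallback value with the same name;
  # IDs absent from the fallback stay NA. Names of 'out' are preserved.
  out[missing] <- fallback[names(out)[missing]]
  out
}

# Toy example with placeholder IDs and symbols (not real FlyBase data)
flybase_sym <- c(id1 = "symA", id2 = NA, id3 = "symC")
ensembl_sym <- c(id1 = NA, id2 = "symB", id3 = "symC")
fill_missing_symbols(flybase_sym, ensembl_sym)
# returns c(id1 = "symA", id2 = "symB", id3 = "symC")
```

Keeping 'FLYBASE' as the primary source and only consulting 'ENSEMBL' for NAs means a symbol already found via 'FLYBASE' is never overwritten.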
