preetomregon, 5.8 years ago
Hi, I am very new to RNA-Seq data analysis and I am trying to do a GO enrichment analysis, using the biomaRt package in R to retrieve the GO annotations. I want to merge the GO annotation into my DESeq2 results. I ran the following script to obtain the GO terms:
library(DESeq2)
library(biomaRt)
# List the marts available on the Ensembl Plants server
listMarts(host="plants.ensembl.org")
# Connect to the rice (Oryza sativa) gene dataset
m <- useMart("plants_mart", dataset="osativa_eg_gene", host="plants.ensembl.org")
# One row per gene/transcript/GO-term combination; name_1006 is the GO term name
go <- getBM(attributes=c("ensembl_gene_id","ensembl_transcript_id", "start_position", "end_position","go_id","name_1006"), mart=m)
go[1:10,]
write.table(go, "GOannotationsBiomart.txt", quote=FALSE, row.names=FALSE, col.names=FALSE, sep="\t")
Now I want to merge the above GO details with the DESeq2 data. The final DESeq2 results table is written and read back as follows:
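# data and outputPrefix come from earlier in the DESeq2 script (not shown)
write.csv(data, file=paste0(outputPrefix,"_results_with_normalized_final.csv"))
results_csv <- "O_sativa_DESeq2_results_with_normalized_final.csv"
# Convert the CSV to a text file; fixed=TRUE matches ".csv" literally, not as a regex
write.table(read.csv(results_csv), gsub(".csv", ".txt", results_csv, fixed=TRUE))
results_txt <- "O_sativa_DESeq2_results_with_normalized_final.txt"
a <- read.table(results_txt, header=TRUE)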
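The CSV-to-TXT round trip above is not strictly necessary. A minimal sketch, assuming the CSV was written by write.csv() with gene IDs as row names: read.csv() can restore those row names directly via row.names = 1. The toy data frame deseq_res and the temporary file are hypothetical stand-ins for the real DESeq2 output.

```r
# Hypothetical toy stand-in for the DESeq2 results table (gene IDs as row names)
deseq_res <- data.frame(
  baseMean       = c(120.5, 33.2, 870.1),
  log2FoldChange = c(1.8, -0.6, 2.4),
  padj           = c(0.001, 0.42, 0.0003),
  row.names      = c("Os08g0254300", "Os08g0460000", "Os01g0100100")
)

# write.csv() stores the row names as an unnamed first column
results_csv <- tempfile(fileext = ".csv")
write.csv(deseq_res, file = results_csv)

# row.names = 1 turns that first column back into row names,
# so no intermediate .txt file is needed
a <- read.csv(results_csv, row.names = 1)
head(a)
```

Keeping the gene IDs as row names makes the later merge with the GO table a matter of matching names.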
Any suggestions or alternative tutorials would be appreciated. Thanks
Thanks, Kevin, for your kind help. It's exactly what I was looking for, but it hasn't worked in my case. I assume this is due to my GO-term data frame, because I have multiple GO terms for a single gene and transcript.
https://ibb.co/ggbCWT
The following is the output of my result:
https://ibb.co/eNV6rT
Hello, can you paste an actual example of your data here? Once you paste it, highlight it and format it as code using the 101 010 button. It will make it easier for me to test.

Thank you for your reply. For example, I have 3 go_ids for the Os08g0254300 gene, which may be the problem. Is it possible to merge the multiple "go_id" and "name_1006" values for the Os08g0254300 gene into a single row? For example:
I am quite new to this field and I hope my question is clear. Kindly correct me if I am wrong. Thanks
Okay, let me take a look later!
Sorry, my time is limited right now. To help you a bit, this code will at least collapse the GO IDs so that there is just a single record per gene:
Note that the gene names are stored as rownames here. This should now make it easier to merge with your other data-frame of statistical values and fold-changes.
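As a minimal sketch of one way to do this collapsing (not necessarily the code from the reply), assuming go is the data frame returned by getBM() above with columns ensembl_gene_id, go_id and name_1006: aggregate() can join all GO IDs and term names for each gene into one "; "-separated string, after which the gene IDs are moved into the row names. The toy rows are made up for illustration.

```r
# Hypothetical toy rows in the shape returned by getBM() above:
# one row per gene/GO-term pair, so genes repeat
go <- data.frame(
  ensembl_gene_id = c("Os08g0254300", "Os08g0254300", "Os08g0254300", "Os08g0460000"),
  go_id     = c("GO:0005515", "GO:0016020", "GO:0005524", "GO:0003677"),
  name_1006 = c("protein binding", "membrane", "ATP binding", "DNA binding"),
  stringsAsFactors = FALSE
)

# Drop rows with an empty GO ID (unannotated genes)
go <- go[go$go_id != "", ]

# Collapse to one row per gene: unique GO IDs / term names joined by "; "
collapse <- function(x) paste(unique(x), collapse = "; ")
go_collapsed <- aggregate(cbind(go_id, name_1006) ~ ensembl_gene_id,
                          data = go, FUN = collapse)

# Store gene IDs as row names, as described above
rownames(go_collapsed) <- go_collapsed$ensembl_gene_id
go_collapsed$ensembl_gene_id <- NULL
go_collapsed
```

Using unique() matters here because the real getBM() output also has a transcript column, so the same GO ID can appear once per transcript of a gene.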
Thank you again for your time. I tried several ways to merge two small data frames but did not succeed. Here are my examples:
I followed several examples of merging data frames but was unable to merge them in my case, for example:
Some attempts produced warnings but no errors, yet the resulting data frame was not what I need. The cbind(deq, df.new) code works only if both data frames have the same number of rows and the matching row names are in the same order. Any suggestion will be appreciated. Thanks
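cbind() pastes rows together by position, not by gene ID, which is why it only works when both tables line up exactly. A minimal sketch of aligning by name first with match(), using hypothetical toy objects named deq (DESeq2 statistics) and df.new (collapsed GO terms) after the post, both keyed by row names:

```r
# Hypothetical toy inputs: different row counts and different row order
deq <- data.frame(log2FoldChange = c(1.8, -0.6, 2.4),
                  padj = c(0.001, 0.42, 0.0003),
                  row.names = c("Os08g0254300", "Os08g0460000", "Os01g0100100"))
df.new <- data.frame(go_id = c("GO:0003677", "GO:0005515; GO:0016020"),
                     row.names = c("Os08g0460000", "Os08g0254300"),
                     stringsAsFactors = FALSE)

# For each row of deq, the position of the same gene in df.new
# (NA when the gene has no GO annotation)
idx <- match(rownames(deq), rownames(df.new))

# Reorder df.new's column to deq's row order, then combine;
# unmatched genes get NA instead of a wrong neighbour's GO terms
combined <- cbind(deq, go_id = df.new$go_id[idx])

cat(sum(is.na(idx)), "gene(s) without GO annotation\n")
combined
```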
I see. In this case, you also have an issue with upper- and lower-case characters, for example OS08G0460000 versus Os08g0460000.
There is a function, toupper, that can help with this, though.

I'm aware that these merge functions can be difficult. There are also a few different ways of doing it, including merge, match, and which. I just happen to be familiar with match, but I know others who use merge. I would just encourage you to study well how they operate (because they are each different) and to also double-check the output. Years ago, as a junior, I frequently got caught out by misusing these commands. One requires many 'checks' and 'balances', like the US government.

Thank you for the code and suggestions.
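A minimal sketch of combining toupper with merge, assuming hypothetical toy tables in which the GO table uses upper-case IDs (OS08G0460000) and the DESeq2 table uses mixed case (Os08g0460000): normalise both keys, do a left join on a shared column, then run a couple of checks on the result.

```r
# Hypothetical toy tables with mismatched ID case
deq <- data.frame(gene = c("Os08g0254300", "Os08g0460000", "Os01g0100100"),
                  log2FoldChange = c(1.8, -0.6, 2.4),
                  stringsAsFactors = FALSE)
go_tab <- data.frame(gene = c("OS08G0460000", "OS08G0254300"),
                     go_id = c("GO:0003677", "GO:0005515; GO:0016020"),
                     stringsAsFactors = FALSE)

# Normalise the join keys so both spellings compare equal
deq$gene_key    <- toupper(deq$gene)
go_tab$gene_key <- toupper(go_tab$gene)

# Left join: keep every DESeq2 row; genes without GO terms get NA.
# Only go_tab's key and GO column are used, so its gene column is not duplicated.
merged <- merge(deq, go_tab[, c("gene_key", "go_id")],
                by = "gene_key", all.x = TRUE, sort = FALSE)

# Checks and balances: no rows gained or lost
stopifnot(nrow(merged) == nrow(deq))
# Number of genes that received GO terms
sum(!is.na(merged$go_id))
merged
```

The row-count check fails if a gene appears more than once in the GO table, which is one reason to collapse the GO terms to a single row per gene before merging.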