I am trying to write R code that, for each gene, extracts the GO label with the highest confidence score. The score is the number that comes after the "|" symbol.
For each gene ID (each row) there are many GO labels (columns), up to 400 of them, and the GO term with the highest confidence score can be in any column.
    GeneID             GO_01             GO_02             GO_03             GO_04
    exi2A01G0001540.1  GO:0005575|0.853  GO:0005622|0.705  GO:0005623|0.846  GO:0005634|0.531
    exi2A01G0001560.1  GO:0005575|0.324  GO:0044699|0.319  GO:0044464|0.324  GO:0005623|0.524
    exi9A01G0045270.1  GO:0003674|0.356  GO:0005575|0.679  GO:0005622|0.539
I would like to keep, for each gene, only the GO label that has the highest confidence score.
For example, the result would look like this:
    GeneID             GO-term
    exi2A01G0001540.1  GO:0005575|0.853
    exi2A01G0001560.1  GO:0005623|0.524
    exi9A01G0045270.1  GO:0005575|0.679
I started with this R code:
    GO_1 <- read.table("proteinGO-term_0.3.txt", header=T, sep="\t", fill=T)

    # have gene ID as a row name:
    GO_2 <- GO_1[,-1]
    rownames(GO_2) <- GO_1[,1]

    # I tried this, but it doesn't do what I want:
    test <- apply(GO_2,1,function(x) which(x==max(x)))
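One likely reason the attempt misbehaves is that each cell is a string such as "GO:0005575|0.853", so max(x) compares the strings alphabetically instead of comparing the scores. A minimal sketch of one possible fix follows: it splits off the number after the "|" and picks the label with the largest one. It assumes every non-empty cell has the form "label|score" and that missing cells are empty strings or NA. The helper name best_label and the inline copy of the sample data are only for illustration; the real table would come from read.table() as above.

```r
# Illustrative inline copy of the sample data (assumption: the real data
# would be read from the tab-separated file instead).
GO_1 <- data.frame(
  GeneID = c("exi2A01G0001540.1", "exi2A01G0001560.1", "exi9A01G0045270.1"),
  GO_01  = c("GO:0005575|0.853", "GO:0005575|0.324", "GO:0003674|0.356"),
  GO_02  = c("GO:0005622|0.705", "GO:0044699|0.319", "GO:0005575|0.679"),
  GO_03  = c("GO:0005623|0.846", "GO:0044464|0.324", "GO:0005622|0.539"),
  GO_04  = c("GO:0005634|0.531", "GO:0005623|0.524", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the "label|score" string with the highest
# score in one row, or NA if the row has no labels at all.
best_label <- function(x) {
  x <- x[!is.na(x) & nzchar(x)]               # drop missing or empty cells
  if (length(x) == 0) return(NA_character_)
  scores <- as.numeric(sub(".*\\|", "", x))   # keep the text after the last "|"
  x[which.max(scores)]                        # first label with the top score
}

# Apply the helper row by row to the GO columns only.
GO_2 <- as.matrix(GO_1[, -1])
result <- data.frame(
  GeneID  = GO_1$GeneID,
  GO_term = unname(apply(GO_2, 1, best_label)),
  stringsAsFactors = FALSE
)
print(result)
```

If two labels in a row tie for the top score, which.max() keeps only the first one; returning all tied labels would need a different comparison, for example x[scores == max(scores)].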