Question

Merge columns adding conditions for every row


4.2 years ago

miriam.riquelmep • 0

As you may know, when converting gene identifiers between annotation systems we sometimes lose information, because not every database contains every gene, and there are synonyms, newly annotated genes, and so on. In my case, I have a list of differentially expressed genes with Ensembl IDs, and I want to convert them to gene symbols so that the list is human-readable. I tried 2 different methods, so I now have 3 columns: the Ensembl ID and the symbol returned by each method. The .csv file could look like this (this is a fake example):

| Case | Ensembl | Method1 | Method2 |
|---|---|---|---|
| 1 | ENSMUS0000001 | Htt | NA |
| 2 | ENSMUS0000002 | Socs3 | SocsX |
| 3 | ENSMUS0000003 | NA | Jak2 |
| 4 | ENSMUS0000004 | NA | NA |

Then I would like to merge these into a single column with a script that goes through all rows. The behavior I expect is: if the name in the Method1 column is not NA, take it regardless of the other symbol (cases 1 and 2). If Method1 is NA and Method2 is not, take the Method2 name (case 3). If both are NA, keep the Ensembl ID (case 4).
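The rule above is a per-row decision, so it can be written almost word for word as a small function applied to every row. A minimal sketch in R, assuming the fake table is already loaded with NA for missing symbols (the names `genes` and `pick_symbol` are hypothetical, and the data frame just retypes the example):

```r
# Fake example from the table above; NA marks a missing symbol
genes <- data.frame(
  Ensembl = c("ENSMUS0000001", "ENSMUS0000002", "ENSMUS0000003", "ENSMUS0000004"),
  Method1 = c("Htt", "Socs3", NA, NA),
  Method2 = c(NA, "SocsX", "Jak2", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: decide the symbol for one row
pick_symbol <- function(ensembl, m1, m2) {
  if (!is.na(m1)) {
    m1          # cases 1 and 2: Method1 wins
  } else if (!is.na(m2)) {
    m2          # case 3: fall back to Method2
  } else {
    ensembl     # case 4: keep the Ensembl ID
  }
}

# Apply the rule to every row
genes$Symbol <- mapply(pick_symbol, genes$Ensembl, genes$Method1, genes$Method2,
                       USE.NAMES = FALSE)
print(genes$Symbol)
```

This row-by-row version mirrors the description closely; the vectorized `ifelse()` approach in the answer below does the same thing on whole columns at once.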

Could someone shed a little light on the code? Is it okay to use a bash script, or should I use a more powerful language for this?

Thank you in advance.

Tags: bash, code, genes, RNA-Seq


GenoMax · Answer 1 · 2020-02-12


4.2 years ago

patelk26 ▴ 290

There are multiple ways to do this; I did it in R:

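# stringsAsFactors = FALSE keeps the symbols as text (before R 4.0 the default turned them into factors)
test <- read.csv('test.csv', stringsAsFactors = FALSE)
# Take Method1 if present, otherwise Method2, otherwise the Ensembl ID
res <- data.frame('result'=ifelse(!is.na(test$Method1), test$Method1,
          ifelse(!is.na(test$Method2), test$Method2, test$Ensembl)))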
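Nesting `ifelse()` gets awkward if more than two annotation sources are combined. A sketch of the same priority rule in base R that fills the NAs one source at a time, so further columns can simply be added as arguments (the helper name `first_non_na` is hypothetical; `dplyr::coalesce()` does the same job if dplyr is already in use):

```r
# Hypothetical helper: for each row, return the first non-NA value
# across the given vectors, taken in priority order (earlier arguments win)
first_non_na <- function(...) {
  cols <- list(...)
  out <- cols[[1]]
  for (col in cols[-1]) {
    gaps <- is.na(out)          # rows still without a value
    out[gaps] <- col[gaps]      # fill them from the next source
  }
  out
}

# Same fake example as in the question
test <- data.frame(
  Ensembl = c("ENSMUS0000001", "ENSMUS0000002", "ENSMUS0000003", "ENSMUS0000004"),
  Method1 = c("Htt", "Socs3", NA, NA),
  Method2 = c(NA, "SocsX", "Jak2", NA),
  stringsAsFactors = FALSE
)

# Ensembl goes last so it is only used when every method returned NA
res <- data.frame(result = first_non_na(test$Method1, test$Method2, test$Ensembl),
                  stringsAsFactors = FALSE)
print(res$result)
```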



Thank you very much, patelk26. It is very nice that I can do this in R, because I am already working in it, so I don't have to mix languages or process things in the terminal and then load them back into R.

I just had to add as.vector(), because my data is loaded differently, but it works perfectly! Here is the final code:

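# mapIds() comes from AnnotationDbi, which both annotation packages load
library(EnsDb.Mmusculus.v79)
library(org.Mm.eg.db)

annot <- NULL

annot$ENSEMBL <- rownames(featureCounts)

annot$SYMBOL <- mapIds(EnsDb.Mmusculus.v79, keys = rownames(featureCounts), column = "SYMBOL", keytype = "GENEID")

annot$SYMBOL1 <- mapIds(org.Mm.eg.db, keys = rownames(featureCounts), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")

annot <- as.data.frame(annot)

# as.vector() turns factor columns back into text so ifelse() returns symbols, not factor codes
consensus <- ifelse(!is.na(annot$SYMBOL), as.vector(annot$SYMBOL),
          ifelse(!is.na(annot$SYMBOL1), as.vector(annot$SYMBOL1), as.vector(annot$ENSEMBL)))

# Store the plain vector so annot$consensus is an ordinary column, not a nested data frame
annot$consensus <- consensus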
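Why `as.vector()` helps here: in R versions before 4.0, `as.data.frame()` and `read.csv()` turn character columns into factors by default, and `ifelse()` drops the factor class, so it returns the underlying integer codes instead of the gene symbols. A small demonstration with an explicitly created factor (the values are made up):

```r
# A symbol column stored as a factor, as older R versions do by default
symbol <- factor(c("Htt", NA, NA))
ensembl <- c("ENSMUS0000001", "ENSMUS0000002", "ENSMUS0000003")

# ifelse() takes the factor's integer codes, not its labels
bad <- ifelse(!is.na(symbol), symbol, ensembl)

# as.vector() (or as.character()) converts the factor to its labels first
good <- ifelse(!is.na(symbol), as.vector(symbol), ensembl)

print(bad)
print(good)
```

Passing `stringsAsFactors = FALSE` to `as.data.frame()` would avoid the factors in the first place, making the `as.vector()` calls unnecessary.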

Thanks again.

written 4.2 years ago by miriam.riquelmep


miriam.riquelmep: @patelk26's comment has been moved to an answer. If the answer was helpful, you should mark it as accepted.