Merge columns adding conditions for every row
16 months ago

As you may know, converting gene names between nomenclatures sometimes loses information, because not every database contains every gene, and there are synonyms, new genes, and so on. In my case, I have a list of differentially expressed genes with Ensembl IDs, and I want to convert them to gene symbols so that the list is human-readable. I tried 2 different methods, which gives me 3 columns: the Ensembl ID plus the symbol returned by each method. The .csv file could look like this (a made-up example):

Case | Ensembl       | Method1 | Method2
1    | ENSMUS0000001 | Htt     | NA
2    | ENSMUS0000002 | Socs3   | SocsX
3    | ENSMUS0000003 | NA      | Jak2
4    | ENSMUS0000004 | NA      | NA

Then I would like to merge these into a single column with a script that goes through all rows. The behavior I expect is: "if the name in the Method1 column is not NA, take it, no matter what the other symbol is (cases 1 and 2). If the name in Method1 is NA and the name in Method2 is not, take the Method2 name (case 3). If both are NA, keep the Ensembl ID (case 4)".

Could someone shed a little light on the code? Is it okay to use a bash script, or should I use another, more powerful language for this?

Thank you in advance.

bash code genes RNA-Seq • 325 views
16 months ago
patelk26 ▴ 230

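There are multiple ways to do this; I did it in R:

# Read strings as character, not factor, so ifelse() returns the names themselves
test <- read.csv('test.csv', stringsAsFactors = FALSE)
# Prefer Method1, fall back to Method2, then to the Ensembl ID
res <- data.frame(result = ifelse(!is.na(test$Method1), test$Method1,
                  ifelse(!is.na(test$Method2), test$Method2, test$Ensembl)))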
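To check the logic without a file, here is a minimal sketch that types in the fake table from the question by hand and applies the same nested ifelse(). The data frame and the name `merged` are only for illustration, and the columns are assumed to be character vectors with real NA values.

```r
# Fake table from the question, entered by hand (strings kept as character)
test <- data.frame(
  Ensembl = c("ENSMUS0000001", "ENSMUS0000002", "ENSMUS0000003", "ENSMUS0000004"),
  Method1 = c("Htt", "Socs3", NA, NA),
  Method2 = c(NA, "SocsX", "Jak2", NA),
  stringsAsFactors = FALSE
)

# Method1 wins if present, otherwise Method2, otherwise the Ensembl ID
merged <- ifelse(!is.na(test$Method1), test$Method1,
          ifelse(!is.na(test$Method2), test$Method2, test$Ensembl))
print(merged)
```

The four rows should resolve to Htt, Socs3, Jak2 and ENSMUS0000004, matching cases 1 to 4 in the question.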

Thank you very much, patelk26. It is indeed very nice that I can do it in R: I am already working in it, so I don't have to mix languages or run steps in the terminal and then read the result back into R.

I just had to add as.vector(), because my data is actually loaded differently, but it works perfectly! Here is the final code:

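library(AnnotationDbi)        # provides mapIds()
library(EnsDb.Mmusculus.v79)
library(org.Mm.eg.db)

annot <- NULL
annot$ENSEMBL <- rownames(featureCounts)
# Symbol lookup, method 1: EnsDb annotation package
annot$SYMBOL <- mapIds(EnsDb.Mmusculus.v79, keys = rownames(featureCounts), column = "SYMBOL", keytype = "GENEID")
# Symbol lookup, method 2: org.Mm.eg.db
annot$SYMBOL1 <- mapIds(org.Mm.eg.db, keys = rownames(featureCounts), column = 'SYMBOL', keytype = 'ENSEMBL', multiVals = 'first')
annot <- as.data.frame(annot)

# Prefer SYMBOL, fall back to SYMBOL1, then to the Ensembl ID
consensus <- data.frame('Symbol' = ifelse(!is.na(annot$SYMBOL), as.vector(annot$SYMBOL),
          ifelse(!is.na(annot$SYMBOL1), as.vector(annot$SYMBOL1), as.vector(annot$ENSEMBL))))

# Store the merged names as a plain column rather than a nested data frame
annot$consensus <- consensus$Symbol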
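A note on why as.vector() may have been needed: the sketch below assumes the columns of annot ended up as factors, which was the default for as.data.frame() on strings before R 4.0. In that case ifelse() drops the factor attributes and returns the integer level codes instead of the gene names. The vectors `sym` and `ids` are hypothetical stand-ins, not the real data.

```r
# Hypothetical factor column standing in for annot$SYMBOL (assumption: factor)
sym <- factor(c("Htt", NA, "Socs3"))
ids <- c("ENSMUS0000001", "ENSMUS0000002", "ENSMUS0000003")

# Without as.vector(): the factor's level codes leak through instead of the names
codes <- ifelse(!is.na(sym), sym, ids)

# With as.vector(): the factor is converted to character before ifelse() sees it
names_ok <- ifelse(!is.na(sym), as.vector(sym), ids)

print(codes)
print(names_ok)
```

If the columns are already character vectors, as.vector() is harmless and both calls give the same result.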

Thanks again.


miriam.riquelmep: I moved @patelk26's comment to an answer. If the answer was helpful, you should mark it as accepted.


You're welcome, I'm glad it worked :)
