4.9 years ago
kbaitsi
I have a TSV file with 61 columns and 18703 lines (genes). I want to convert it into an appropriate data frame in order to perform an ANOVA. The TSV file contains 6 conditions (WT, TG, A, B, C, D). I have written the following code
f<-read.table(file = "GeneExpressionDataset_normalized.tsv", sep="\t", header=TRUE)
data.frame(Expression=as.numeric(f[1,2:61]), Condition = c(rep("WT", 10), rep("TG", 10), rep("A", 10), rep("B",10), rep("C", 10), rep("D", 10)))
for the first line but I am not sure how to loop this in order to get a dataframe for all the lines.
I tried
ff<-sapply(1:nrow(f),function(i){
x<-as.numeric(f[i,2:61])
data.frame(Expression=x, Condition = c(rep("WT", 10), rep("TG", 10), rep("A", 10), rep("B",10), rep("C", 10), rep("D", 10)))
})
and
a <- for (i in 1:nrow(f)){
data.frame(Expression=as.numeric(f[i,2:61]), Condition = c(rep("WT", 10), rep("TG", 10), rep("A", 10), rep("B",10), rep("C", 10), rep("D", 10)))
}
but it's not working. Any suggestions?
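A likely reason neither attempt works: sapply tries to simplify the returned data frames into a matrix, and a for loop always returns NULL, so a ends up empty. Below is a minimal sketch using lapply instead. It assumes the first column holds the gene identifiers and that columns 2:61 are ordered as 10 consecutive replicates each of WT, TG, A, B, C, D, as in the code above. The toy data frame and the names per_gene and pvals are illustrative stand-ins, not part of the original file.

```r
# Toy stand-in for the real table: 3 genes x 60 samples (hypothetical values).
set.seed(1)
f <- data.frame(Gene = paste0("gene", 1:3),
                matrix(rnorm(3 * 60), nrow = 3))

# Condition labels, assuming 10 consecutive columns per condition.
conds <- c("WT", "TG", "A", "B", "C", "D")
condition <- factor(rep(conds, each = 10), levels = conds)

# lapply keeps one data frame per gene in a list (no simplification to a matrix).
per_gene <- lapply(seq_len(nrow(f)), function(i) {
  data.frame(Expression = as.numeric(unlist(f[i, 2:61])),
             Condition = condition)
})
names(per_gene) <- f[[1]]

# One-way ANOVA per gene; keep the p-value of the Condition term.
pvals <- sapply(per_gene, function(d) {
  summary(aov(Expression ~ Condition, data = d))[[1]][["Pr(>F)"]][1]
})
```

With 18703 genes this runs one model per gene, so the p-values would still need a multiple-testing correction, for example p.adjust(pvals, method = "BH").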
What is the final goal? Differential expression?
Yes, that's right...
Then why not use established, well-tested and specialised software such as limma? Please go through its very extensive vignette. Other options for DE are DESeq2 or edgeR, but these strictly require the raw counts. You seem to have normalized counts, therefore the limma-trend pipeline could be an option.
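For orientation, a limma-trend run might look roughly like the sketch below. It assumes the expression values are already on a log2 scale (the question does not say so), that the samples are ordered as 10 replicates per condition, and that the Bioconductor package limma is installed. The toy matrix expr and the name tab are placeholders; the vignette remains the authoritative guide.

```r
# Runs only if the Bioconductor package limma is installed.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  # Toy stand-in: 200 genes x 60 samples, assumed to be log2 expression values.
  expr <- matrix(rnorm(200 * 60, mean = 8), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:60)))

  # Condition labels, assuming 10 consecutive columns per condition.
  conds <- c("WT", "TG", "A", "B", "C", "D")
  condition <- factor(rep(conds, each = 10), levels = conds)

  # Design with WT as the reference level (first factor level).
  design <- model.matrix(~ condition)

  # Fit gene-wise linear models, then moderate variances with a mean-variance trend.
  fit <- limma::lmFit(expr, design)
  fit <- limma::eBayes(fit, trend = TRUE)

  # Moderated F-test over the five non-intercept coefficients (any difference vs WT).
  tab <- limma::topTable(fit, coef = 2:6, number = Inf)
}
```

Testing coefficients 2:6 together plays the role of the per-gene ANOVA; a single coefficient or a contrast would instead compare two specific conditions.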