10 months ago

kbaitsi

I have a TSV file with 61 columns and 18703 lines (genes). I want to convert it into an appropriate data frame in order to perform an ANOVA. The file contains 6 conditions (WT, TG, A, B, C, D). I have written the following code

```
f<-read.table(file = "GeneExpressionDataset_normalized.tsv", sep="\t", header=TRUE)
data.frame(Expression=as.numeric(f[1,2:61]), Condition = c(rep("WT", 10), rep("TG", 10), rep("A", 10), rep("B",10), rep("C", 10), rep("D", 10)))
```

for the first line but I am not sure how to loop this in order to get a dataframe for all the lines.

I tried

```
ff<-sapply(1:nrow(f),function(i){
x<-as.numeric(f[i,2:61])
data.frame(Expression=x, Condition = c(rep("WT", 10), rep("TG", 10), rep("A", 10), rep("B",10), rep("C", 10), rep("D", 10)))
})
```

and

```
a <- for (i in 1:nrow(f)){
data.frame(Expression=as.numeric(f[i,2:61]), Condition = c(rep("WT", 10), rep("TG", 10), rep("A", 10), rep("B",10), rep("C", 10), rep("D", 10)))
}
```

but it's not working. Any suggestions?
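
A minimal sketch of one way to get this working, not the only one. `sapply` tries to simplify its result (here into a matrix of list cells), and a `for` loop itself returns `NULL`, so neither attempt keeps one data frame per gene; `lapply` always returns a list. The toy table `f` below is an assumption standing in for the real file (gene ID in column 1, then 10 samples per condition in the order listed above), and `per_gene` and `p_values` are illustrative names.

```r
# Toy stand-in for the real table (assumption): gene ID plus 60 expression columns.
set.seed(1)
n_genes <- 5
f <- data.frame(Gene = paste0("gene", seq_len(n_genes)),
                matrix(rnorm(n_genes * 60), nrow = n_genes))

# Condition labels in column order, 10 samples per condition as in the question.
cond_levels <- c("WT", "TG", "A", "B", "C", "D")
condition <- factor(rep(cond_levels, each = 10), levels = cond_levels)

# lapply returns a list, so every gene keeps its own 60-row data frame.
per_gene <- lapply(seq_len(nrow(f)), function(i) {
  data.frame(Expression = as.numeric(unlist(f[i, 2:61], use.names = FALSE)),
             Condition = condition)
})
names(per_gene) <- f[[1]]

# One-way ANOVA per gene; keep the p-value of the Condition term.
p_values <- vapply(per_gene, function(d) {
  summary(aov(Expression ~ Condition, data = d))[[1]][["Pr(>F)"]][1]
}, numeric(1))
```

With 18703 genes the resulting p-values would still need multiple-testing correction, for example with `p.adjust(p_values, method = "BH")`.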

What is the final goal? Differential expression?

Yes, that's right...

Then why not use established, well-tested, specialised software such as `limma`? Please go through its very extensive vignette. Other options for DE are DESeq2 or edgeR, but these strictly require raw counts. You seem to have normalized counts, so the `limma-trend` pipeline could be an option.
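
A rough sketch of what a `limma-trend` run could look like, assuming the normalized values are on a log2 scale (which limma-trend expects) and that the Bioconductor package `limma` is installed. The simulated matrix `expr`, the sample names, and the `TGvsWT` contrast are placeholders for illustration, not taken from the actual dataset.

```r
library(limma)  # Bioconductor package, installed separately from CRAN packages

# Simulated log-scale expression matrix (assumption): genes in rows, samples in columns.
set.seed(1)
expr <- matrix(rnorm(100 * 60, mean = 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("s", 1:60)))

# Same layout as in the question: 6 conditions, 10 samples each, in column order.
cond_levels <- c("WT", "TG", "A", "B", "C", "D")
condition <- factor(rep(cond_levels, each = 10), levels = cond_levels)

# Design matrix with one coefficient per condition (no intercept).
design <- model.matrix(~ 0 + condition)
colnames(design) <- cond_levels

# Fit gene-wise linear models, then test one example contrast.
fit <- lmFit(expr, design)
contrast <- makeContrasts(TGvsWT = TG - WT, levels = design)
fit2 <- contrasts.fit(fit, contrast)

# trend = TRUE lets the prior variance depend on mean expression (limma-trend).
fit2 <- eBayes(fit2, trend = TRUE)
top <- topTable(fit2, coef = "TGvsWT", number = Inf)
```

Here `top` holds one row per gene with log fold change, moderated t-statistic, and adjusted p-value; other comparisons can be added as further arguments to `makeContrasts`.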