5 months ago
Gregor Rot

Dear all,

Following the DESeq2 guide (http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#countmat), I am trying to read a count matrix from a tab-separated file:

cts <- as.matrix(read.csv("data.tab", sep="\t", row.names="gene_name"))


However, as.matrix seems to turn the values into strings in R:

head(cts, 3)
              entrez_id CON6_A    CON6_B    CON6_C    AMD6_A    AMD6_B
0610005C13Rik "71661"   "      0" "      0" "     13" "      0" "      0"
0610007P14Rik "58520"   "   1224" "   2002" "   1931" "   1830" "   1155"
0610009B22Rik "66050"   "     67" "     59" "     41" "     27" "     54"


If I skip the conversion to a matrix:

cts <- read.csv("data.tab", sep="\t", row.names="gene_name")


All looks good:

              entrez_id CON6_A CON6_B CON6_C AMD6_A AMD6_B AMD6_C
0610005C13Rik     71661      0      0     13      0      0     16
0610007P14Rik     58520   1224   2002   1931   1830   1155   1596
0610009B22Rik     66050     67     59     41     27     54     23


Is converting with as.matrix required for DESeq2? Could someone advise?

Thanks, Gregor

5 months ago
ATpoint

DESeq2 can also take a data.frame, but it must be all-numeric, so you need to move the gene column to the row names:

rownames(cts) <- cts[,1]   # gene names become the row names
cts[,1] <- NULL            # drop the now-redundant gene column
dds <- DESeqDataSetFromMatrix(countData=cts, ...)


Your matrix becomes type character after as.matrix (check with typeof(cts)) because the gene column is of type character: a matrix in R can only hold one type of data, whereas a data.frame can hold a different type in each column. Since character is more general than numeric (aka double), the whole matrix gets coerced to character. For more on all that, see: http://adv-r.had.co.nz/Data-structures.html
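To see the coercion in isolation, here is a minimal sketch on a hypothetical toy data frame (the names df, gene, A, B and the values are made up for illustration). A single character column is enough to turn the whole matrix into character, and the numeric columns are formatted as text on the way, which would also explain the padded strings like "      0" in the question.

```r
# Hypothetical toy data frame: one character column plus two integer count columns
df <- data.frame(gene = c("g1", "g2"),
                 A = c(0L, 13L),
                 B = c(1224L, 2002L),
                 stringsAsFactors = FALSE)

# Mixed column types: as.matrix() falls back to character and
# formats (and pads) the numeric columns as text
m_mixed <- as.matrix(df)
typeof(m_mixed)    # "character"
m_mixed[, "A"]     # " 0" "13"

# Only the count columns: the matrix keeps an integer type
m_counts <- as.matrix(df[, c("A", "B")])
typeof(m_counts)   # "integer"
```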

You also want read.csv(..., header=TRUE) so that the header line becomes the data.frame/matrix column names (header=TRUE is already the default for read.csv).
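Putting this answer together, a minimal sketch using the sample rows from the question. Assumptions: the rows are written to a temporary file instead of data.tab, and entrez_id is treated as an annotation column rather than a sample, so it is dropped before building the count matrix. DESeqDataSetFromMatrix itself is not called here, since it also needs a colData table and a design.

```r
# Sample rows from the question, written to a temporary file so the
# sketch is self-contained (the real input is "data.tab")
path <- tempfile(fileext = ".tab")
writeLines(c(
  "gene_name\tentrez_id\tCON6_A\tCON6_B\tCON6_C\tAMD6_A\tAMD6_B\tAMD6_C",
  "0610005C13Rik\t71661\t0\t0\t13\t0\t0\t16",
  "0610007P14Rik\t58520\t1224\t2002\t1931\t1830\t1155\t1596",
  "0610009B22Rik\t66050\t67\t59\t41\t27\t54\t23"
), path)

# header = TRUE is read.csv's default anyway; row.names moves gene_name
# out of the data columns and into the row names
df <- read.csv(path, sep = "\t", header = TRUE, row.names = "gene_name")

# Assumption: entrez_id is an annotation, not a sample, so drop it
cts <- as.matrix(df[, colnames(df) != "entrez_id"])
typeof(cts)   # "integer"
dim(cts)      # 3 6
```

On these rows the result is an integer matrix. If the full file still gives a character matrix, some column in it is presumably being read as non-numeric.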


Thanks so much, will check it out!

5 months ago

Your count file might contain quotation marks, in which case the values may be imported as strings and converted to factors:

> x <- data.frame(A=factor(c(0,0,0,0,1,3,3,4)), B=factor(c(0,1,0,1,2,3,4,5)))
> print(x)
  A B
1 0 0
2 0 1
3 0 0
4 0 1
5 1 2
6 3 3
7 3 4
8 4 5
> as.matrix(x)
     A   B
[1,] "0" "0"
[2,] "0" "1"
[3,] "0" "0"
[4,] "0" "1"
[5,] "1" "2"
[6,] "3" "3"
[7,] "3" "4"
[8,] "4" "5"
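If the counts really did arrive as factors, here is a sketch of turning them back into numbers, reusing the toy data frame above. Note that as.numeric() applied directly to a factor returns the internal level codes, not the printed labels, so the conversion goes through as.character() first.

```r
# The toy data frame from this answer, with counts stored as factors
x <- data.frame(A = factor(c(0,0,0,0,1,3,3,4)), B = factor(c(0,1,0,1,2,3,4,5)))

# as.numeric() on a factor gives the level codes, not the labels
as.numeric(x$A)   # 1 1 1 1 2 3 3 4

# Convert every column via its labels, then build the matrix
x[] <- lapply(x, function(col) as.numeric(as.character(col)))
m <- as.matrix(x)
typeof(m)         # "double"
```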


Check your data file for quotation marks around the numbers; the problem could also lie in how the file is being imported. Try read.table and specify the quoting character (quote="\"").
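A minimal sketch of the quoted case, using a hypothetical file (the names g1, g2, A, B and the values are made up) in which every count is wrapped in double quotes. read.table strips the quote characters before converting the column types, so the counts still come back as integers.

```r
# Hypothetical file in which every count is wrapped in double quotes
path <- tempfile(fileext = ".tab")
writeLines(c("gene\tA\tB",
             "g1\t\"0\"\t\"13\"",
             "g2\t\"1224\"\t\"2002\""), path)

# quote = "\"" removes the quotes before type conversion;
# row.names = 1 uses the first column as row names
x <- read.table(path, header = TRUE, sep = "\t", quote = "\"", row.names = 1)
sapply(x, class)   # A: "integer", B: "integer"
```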


Thank you for your answer. My file has no quotes and is very simple:

gene_name       entrez_id       CON6_A  CON6_B  CON6_C  AMD6_A  AMD6_B  AMD6_C
0610005C13Rik   71661   0       0       13      0       0       16
0610007P14Rik   58520   1224    2002    1931    1830    1155    1596
0610009B22Rik   66050   67      59      41      27      54      23
0610009L18Rik   66838   55      34      44      18      13      49
0610009O20Rik   66839   241     212     257     432     188     239


What would be the right way to read this data in? Thanks


That's interesting; maybe these values are being converted to factors for some other reason. Can you check the result of class(cts$CON6_A) and post it?
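Rather than checking columns one at a time, sapply() can report the class of every column at once. A sketch on a hypothetical stand-in for cts (the note column is invented to play the role of a column read as text):

```r
# Hypothetical stand-in for cts: two count columns and one text column
cts <- data.frame(CON6_A = c(0L, 1224L),
                  CON6_B = c(0L, 2002L),
                  note   = c("x", "y"))

# class() of every column at once
col_classes <- sapply(cts, class)
col_classes

# Columns that are neither integer nor numeric; character or factor
# columns among these make as.matrix() return a character matrix
names(col_classes)[!col_classes %in% c("integer", "numeric")]   # "note"
```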

Load the data in as you have been, without casting to a matrix.

5 months ago
as.matrix(read.csv("data.tab", sep="\t", row.names="gene_name"))


You've got a bit of a contradiction here: read.csv assumes the separator is a comma, but you've specified tab. If it's really a CSV file, drop the sep = "\t".


read.csv reads tab-delimited files fine when the sep parameter is set to tab (and the OP said that the file is tab-delimited):

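ttt <- matrix(1:100, nrow=10)
write.table(ttt, "ttt.tsv", sep="\t")        # tab-delimited, with row and column names
check_res <- read.csv("ttt.tsv", sep="\t")   # read it back with read.csv and a tab separator
check_res[1:4,1:4]

  V1 V2 V3 V4
1  1 11 21 31
2  2 12 22 32
3  3 13 23 33
4  4 14 24 34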
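Base R also has read.delim(), which has the same defaults as read.csv() except that sep is a tab. A quick sketch confirming that the two read a tab-delimited file like the one above identically (written to a temporary path here):

```r
# Same kind of tab-delimited file, written to a temporary path
path <- tempfile(fileext = ".tsv")
write.table(matrix(1:100, nrow = 10), path, sep = "\t")

# read.csv with sep = "\t" versus read.delim with its defaults
a <- read.csv(path, sep = "\t")
b <- read.delim(path)
identical(a, b)   # TRUE
```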

5 months ago

Hi Gregor!

Your count matrix does not need to have class matrix to use DESeq2. In an earlier analysis with DESeq2, importing my data into R with the tximport package created a list, and my raw count matrix was stored in that list with class matrix. I converted the count matrix to a data frame, since I am more comfortable working with data frames than with matrices, and DESeqDataSetFromMatrix then ran without warnings.

Rodo