How can I make colnames(count.table) exactly match rownames(meta)?
4.8 years ago
Farah ▴ 80

Hello,

My plan is to do differential expression analysis with the DESeq2 package, using count data obtained from kallisto (an alignment-free quantification tool). I have two samples. This is what I did:

> datadir <- "/scratch/gh/"
> meta <- read.delim(paste0(datadir, "/se/se_meta.txt"), 
                     header = TRUE, as.is = TRUE)
> rownames(meta) <- meta$names
> meta$trt <- factor(meta$trt)
> meta$cell <- factor(meta$cell)
#
> library(tximport)
> files <- list.files("/scratch/gh/se/kallisto/", 
                pattern = ".*-abundance.tsv",
                full.names = TRUE)
> tx <- suppressMessages(tximport(files = files,
                                 type = "kallisto", 
                                 txOut = TRUE))
### The tx object contains the transcript expression estimates of my two samples.
> tx.counts <- round(tx$counts)
> colnames(tx.counts) <- sub("_.*","",basename(files))
#
### Next, I summarised them at the gene level:
> tx2gene <- data.frame(
    TX=rownames(tx.counts),
    GENEID=sub("\\.\\d+$","",rownames(tx.counts)))
#
> count.table <- round(summarizeToGene(tx, tx2gene)$counts)
> colnames(count.table) <- sub("_.*","",basename(files))
#
> class(count.table)
[1] "matrix"
#

Now, to run differential expression analysis with DESeq2, I need to build a DESeqDataSet object from my count matrix, count.table. However, when I run the code below, I get the following error:

#
> stopifnot(all(colnames(count.table) == rownames(meta)))
Error: all(colnames(count.table) == rownames(meta)) is not TRUE

I also ran the code below to see how rownames(meta) and colnames(count.table) differ:

> colnames(count.table)
[1] "SRR6822797-sortmerna-trimmomatic-abundance.tsv"
[2] "SRR6822798-sortmerna-trimmomatic-abundance.tsv"

> rownames(meta)
[1] "SRR6822797" "SRR6822798"

I think that to fix this error, colnames(count.table) must be exactly the same as rownames(meta). Could you please help me make colnames(count.table) return "SRR6822797" "SRR6822798"?

Thank you very much.

RNA-Seq Kallisto DESeq2 R tximport • 3.0k views

Please try and solve your R problems before posting questions on a specialized website like Biostars.

You can generalize your problem like: "how to change rownames of a matrix" and google that.

Otherwise you can try and understand the code you posted, if you didn't write it yourself, as you already have the solution there.


Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!


Did you try colnames(count.table) <- rownames(meta)?
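Note that direct assignment is only safe if the columns are already in the same order as the rows of meta; otherwise the samples would be silently mislabeled. A minimal sketch of a guarded version, using toy stand-ins for count.table and meta (the counts and the trt levels are made up for illustration):

```r
# Toy stand-ins for the objects in the question (values are illustrative only)
count.table <- matrix(
  c(10, 0, 5, 7), nrow = 2,
  dimnames = list(
    c("geneA", "geneB"),
    c("SRR6822797-sortmerna-trimmomatic-abundance.tsv",
      "SRR6822798-sortmerna-trimmomatic-abundance.tsv")
  )
)
meta <- data.frame(
  trt = factor(c("ctrl", "treated")),  # hypothetical treatment levels
  row.names = c("SRR6822797", "SRR6822798")
)

# Only overwrite the names if each column name begins with the sample ID
# found at the same position in meta
same_order <- startsWith(colnames(count.table), rownames(meta))
stopifnot(length(same_order) == nrow(meta), all(same_order))
colnames(count.table) <- rownames(meta)
```

If the check fails, the columns need to be reordered before renaming rather than overwritten.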

4.7 years ago
ATpoint 82k

kallisto quantifies against a transcriptome, but DESeq2 is intended for gene-level analysis, so you have to aggregate the transcript-level counts to the gene level. The recommended approach is to use tximport (https://bioconductor.org/packages/release/bioc/html/tximport.html; the manual is extensive and it has presets for kallisto, so usage is trivial). From the tximport output you can directly make a DESeq2 object using DESeqDataSetFromTximport(). Please use tximport, and read the paper (http://dx.doi.org/10.12688/f1000research.7563.1) on what it does beyond gene-level aggregation and why this is important.
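
A rough sketch of that workflow, assuming the tximport and DESeq2 packages are installed and that files, tx2gene and meta are the objects from the question. The helper name build_dds is hypothetical, and naming the files by sample ID relies on tximport using names(files) as column names, which is my understanding of its behaviour:

```r
# Sketch only: requires the Bioconductor packages tximport and DESeq2.
# `files`, `tx2gene`, and `meta` are the objects from the question;
# build_dds is a hypothetical helper name.
build_dds <- function(files, tx2gene, meta) {
  # Name the files by sample ID so the imported columns carry those IDs
  names(files) <- sub("-.*", "", basename(files))

  # Import kallisto output and summarise to gene level in one step
  txi <- tximport::tximport(files, type = "kallisto", tx2gene = tx2gene)

  # Sample order in the counts must match the rows of the metadata
  stopifnot(identical(colnames(txi$counts), rownames(meta)))

  # Pass the whole tximport list (not a count matrix) to DESeq2
  DESeq2::DESeqDataSetFromTximport(txi, colData = meta, design = ~ trt)
}
```

It would be called as dds <- build_dds(files, tx2gene, meta), with the design formula adjusted to the columns actually present in meta.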


Thank you so much. DESeqDataSetFromTximport() is exactly what I need. However, I want to run it on count.table, which I obtained after linking transcript IDs to gene IDs (the tx2gene table) and summarising the expression estimates at the gene level with summarizeToGene(). count.table is a matrix, whereas tx is a list. When I passed count.table to the function, it produced the error below; it only works with tx, which is a list.

ds_Txi <- DESeqDataSetFromTximport(txi = count.table,
                                     colData = meta,
                                     design = ~ trt)
Error in txi$counts : $ operator is invalid for atomic vectors

Could you please let me know whether it is possible to run DESeqDataSetFromTximport() on my count.table variable? Many thanks.


Please read the manual: txi is the output of tximport, not the count table. Use tximport exactly as the manual describes for kallisto input and it will work.


Sure. Thanks a lot for your help.

4.7 years ago
zx8754 11k

Your column names carry extra information that we want to drop so that they match the rownames of meta:

# example, drop everything after first dash (-)
x <- c("SRR6822797-sortmerna-trimmomatic-abundance.tsv", "SRR6822798-sortmerna-trimmomatic-abundance.tsv") 
gsub("-.*", "", x)
# [1] "SRR6822797" "SRR6822798"

So in your case we would need something like:

colnames(count.table) <- gsub("-.*", "", colnames(count.table))
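
The sub("_.*", ...) call in the question left the names unchanged because the file names contain hyphens, not underscores. A small self-contained sketch of the fix, followed by a reorder-and-verify step; the toy matrix, its values and the trt levels are made up for illustration, and the columns are deliberately out of order:

```r
# Toy count matrix with columns deliberately in a different order than meta
count.table <- matrix(
  1:4, nrow = 2,
  dimnames = list(
    c("geneA", "geneB"),
    c("SRR6822798-sortmerna-trimmomatic-abundance.tsv",
      "SRR6822797-sortmerna-trimmomatic-abundance.tsv")
  )
)
meta <- data.frame(
  trt = factor(c("ctrl", "treated")),  # hypothetical treatment levels
  row.names = c("SRR6822797", "SRR6822798")
)

# The pattern from the question looks for an underscore, which these names lack
unchanged <- sub("_.*", "", colnames(count.table))
identical(unchanged, colnames(count.table))  # TRUE: nothing was stripped

# Strip everything from the first hyphen onward
colnames(count.table) <- sub("-.*", "", colnames(count.table))

# Put the columns in the same order as the rows of meta, then verify
stopifnot(setequal(colnames(count.table), rownames(meta)))
count.table <- count.table[, rownames(meta), drop = FALSE]
stopifnot(identical(colnames(count.table), rownames(meta)))
```

Indexing by rownames(meta) moves each column together with its name, so the counts stay attached to the right sample even when the file order and the metadata order differ.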

Thank you very much for your useful advice.
