Question: How do I make colnames(count.table) exactly match rownames(meta)?
F. Golestan20 wrote:

Hello,

My plan is to do differential expression analysis with the DESeq2 package, using count data obtained from kallisto (an alignment-free quantification tool). I have two samples. This is what I did:

> datadir <- "/scratch/gh/"
> meta <- read.delim(paste0(datadir, "/se/se_meta.txt"), 
                     header = TRUE, as.is = TRUE)
> rownames(meta) <- meta$names
> meta$trt <- factor(meta$trt)
> meta$cell <- factor(meta$cell)
#
> library(tximport)
> files <- list.files("/scratch/gh/se/kallisto/", 
                pattern = ".*-abundance.tsv",
                full.names = TRUE)
> tx <- suppressMessages(tximport(files = files,
                                 type = "kallisto", 
                                 txOut = TRUE))
### The tx object contains the transcript expression estimates of my two samples.
> tx.counts <- round(tx$counts)
> colnames(tx.counts) <- sub("_.*","",basename(files))
#
### Next, I summarised them at the gene level:
> tx2gene <- data.frame(
    TX=rownames(tx.counts),
    GENEID=sub("\\.\\d+$","",rownames(tx.counts)))
#
> count.table <- round(summarizeToGene(tx, tx2gene)$counts)
> colnames(count.table) <- sub("_.*","",basename(files))
#
> class(count.table)
[1] "matrix"
#

Now, to do differential expression analysis with DESeq2, I need to build a DESeqDataSet object from my count matrix, count.table. However, when I run the code below, I get the following error:

#
> stopifnot(all(colnames(count.table) == rownames(meta)))
Error: all(colnames(count.table) == rownames(meta)) is not TRUE

I also ran the code below to see how rownames(meta) and colnames(count.table) differ:

> colnames(count.table)
[1] "SRR6822797-sortmerna-trimmomatic-abundance.tsv"
[2] "SRR6822798-sortmerna-trimmomatic-abundance.tsv"

> rownames(meta)
[1] "SRR6822797" "SRR6822798"

I think that to fix this error, colnames(count.table) needs to be exactly the same as rownames(meta). Could you please help me get colnames(count.table) to return "SRR6822797" "SRR6822798"?

Thank you very much.

rna-seq kallisto deseq2 R tximport • 218 views
ADD COMMENTlink modified 3 months ago by ATpoint25k • written 3 months ago by F. Golestan20

Please try and solve your R problems before posting questions on a specialized website like Biostars.

You can generalize your problem like: "how to change rownames of a matrix" and google that.

Otherwise you can try and understand the code you posted, if you didn't write it yourself, as you already have the solution there.
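
For reference, the line in the posted code that was meant to do this is the `sub("_.*", "", basename(files))` call. Here is a minimal sketch of why it leaves these particular names untouched, using the two file names printed in the question (the variable names `fnames`, `unchanged` and `fixed` are only illustrative):

```r
# File names as printed by colnames(count.table) in the question
fnames <- c("SRR6822797-sortmerna-trimmomatic-abundance.tsv",
            "SRR6822798-sortmerna-trimmomatic-abundance.tsv")

# The posted pattern looks for an underscore, but these names contain none,
# so sub() finds no match and returns its input unchanged
any(grepl("_", fnames, fixed = TRUE))
unchanged <- sub("_.*", "", fnames)
identical(unchanged, fnames)

# Matching from the first dash instead strips the suffix
fixed <- sub("-.*", "", fnames)
fixed
```

So the existing code probably only needs the separator in the pattern changed, assuming all file names follow the same dash-separated scheme.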

ADD REPLYlink modified 3 months ago • written 3 months ago by Martombo2.6k

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

ADD REPLYlink written 3 months ago by genomax74k

Did you try colnames(count.table) <- rownames(meta)?
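
One caveat with direct assignment: it is only safe when the columns of the count matrix and the rows of the metadata are already in the same order, otherwise samples get mislabelled silently. A hedged sketch of an order-safe variant follows. The objects are toy stand-ins with made-up counts and made-up `trt` levels, and the columns are deliberately reversed to show the reordering:

```r
# Toy stand-ins for the objects in the question (values are made up)
count.table <- matrix(
  1:4, nrow = 2,
  dimnames = list(c("geneA", "geneB"),
                  c("SRR6822798-sortmerna-trimmomatic-abundance.tsv",
                    "SRR6822797-sortmerna-trimmomatic-abundance.tsv")))
meta <- data.frame(names = c("SRR6822797", "SRR6822798"),
                   trt = factor(c("ctrl", "treated")))
rownames(meta) <- meta$names

# Derive sample IDs from the column names instead of overwriting them blindly
ids <- sub("-.*", "", colnames(count.table))
stopifnot(setequal(ids, rownames(meta)))  # same set of samples on both sides
colnames(count.table) <- ids

# Reorder the metadata rows to follow the column order of the counts
meta <- meta[colnames(count.table), , drop = FALSE]
stopifnot(all(colnames(count.table) == rownames(meta)))
```

After this, the `stopifnot()` check from the question passes regardless of the order in which the files were listed.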

ADD REPLYlink written 3 months ago by h.mon28k
ATpoint25k wrote:

kallisto quantifies against a transcriptome, but DESeq2 is intended for gene-level analysis, so you have to aggregate the transcript-level counts to the gene level. The recommended approach is tximport (https://bioconductor.org/packages/release/bioc/html/tximport.html; the manual is extensive and it has presets for kallisto, so usage is straightforward). From the tximport output you can directly make a DESeq2 object using DESeqDataSetFromTximport(). Please use tximport, and read the paper (http://dx.doi.org/10.12688/f1000research.7563.1) on what it does beyond gene-level aggregation and why this is important.
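
A rough sketch of that workflow, under some assumptions: the directory and file names are the ones shown in the question, `build_dds` is a hypothetical helper name, and `tx2gene` and `meta` are the objects the question already builds. Naming the `files` vector with the sample IDs should let tximport label the columns accordingly, which avoids renaming columns afterwards:

```r
# Sample IDs and directory taken from the question; the files vector is named
# so that the sample IDs can serve as column labels
sample_ids <- c("SRR6822797", "SRR6822798")
files <- file.path("/scratch/gh/se/kallisto",
                   paste0(sample_ids, "-sortmerna-trimmomatic-abundance.tsv"))
names(files) <- sample_ids

# Hypothetical helper: build a DESeqDataSet straight from kallisto output.
# tx2gene is a two-column data frame (transcript ID, gene ID); meta is the
# sample table with a 'trt' column, rows in the same order as 'files'.
build_dds <- function(files, tx2gene, meta) {
  txi <- tximport::tximport(files, type = "kallisto", tx2gene = tx2gene)
  DESeq2::DESeqDataSetFromTximport(txi, colData = meta, design = ~ trt)
}
```

The helper is only defined here, not called, since it needs the tximport and DESeq2 packages and the actual abundance files; usage would be something like `dds <- build_dds(files, tx2gene, meta)`.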

ADD COMMENTlink written 3 months ago by ATpoint25k

Thank you so much. DESeqDataSetFromTximport() is exactly what I need. However, I want to run it on count.table, which I created after linking transcript IDs to gene IDs (the tx2gene table) and summarising the expression estimates at the gene level (with summarizeToGene). count.table is a matrix, whereas tx is a list. When I ran the function on count.table, it produced the error below; it only works with tx, the list.

ds_Txi <- DESeqDataSetFromTximport(txi = count.table,
                                     colData = meta,
                                     design = ~ trt)
Error in txi$counts : $ operator is invalid for atomic vectors

Could you please let me know whether it is possible to run DESeqDataSetFromTximport() on my count.table variable? Many thanks.

ADD REPLYlink written 3 months ago by F. Golestan20

Please read the manual: txi is the output from tximport, not the count table. Use tximport exactly as the manual describes for kallisto input and it will work.
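
To illustrate the difference, here is a small base-R sketch that reproduces the error without DESeq2. The `tx` object below is only a mock with toy numbers that mimics the list shape tximport returns (`abundance`, `counts`, `length`, `countsFromAbundance`), not real output:

```r
# Mock of the list shape tximport returns (toy numbers, not real output)
m <- matrix(c(10, 20, 30, 40), nrow = 2,
            dimnames = list(c("geneA", "geneB"),
                            c("SRR6822797", "SRR6822798")))
tx <- list(abundance = m, counts = m, length = m,
           countsFromAbundance = "no")
count.table <- round(tx$counts)

is.list(tx)             # a list: `$` can pull out tx$counts
is.matrix(count.table)  # a plain matrix: there is no $counts to extract

# Reproduce the error from the thread by using `$` on the matrix
msg <- tryCatch(count.table$counts, error = function(e) conditionMessage(e))
msg
```

If you really want to start from a plain integer matrix, DESeq2 also provides DESeqDataSetFromMatrix(), but as far as I understand that route drops the transcript-length information that the tximport list carries along, so passing the list to DESeqDataSetFromTximport() is usually the better choice.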

ADD REPLYlink written 3 months ago by ATpoint25k

Sure. Thanks a lot for your help.

ADD REPLYlink written 3 months ago by F. Golestan20
zx87548.4k wrote:

Your column names carry extra information that we want to drop so that they match the rownames of meta:

# example, drop everything after first dash (-)
x <- c("SRR6822797-sortmerna-trimmomatic-abundance.tsv", "SRR6822798-sortmerna-trimmomatic-abundance.tsv") 
gsub("-.*", "", x)
# [1] "SRR6822797" "SRR6822798"

So in your case we would need something like:

colnames(count.table) <- gsub("-.*", "", colnames(count.table))
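
A slightly more defensive variant, as a sketch only: rather than cutting at the first dash, keep the leading accession-like prefix and fail loudly if any name did not match. The pattern (letters followed by digits at the start of the name) is an assumption about how the files are named:

```r
x <- c("SRR6822797-sortmerna-trimmomatic-abundance.tsv",
       "SRR6822798-sortmerna-trimmomatic-abundance.tsv")

# Keep only a leading accession-like prefix (letters, then digits)
ids <- sub("^([A-Za-z]+[0-9]+).*$", "\\1", x)

# Guard against names the pattern did not change and against duplicate IDs
stopifnot(!any(ids == x), !anyDuplicated(ids))
ids
```

The `stopifnot()` line matters because `sub()` returns non-matching strings unchanged rather than raising an error.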
ADD COMMENTlink written 3 months ago by zx87548.4k

Thank you very much for your useful advice.

ADD REPLYlink written 3 months ago by F. Golestan20