Quantifying abundances of transcripts using kallisto
1
0
Entering edit mode
11 months ago
Kumar ▴ 120

Hi, I am analyzing RNA-Seq data. I have 190 PE samples (control: male, female, mutant: male, female) and generated featureCount matrix. For the differential gene expression, I used deseq2 with adjusted 0.05 p-value.

On the other hand, I used kallisto for quantifying abundances of transcripts with the same input data.

My question is that I am getting very less number of transcripts in differential expression after kallisto analysis in comparison to gene analysis results. However, the number of DE should be more in abundances of transcripts rather than a gene-based method.

Here are the results:

Results: Deseq2 gene expression using STAR aligner

adjusted p-value < 0.05
LFC > 0 (up)       : 606, 1.2%
LFC < 0 (down)     : 169, 0.34%
outliers [1]       : 7, 0.014%
low counts [2]     : 40617, 81%


(mean count < 6)

Deseq2 result with kallisto:

out of 138 with nonzero total read count
LFC > 0 (up)       : 124, 90%
LFC < 0 (down)     : 14, 10%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)

rna-seq kallisto next-gen • 422 views
0
Entering edit mode
11 months ago

1) It is not clear to me that DESeq is intended for finding DE transcripts. Genes, sure, but not transcripts

2) Kallisto is smarter about assigning ambiguously assigned reads, I'd be using its gene counts, not FeatureCounts. Then you don't have to run two totally separate pipelines.

0
Entering edit mode

Hi, I am new in this analysis. I ran the kallisto and got all .tsv files then I merged these all files to make a single matrix file. It looks like this: Therefore, you mean I do not need to do differential expression analysis with this data, if so how I can use these data for RNA-seq analysis. Thank you!

#

                      270   515 574
ENST00000415118.1   0   0   0
ENST00000434970.2   0   0   0
ENST00000448914.1   0   0   0
ENST00000604642.1   0   0   0
ENST00000603326.1   0   0   0

0
Entering edit mode

You should use tximport first (or tximeta)

0
Entering edit mode

Yes! I am using tximport.

Here is my script: However, the issue here is that my samples number are not equal so I am not sure how to improve the script instead of using "each=93". Please suggest.

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts) sampleTable <- data.frame(condition = factor(rep(c("mutant", "control"), each=93))) rownames(sampleTable) <- colnames(txi.kallisto$counts)
dds <- DESeqDataSetFromTximport(txi.kallisto, sampleTable, ~condition)

dds <- DESeq(dds)


Since I was not sure about the data structure, so I prepared a single .csv matrix file of all samples' count files using cut and paste commands and I am using the following script: Please suggest if this way is correct.

dat <- read.table("/outfile-kallisto.txt", header=TRUE)
dat <- as.matrix(dat)
dds <- DESeqDataSetFromMatrix(countData = dat,
design= ~ Group + Gender)

dds <- DESeq(dds)

0
Entering edit mode

You need to stop and figure out what that line with the "each=93" is doing, then you will know how to modify it for your own needs. Or, making a metadata table outside R is fine too.

Are you completely sure that txOUT = true is what you want to use with DESeq?

0
Entering edit mode

I figured out the updates. Here is my updated script: However, the genes in differential gene expression are very few numbers.

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts) sampleMetaData <- data.frame(Group=c(rep(c("mutant"), 94), rep(c("control"), 93)), Gender=rep(c(rep(c("F"), 130), rep(c("M"), 57)))) rownames(sampleMetaData) <- colnames(txi.kallisto$counts)
dds <- DESeqDataSetFromTximport(txi.kallisto, sampleMetaData, design= ~ Group + Gender)
res <- results(dds)
dds <- DESeq(dds)


results

out of 171018 with nonzero total read count
LFC > 0 (up)       : 975, 0.57%
LFC < 0 (down)     : 12, 0.007%
outliers [1]       : 0, 0%
low counts [2]     : 73970, 43%
(mean count < 2)