Question: Quantifying abundances of transcripts using kallisto
0
gravatar for Manoj
14 days ago by
Manoj80
United States
Manoj80 wrote:

Hi, I am analyzing RNA-Seq data. I have 190 PE samples (control: male, female, mutant: male, female) and generated featureCount matrix. For the differential gene expression, I used deseq2 with adjusted 0.05 p-value.

On the other hand, I used kallisto for quantifying abundances of transcripts with the same input data.

My question is that I am getting very less number of transcripts in differential expression after kallisto analysis in comparison to gene analysis results. However, the number of DE should be more in abundances of transcripts rather than a gene-based method.

Please give your opinion:

Here are the results:

Results: Deseq2 gene expression using STAR aligner

adjusted p-value < 0.05
LFC > 0 (up)       : 606, 1.2%
LFC < 0 (down)     : 169, 0.34%
outliers [1]       : 7, 0.014%
low counts [2]     : 40617, 81%

(mean count < 6)

Deseq2 result with kallisto:

out of 138 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 124, 90%
LFC < 0 (down)     : 14, 10%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
rna-seq kallisto next-gen • 110 views
ADD COMMENTlink modified 14 days ago by swbarnes29.2k • written 14 days ago by Manoj80
0
gravatar for swbarnes2
14 days ago by
swbarnes29.2k
United States
swbarnes29.2k wrote:

1) It is not clear to me that DESeq is intended for finding DE transcripts. Genes, sure, but not transcripts

2) Kallisto is smarter about assigning ambiguously assigned reads, I'd be using its gene counts, not FeatureCounts. Then you don't have to run two totally separate pipelines.

ADD COMMENTlink modified 14 days ago • written 14 days ago by swbarnes29.2k

Hi, I am new in this analysis. I ran the kallisto and got all .tsv files then I merged these all files to make a single matrix file. It looks like this: Therefore, you mean I do not need to do differential expression analysis with this data, if so how I can use these data for RNA-seq analysis. Thank you!

#

                      270   515 574
    ENST00000415118.1   0   0   0
    ENST00000434970.2   0   0   0
    ENST00000448914.1   0   0   0
    ENST00000604642.1   0   0   0
    ENST00000603326.1   0   0   0
ADD REPLYlink modified 14 days ago • written 14 days ago by Manoj80

You should use tximport first (or tximeta)

ADD REPLYlink written 14 days ago by eridanus10

Yes! I am using tximport.

Here is my script: However, the issue here is that my samples number are not equal so I am not sure how to improve the script instead of using "each=93". Please suggest.

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts)
sampleTable <- data.frame(condition = factor(rep(c("mutant", "control"), each=93)))
rownames(sampleTable) <- colnames(txi.kallisto$counts)
dds <- DESeqDataSetFromTximport(txi.kallisto, sampleTable, ~condition)

dds <- DESeq(dds)

Since I was not sure about the data structure, so I prepared a single .csv matrix file of all samples' count files using cut and paste commands and I am using the following script: Please suggest if this way is correct.

dat <- read.table("/outfile-kallisto.txt", header=TRUE)
dat <- as.matrix(dat)
head(dat)
metadata <- read.table("/metadata-kallisto.txt", header=TRUE)
dds <- DESeqDataSetFromMatrix(countData = dat,
                          colData = metadata,
                          design= ~ Group + Gender)

dds <- DESeq(dds)
ADD REPLYlink modified 11 days ago • written 11 days ago by Manoj80

You need to stop and figure out what that line with the "each=93" is doing, then you will know how to modify it for your own needs. Or, making a metadata table outside R is fine too.

Are you completely sure that txOUT = true is what you want to use with DESeq?

ADD REPLYlink written 11 days ago by swbarnes29.2k

I figured out the updates. Here is my updated script: However, the genes in differential gene expression are very few numbers.

names(files) <- paste0("sample", 1:187)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
head(txi.kallisto$counts)
sampleMetaData <- data.frame(Group=c(rep(c("mutant"), 94), rep(c("control"), 93)), Gender=rep(c(rep(c("F"), 
130), rep(c("M"), 57))))

rownames(sampleMetaData) <- colnames(txi.kallisto$counts)
dds <- DESeqDataSetFromTximport(txi.kallisto, sampleMetaData, design= ~ Group + Gender)
res <- results(dds)
dds <- DESeq(dds)

results

out of 171018 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 975, 0.57%
LFC < 0 (down)     : 12, 0.007%
outliers [1]       : 0, 0%
low counts [2]     : 73970, 43%
(mean count < 2)
ADD REPLYlink modified 11 days ago • written 11 days ago by Manoj80
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1288 users visited in the last hour