Stringtie on non model organism - how to best proceed for DE analysis?
0
2
Entering edit mode
10 weeks ago
Shred ▴ 440

I've done denovo transcriptome assembly using Stringtie with paired end RNA-seq reads coming from a mid/large size experiment (60+ samples) conducted on rabbit: I've followed developers' protocol, so alignment+assembly+merging. Now, from the GTF, I thought about two possible ways:

  1. Quantification via Stringtie using the denovo merged GTF file: then using authors' script, convert samples quantification into a single gene-level matrix to use with DESeq2.
  2. Using the merged GTF, building a transcriptome using gffread and then do alignment/mapping against the transcriptome: import counts and analyze with DESeq2

I want to ask if both seems reasonable and, if not, which is better. I've got some trouble when trying to implement them:

With the first approach: the same gene was present multiple times while doing DESeq2::results

> n_occur <- data.frame(table(rownames(countData)))
> dim(n_occur)  
[1] 39722     2
>n_occur[n_occur$Freq > 1,]
[1] Var1 Freq
<0 rows> (or 0-length row.names)

So no duplicates seems to be present.

dds <- DESeqDataSetFromMatrix(countData, colData, design=~0 + group)
res <- results(dds, contrast = c("group", "D31", "NT0"), alpha = 0.05, 
               pAdjustMethod = "BH")
# lfc shrinkage to have best results. 
res_lfcsh <- lfcShrink(dds, contrast = c('group', 'D31', 'NTO'), res = res, type = "ashr")

Visualizing results

"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"ENSOCUG00000034460",4.82720066318387,4.93460839115356,1.98406628126073,2.48711872065986,0.0128782414726579,NA
"ENSOCUG00000034460.1",4.82720066318387,4.93460839115356,1.98406628126073,2.48711872065986,0.0128782414726579,NA
"ENSOCUG00000034460.2",4.82720066318387,4.93460839115356,1.98406628126073,2.48711872065986,0.0128782414726579,NA

These "genes", like ENSOCUG00000034460.1, are duplications and are not present in original gene count matrix obtained via prepDE. Where does they come from? They appear many times (200-300 and more).

With the second: is it possible to create a linkedTxome to use with Salmon with the denovo GTF and the transcriptome generated via gffread? Is it necessary to build a TxDb to analyze Salmon data?

Thanks and sorry for the multiple questions.

Salmon Stringtie DESeq transcriptome rna-seq • 138 views
ADD COMMENT

Login before adding your answer.

Traffic: 790 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6