Question: Differential expression in DESeq2 to GSEA (9 samples, 3 conditions)
2.4 years ago by
bryce.kirby0 wrote:


I am working with RNA-seq data and trying to load my StringTie output file from "" for all 9 of my samples into DESeq2 to perform differential expression analysis on my three conditions. Here is how my data is set up:

cell line 1:
sample1 (control)
sample2 (knockdown)
sample3 (overexpression)

cell line 2: 
sample4 (control)
sample5 (knockdown)
sample6 (overexpression)

cell line 3:
sample7 (control)
sample8 (knockdown)
sample9 (overexpression)

I have a generated "transcript_count_matrix.csv" file from and a merged_transcripts.gtf file from stringtie --merge for all 9 samples, with FPKM values and Ensembl IDs.

I also have the output for each sample from stringtie -e -B:

sample1.gtf  e2t.ctab  e_data.ctab  i2t.ctab  i_data.ctab  t_data.ctab

I would like to know how I can perform differential expression analysis in DESeq2 with this output from StringTie. I would like to compare the expression levels of all 3 control samples vs. all 3 knockdown/overexpression samples, and have the result in a format that I can use as a .gct input file for Gene Set Enrichment Analysis.

Much like how cuffdiff outputs fpkm_tracking files with gene symbols and FPKM values, I would like something similar from this pipeline.

Any suggestions on how to proceed and any help would be greatly appreciated!!

Thanks so much,


2.4 years ago by
Kevin Blighe63k wrote:

If you want to use DESeq2 for differential expression analysis, then you should start from raw counts, not FPKM values. For confirmation of this, see Gordon's comments in the thread "Differential expression of RNA-seq data using limma and voom()".

In your situation, I can understand why you were using StringTie. I would do the following:

  1. Start with your merged_transcripts.gtf and raw FASTQ files (or aligned BAMs)
  2. Determine raw read count abundances over your GTF with Kallisto or Salmon (from FASTQs), featureCounts or BEDTools (BAMs), or something else
  3. Input the raw counts into DESeq2 and conduct differential expression analysis
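
As a rough illustration of the bookkeeping around step 3, here is a minimal Python sketch. It is not part of DESeq2, which would still be run in R. It builds a sample sheet for the 9-sample layout from the question, checks that a matrix holds whole-number counts rather than FPKM values, and formats an expression matrix as GCT v1.2 text for GSEA. The function names, the "line1".."line3" labels, the "na" description placeholder, and the gene IDs in the usage note are hypothetical. The suggested DESeq2 design mentioned in the comments is an assumption about how the cell lines might be handled, not something stated in the thread.

```python
import csv
import io

# Layout from the question: three cell lines, each with one control,
# one knockdown, and one overexpression sample (sample1..sample9).
CONDITIONS = ["control", "knockdown", "overexpression"]


def build_sample_sheet(n_cell_lines=3):
    """Return (sample, cell_line, condition) rows for the 9-sample layout.

    Assumption: a table like this could serve as colData in DESeq2,
    e.g. with a design such as ~ cell_line + condition.
    """
    rows = []
    sample_number = 1
    for cell_line in range(1, n_cell_lines + 1):
        for condition in CONDITIONS:
            rows.append((f"sample{sample_number}", f"line{cell_line}", condition))
            sample_number += 1
    return rows


def is_raw_count_matrix(matrix):
    """True if every value is a non-negative whole number (so not FPKM)."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in matrix
        for value in row
    )


def format_gct(gene_ids, sample_names, matrix, descriptions=None):
    """Format an expression matrix as tab-delimited GCT v1.2 text."""
    if descriptions is None:
        descriptions = ["na"] * len(gene_ids)  # placeholder description column
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#1.2"])                            # format version line
    writer.writerow([len(gene_ids), len(sample_names)])  # rows, then columns
    writer.writerow(["NAME", "Description", *sample_names])
    for gene, description, values in zip(gene_ids, descriptions, matrix):
        writer.writerow([gene, description, *values])
    return out.getvalue()
```

For example, format_gct(["geneA", "geneB"], ["sample1", "sample2"], [[1, 2], [3, 4]]) returns the text to save as a .gct file. For GSEA the matrix would normally hold normalized expression values (for instance normalized counts exported from DESeq2) rather than the raw counts used for the differential expression test itself.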

