Question

Differential Expression in DEseq2 to GSEA (9 samples, 3 conditions)

6.1 years ago

bryce.kirby • 0

Hello,

I am working with RNA-seq data and trying to import the count matrix that StringTie's "prepDE.py" script generated for all 9 of my samples into DESeq2, to perform differential expression analysis on my three conditions. Here is how my data is set up:

cell line 1:
sample1 (control)
sample2 (knockdown)
sample3 (overexpression)

cell line 2: 
sample4 (control)
sample5 (knockdown)
sample6 (overexpression)

cell line 3:
sample7 (control)
sample8 (knockdown)
sample9 (overexpression)
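DESeq2 needs a sample table (its colData) describing each column of the count matrix. The layout above could be written out as below; the column names and the cell-line labels ("line1" to "line3") are hypothetical. Because every cell line contributes one sample per condition, one common choice would be a DESeq2 design of `~ cell_line + condition`, so that differences between cell lines are not mistaken for treatment effects. This Python sketch only prepares the table; the DESeq2 step itself runs in R.

```python
from itertools import product

# Hypothetical labels for the 3 x 3 layout above (3 cell lines x 3 conditions).
CELL_LINES = ["line1", "line2", "line3"]
CONDITIONS = ["control", "knockdown", "overexpression"]

def build_sample_sheet():
    """Return one dict per sample, numbered sample1..sample9 in the order listed above."""
    rows = []
    for i, (line, cond) in enumerate(product(CELL_LINES, CONDITIONS), start=1):
        rows.append({"sample": f"sample{i}", "cell_line": line, "condition": cond})
    return rows

def write_tsv(rows, path):
    """Write the sample sheet as a tab-separated file with a header row."""
    with open(path, "w") as fh:
        fh.write("sample\tcell_line\tcondition\n")
        for r in rows:
            fh.write(f"{r['sample']}\t{r['cell_line']}\t{r['condition']}\n")

if __name__ == "__main__":
    for r in build_sample_sheet():
        print(r["sample"], r["cell_line"], r["condition"])
```

The row order must match the column order of the count matrix, otherwise DESeq2 will attach the wrong condition to each sample.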

I have generated a "transcript_count_matrix.csv" file with prepDE.py, and a merged_transcripts.gtf file with stringtie --merge for all 9 samples, with FPKM values and Ensembl IDs.
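Before handing the prepDE.py matrix to DESeq2, it can help to check that it really contains raw integer counts. A minimal sketch, assuming the CSV has a header row (feature ID column, then one column per sample) and one feature per row; the function name is hypothetical:

```python
import csv
import io

def load_count_matrix(handle):
    """Parse a prepDE.py-style count CSV.

    Assumed layout: header = feature ID column followed by one column per sample;
    each data row = feature ID followed by that feature's counts.
    Returns (sample_names, {feature_id: [int count per sample]}).
    """
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]
    counts = {}
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        feature_id, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"{feature_id}: expected {len(samples)} values, got {len(values)}")
        parsed = []
        for v in values:
            x = float(v)
            # DESeq2 requires non-negative integer counts, so reject anything else
            if x < 0 or not x.is_integer():
                raise ValueError(f"{feature_id}: {v!r} is not a non-negative integer count")
            parsed.append(int(x))
        counts[feature_id] = parsed
    return samples, counts

if __name__ == "__main__":
    demo = io.StringIO("transcript_id,sample1,sample2\nT1,10,0\nT2,3,7\n")
    print(load_count_matrix(demo))
```

On the real file this would be called as `load_count_matrix(open("transcript_count_matrix.csv", newline=""))`.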

I also have the output for each sample from stringtie -e -B:

sample1.gtf  e2t.ctab  e_data.ctab  i2t.ctab  i_data.ctab  t_data.ctab

How can I perform differential expression analysis on this StringTie output with DESeq2? I would like to compare the expression levels of all 3 controls against all 3 knockdown samples and against all 3 overexpression samples, and have the results in a format that I can use as a .gct input file for Gene Set Enrichment Analysis (GSEA).
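For GSEA, the expression dataset is a tab-delimited GCT file (version 1.2: a `#1.2` line, a row/column count line, a `NAME`/`Description` header, then one row per gene), and the phenotype labels go in a separate CLS file. A minimal sketch of both writers; the function names are hypothetical, and the input values would typically be DESeq2's normalized counts (for example from `counts(dds, normalized=TRUE)` in R) keyed by gene symbol.

```python
def format_gct(sample_names, rows):
    """Build GCT v1.2 text.

    rows: list of (name, description, values) tuples, one value per sample.
    """
    lines = [
        "#1.2",
        f"{len(rows)}\t{len(sample_names)}",
        "\t".join(["NAME", "Description", *sample_names]),
    ]
    for name, description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        lines.append("\t".join([name, description, *(f"{v:.4f}" for v in values)]))
    return "\n".join(lines) + "\n"

def format_cls(labels):
    """Build a categorical CLS file: counts line, class names, per-sample labels."""
    classes = list(dict.fromkeys(labels))  # unique class names, first-seen order
    return (
        f"{len(labels)} {len(classes)} 1\n"
        f"# {' '.join(classes)}\n"
        f"{' '.join(labels)}\n"
    )
```

For the control vs. knockdown comparison, the GCT would hold only those six sample columns, and the CLS would list three control and three knockdown labels in the same column order.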

Cuffdiff, for example, outputs fpkm_tracking files with gene symbols and FPKM values. I would like something similar from this pipeline.
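To get gene symbols like Cuffdiff's fpkm_tracking output, one option is to read the `t_name` and `gene_name` columns of the `t_data.ctab` file that `stringtie -e -B` writes, and sum transcript counts per gene symbol. This is a sketch under assumptions: that the row IDs in transcript_count_matrix.csv match `t_name`, and that transcripts without a reference symbol carry `.` in `gene_name`. prepDE.py also writes a gene_count_matrix.csv, which may be the simpler starting point for a gene-level analysis.

```python
import csv

def transcript_to_gene_name(ctab_lines):
    """Map t_name -> gene_name from a Ballgown-style t_data.ctab.

    ctab_lines: any iterable of tab-separated lines with a header row,
    e.g. an open file handle.
    """
    reader = csv.DictReader(ctab_lines, delimiter="\t")
    return {row["t_name"]: row["gene_name"] for row in reader}

def sum_counts_by_gene(counts, t2g):
    """Sum per-sample counts of transcripts that share a gene symbol.

    counts: {transcript_id: [count per sample]}
    t2g: {transcript_id: gene_name}
    Transcripts with no usable symbol (missing, '' or '.') keep their own ID.
    """
    gene_totals = {}
    for tid, values in counts.items():
        gene = t2g.get(tid, ".")
        if gene in ("", "."):
            gene = tid  # e.g. a novel StringTie transcript with no reference gene
        if gene in gene_totals:
            gene_totals[gene] = [a + b for a, b in zip(gene_totals[gene], values)]
        else:
            gene_totals[gene] = list(values)
    return gene_totals
```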

Any suggestions on how to proceed would be greatly appreciated!

Thanks so much,

Bryce

RNA-Seq stringtie deseq2 differential expression • 2.4k views

updated 6.1 years ago by Kevin Blighe • written 6.1 years ago by bryce.kirby

score 3 · Accepted Answer · 2018-03-31

If you want to use DESeq2 for differential expression analysis, then you should start from raw counts, not FPKM values. For confirmation of this, see Gordon's answer in the thread "Question: Differential expression of RNA-seq data using limma and voom()".
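The reason shows up in the definitions. FPKM has already divided each count by feature length and library size, whereas DESeq2 models the raw count of gene $i$ in sample $j$ directly with a negative binomial distribution and estimates its own per-sample size factor. Here $c_i$ is the number of fragments assigned to feature $i$, $\ell_i$ its length in bp, $N$ the total number of mapped fragments, $s_j$ the size factor, $q_{ij}$ a quantity proportional to expression, and $\alpha_i$ the gene's dispersion:

```latex
\mathrm{FPKM}_i = \frac{c_i \times 10^9}{\ell_i \times N}
\qquad \text{vs.} \qquad
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \quad \mu_{ij} = s_j \, q_{ij}
```

Roughly speaking, feeding FPKM in place of $K_{ij}$ would discard the count scale that DESeq2's dispersion estimates depend on.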

In your situation, I can understand why you were using StringTie. I would do the following:

Start with your merged_transcripts.gtf and raw FASTQ files (or aligned BAMs)
Quantify raw read counts over your GTF with Kallisto or Salmon (from FASTQs, using a transcript FASTA extracted from the GTF), featureCounts or BEDTools (from BAMs), or another tool
Input the raw counts into DESeq2 and conduct differential expression analysis
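For a sense of what DESeq2 does with those raw counts, its default normalization is the median-of-ratios method: each gene's geometric mean across samples serves as a pseudo-reference, and a sample's size factor is the median of its count-to-reference ratios. The Python sketch below re-implements that idea for illustration only; in practice `estimateSizeFactors` (called by `DESeq`) does this in R, and this is not a substitute for it.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per gene, each a list of raw counts per sample.
    Genes with a zero in any sample are skipped (their geometric mean is 0).
    Raises statistics.StatisticsError if no gene is nonzero in all samples.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

def normalized_counts(counts, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]
```

For counts `[[10, 20], [100, 200], [5, 10], [0, 4]]`, sample 2 has twice the depth of sample 1, so the factors come out as about 0.71 and 1.41 (the all-zero-in-one-sample gene is ignored).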

Kevin