Question: Differential Expression in DESeq2 to GSEA (9 samples, 3 conditions)
13 months ago, bryce.kirby wrote:

Hello,

I am working with RNA-seq data and trying to load the StringTie output from "prepDE.py" for all 9 of my samples into DESeq2 to perform differential expression across my three conditions. Here is how my data is set up:

cell line 1:
sample1 (control)
sample2 (knockdown)
sample3 (overexpression)

cell line 2: 
sample4 (control)
sample5 (knockdown)
sample6 (overexpression)

cell line 3:
sample7 (control)
sample8 (knockdown)
sample9 (overexpression)
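
For reference, this layout could be written down as a sample sheet along the following lines. This is only a sketch: the column names `sample`, `cell_line`, and `condition`, the `lineN` labels, and both helper functions are illustrative choices, not something StringTie or DESeq2 requires.

```python
import csv
import io

# Layout taken from the post: 3 cell lines x 3 conditions = 9 samples.
CONDITIONS = ["control", "knockdown", "overexpression"]
N_CELL_LINES = 3


def build_sample_sheet():
    """Return one row per sample: sample name, cell line, condition."""
    rows = []
    sample_number = 1
    for cell_line in range(1, N_CELL_LINES + 1):
        for condition in CONDITIONS:
            rows.append({
                "sample": f"sample{sample_number}",
                "cell_line": f"line{cell_line}",  # hypothetical label
                "condition": condition,
            })
            sample_number += 1
    return rows


def sample_sheet_csv(rows):
    """Serialise the rows as CSV text (column names are illustrative)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["sample", "cell_line", "condition"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(sample_sheet_csv(build_sample_sheet()))
```

Because every cell line contributes exactly one sample per condition, the cell line could plausibly be treated as a blocking factor, for example with a DESeq2 design along the lines of `~ cell_line + condition`.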

I have a "transcript_count_matrix.csv" file generated by prepDE.py, and a merged_transcripts.gtf file from stringtie --merge for all 9 samples with FPKM values and Ensembl IDs.

I also have the output for each sample from stringtie -e -B:

sample1.gtf  e2t.ctab  e_data.ctab  i2t.ctab  i_data.ctab  t_data.ctab

How can I perform differential expression with this StringTie output in DESeq2? I would like to compare all 3 control samples against all 3 knockdown and all 3 overexpression samples, and have the results in a format that I can supply to Gene Set Enrichment Analysis (GSEA) as a .gct file.
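
For reference, the .gct file that GSEA reads is a small tab-delimited text format: a `#1.2` version line, a line with the number of rows and samples, a header line starting with `NAME` and `Description`, and then one row per gene. GSEA also takes a .cls file carrying the phenotype label of each sample. A minimal sketch of both follows. The function names `format_gct` and `format_cls` are hypothetical, and the gene names and values in the example are placeholders, not real data. The values would typically be normalised expression values rather than raw counts.

```python
def format_gct(sample_names, rows):
    """Return GCT v1.2 text.

    rows: list of (name, description, values), one value per sample.
    """
    for name, _description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(
                f"{name}: expected {len(sample_names)} values, got {len(values)}"
            )
    lines = [
        "#1.2",                                   # format version
        f"{len(rows)}\t{len(sample_names)}",      # data rows, sample columns
        "\t".join(["NAME", "Description"] + list(sample_names)),
    ]
    for name, description, values in rows:
        lines.append("\t".join([name, description] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def format_cls(labels):
    """Return categorical CLS text for one phenotype label per sample."""
    classes = []
    for label in labels:          # keep classes in order of first appearance
        if label not in classes:
            classes.append(label)
    return (
        f"{len(labels)} {len(classes)} 1\n"
        f"# {' '.join(classes)}\n"
        f"{' '.join(labels)}\n"
    )


if __name__ == "__main__":
    # Placeholder values only, to show the layout.
    samples = ["sample1", "sample2"]
    print(format_gct(samples, [("GENE_A", "na", [10.5, 12.0]),
                               ("GENE_B", "na", [0.0, 3.25])]))
    print(format_cls(["control", "knockdown"]))
```

The sample order in the .cls labels has to match the column order of the .gct file.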

This would be much like cuffdiff, which outputs fpkm_tracking files with gene symbols and FPKM values. I would like something similar from this pipeline.

Any suggestions on how to proceed and any help would be greatly appreciated!!

Thanks so much,

Bryce

13 months ago, Kevin Blighe (Guy's Hospital, London) wrote:

If you want to use DESeq2 for differential expression analysis, you should start from raw counts, not FPKM values. For confirmation, see Gordon's comments in this thread: "Differential expression of RNA-seq data using limma and voom()".

In your situation, I can understand why you were using StringTie. I would do the following:

  1. Start with your merged_transcripts.gtf and raw FASTQ files (or aligned BAMs)
  2. Quantify raw read counts against your GTF with Kallisto or Salmon (from FASTQs), with featureCounts or BEDTools (from BAMs), or with a comparable tool
  3. Input the raw counts into DESeq2 and conduct differential expression analysis
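
Before step 3, a quick sanity check along the following lines could confirm that a matrix really holds raw counts (non-negative integers) rather than FPKM-like values, since DESeq2 expects integer counts as input. This is only a sketch: the helper names are hypothetical, the example matrix is made up, and it assumes a CSV whose first row holds sample names and whose first column holds the feature ID.

```python
import csv
import io


def is_raw_count(text):
    """True if the cell parses as a non-negative whole number."""
    try:
        number = float(text)
    except ValueError:
        return False
    return number >= 0 and number.is_integer()


def find_non_count_cells(csv_text, max_report=5):
    """Return up to max_report (feature_id, sample, value) cells that are
    not non-negative integers."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = header[1:]          # assumed: first column is the feature ID
    problems = []
    for row in reader:
        if not row:
            continue
        feature_id, values = row[0], row[1:]
        for sample, value in zip(samples, values):
            if not is_raw_count(value):
                problems.append((feature_id, sample, value))
                if len(problems) >= max_report:
                    return problems
    return problems


if __name__ == "__main__":
    # Made-up matrix: tx2/sample1 holds a fractional, FPKM-like value.
    example = "transcript_id,sample1,sample2\ntx1,10,0\ntx2,3.7,12\n"
    for feature_id, sample, value in find_non_count_cells(example):
        print(f"{feature_id} in {sample}: {value!r} is not a raw count")
```

An empty result does not prove the values are correctly quantified counts; it only rules out the obvious case of fractional or negative values being passed to DESeq2.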

Kevin
