Question: Analysis of GEO dataset normalized by FPKM
GSAENZDEPIP wrote:

Good morning,

I'd like to perform a differential expression analysis on some RNA-seq samples from the GEO database (GSE99987) and obtain the genes that differ significantly between conditions. However, the tables available on GEO contain FPKM values rather than raw counts. According to the authors, this normalization was done with Cuffdiff (v2.2.1).

So my question is: should I use the FPKM values for differential expression analysis as they are, without applying any other normalization (such as TMM or DESeq size factors)?

P.S.: I am confused because I've always read that FPKM normalization is meant for comparing gene expression within the same sample, whereas normalizations such as TMM or DESeq size factors are meant for comparing gene counts between samples (conditions).

Thank you in advance, Goren

rna-seq
modified 10 months ago • written 10 months ago by GSAENZDEPIP
ATpoint wrote:

FPKM is generally considered inferior to normalization methods such as TMM or DESeq2 size factors when comparing between samples. More importantly, tools like DESeq2 or edgeR need raw counts, and these cannot be recovered from the FPKM values alone. You will probably have to download the raw data and quantify them yourself. I suggest a tool like Salmon or kallisto for transcript-level quantification, then tximport to aggregate counts to the gene level, followed by differential analysis with DESeq2 or a similar framework. You can get the raw data from the ENA, following my tutorial.
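
To illustrate why count-based tools cannot work from FPKM, here is a minimal Python sketch of the standard FPKM formula applied to toy data. The gene names, lengths and counts are made up for illustration, and the helper name fpkm is hypothetical. It only shows that two samples with very different raw counts can end up with identical FPKM values, so the count scale that DESeq2 and edgeR model is lost.

```python
def fpkm(counts, lengths_bp):
    """Convert raw fragment counts of one sample to FPKM.

    counts:     dict mapping gene -> raw fragment count
    lengths_bp: dict mapping gene -> gene length in base pairs
    """
    total = sum(counts.values())  # total mapped fragments in this sample
    # fragments per kilobase of gene per million mapped fragments
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}


# Hypothetical toy data: sample2 is the same library sequenced twice as deep.
lengths = {"geneA": 1000, "geneB": 2000}
sample1 = {"geneA": 100, "geneB": 300}
sample2 = {"geneA": 200, "geneB": 600}

f1 = fpkm(sample1, lengths)
f2 = fpkm(sample2, lengths)

# The raw counts differ by a factor of two, the FPKM values do not differ at all.
print(f1)
print(f2)
```

Because sequencing depth and gene length are divided out, a table of FPKM values alone does not tell you how many fragments supported each value, which is the information a count-based model uses to estimate variance.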

written 10 months ago by ATpoint

Okay, I will do it from the raw data. I didn't know that you could download RNA-seq experiments from the ENA. Thank you!

written 10 months ago by GSAENZDEPIP

One last question. The raw data of this project have 3-4 runs per sample. How should I deal with that? I have always worked with one .fastq file per sample. Is there a tutorial for this situation?

Thanks!

written 10 months ago by GSAENZDEPIP

In the simplest case, you can combine them prior to quantification with cat in1.fq.gz in2.fq.gz (...) > in_comb.fq.gz and then proceed as usual. If these are technical replicates, i.e. the same library sequenced on different lanes, you will probably be fine. Alternatively, you can process the runs independently and then do a principal component analysis to check whether the lane replicates of each sample cluster together, as a quality check. The DESeq2 manual has a section on PCA and its input requirements (variance-stabilized counts). If this looks OK, you can simply sum the counts of the runs that belong to the same sample.
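
The last step, summing per-run counts into per-sample counts, could look roughly like the Python sketch below. The run and sample names, the toy counts and the helper sum_runs_per_sample are all hypothetical; in practice the counts would come from your quantification output. In R, DESeq2 offers collapseReplicates for the same purpose.

```python
from collections import defaultdict


def sum_runs_per_sample(run_counts, run_to_sample):
    """Sum the raw counts of all runs that belong to the same sample.

    run_counts:    dict mapping run ID -> {gene: raw count}
    run_to_sample: dict mapping run ID -> sample ID
    """
    totals = defaultdict(lambda: defaultdict(int))
    for run, counts in run_counts.items():
        sample = run_to_sample[run]
        for gene, n in counts.items():
            totals[sample][gene] += n
    # Convert the nested defaultdicts to plain dicts for easier handling.
    return {sample: dict(genes) for sample, genes in totals.items()}


# Hypothetical toy data: run1 and run2 are lane replicates of sampleX.
run_counts = {
    "run1": {"geneA": 10, "geneB": 0},
    "run2": {"geneA": 12, "geneB": 3},
    "run3": {"geneA": 7, "geneB": 1},
}
run_to_sample = {"run1": "sampleX", "run2": "sampleX", "run3": "sampleY"}

collapsed = sum_runs_per_sample(run_counts, run_to_sample)
print(collapsed)
```

Summing is only appropriate for raw counts of technical replicates; biological replicates should stay separate so that the differential analysis can estimate the variability between them.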

modified 10 months ago • written 10 months ago by ATpoint