Question: Proper design for DESeq2 and other general RNA-seq questions
Stef wrote (12 weeks ago):

Hello, I am new to RNA-seq. I sequenced 20 libraries from Eucalyptus, all from different tissues: early flower, late flower pollinated, late flower unpollinated, early seed capsule, late seed capsule, mature pollen, and mature leaf. I have three biological replicates for every tissue except pollen, for which I only have two.

I started my analysis with HISAT2, StringTie, and Ballgown, but I have not yet figured out how to do multiple pairwise comparisons in Ballgown. Any suggestions?
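Whichever package ends up running the tests, the number of pairwise comparisons grows quickly: 7 tissues give 7 × 6 / 2 = 21 pairs. A minimal Python sketch that enumerates them, assuming the tissue names from the sample table posted later in this thread and DESeq2's `results(dds, contrast = c("factor", "numerator", "denominator"))` form. `dds` is a hypothetical object name, and the script only prints the R calls; it does not run any test itself.

```python
from itertools import combinations

# Tissue levels as they appear in the sample table posted later in this thread.
TISSUES = [
    "early_capsule", "mature_flower", "mature_capsule", "early_flower_pol",
    "mature_leaf", "early_flower_unpol", "mature_pollen",
]


def pairwise_contrasts(factor, levels):
    """One (factor, numerator, denominator) triple per unordered pair of levels."""
    return [(factor, num, den) for num, den in combinations(levels, 2)]


contrasts = pairwise_contrasts("tissue", TISSUES)
print(len(contrasts))  # 7 levels -> 7 * 6 / 2 = 21 comparisons
for factor, num, den in contrasts:
    # Text of the R call one would run per pair; `dds` is a hypothetical DESeqDataSet name.
    print(f'results(dds, contrast = c("{factor}", "{num}", "{den}"))')
```

With 21 separate tests, it is worth deciding up front which comparisons actually answer your biological question rather than reporting all of them.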

I have now moved to DESeq2 instead of Ballgown. However, I am not sure how to write the design formula for DESeq2, since I only have two replicates for pollen. The design tissue*tree gives the error "the model matrix is not full rank". I read in a blog post that I should combine tree and tissue into a single factor. However, that makes it seem as if there are no replicates, and I am not sure how to go forward without replicates.

Also, I am concerned that my counts are not normalized well enough by calculating FPKM values. My reads are single-end. Is it possible to generate RPKM values with Ballgown or DESeq2?

Last, is STAR considered a better aligner than Hisat2?

Thanks! Stef

Tags: hisat2, rna-seq, deseq2, stringtie

StringTie ships a prepDE.py script that takes the .gtf files and converts them into read counts. I am not entirely sure of the algorithm it uses, but I use those counts for my DESeq2 analysis.

(Reply by piyushjo, 12 weeks ago.)
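As far as we understand the StringTie documentation, prepDE.py does not recount reads; it derives hypothetical read counts from the mean per-transcript coverage that StringTie estimates, roughly as reads = coverage × transcript_len / read_len. A minimal sketch of that conversion under this assumption; the function name is hypothetical, the 75 bp default is only a placeholder for your real read length, and rounding up is a choice made for this sketch, not necessarily what prepDE.py does.

```python
import math


def coverage_to_count(coverage, transcript_len, read_len=75):
    """Approximate read count for one transcript from StringTie's mean coverage.

    Assumed formula: reads ~= coverage * transcript_len / read_len
    read_len=75 is only a placeholder default; use your own read length.
    Rounding up to an integer is a choice made for this sketch.
    """
    return math.ceil(coverage * transcript_len / read_len)


print(coverage_to_count(10.0, 1500, read_len=100))  # 150
print(coverage_to_count(2.5, 1000, read_len=75))    # 2500 / 75 = 33.3 -> 34
```

Because the result depends on read_len, passing the wrong read length rescales every count in a sample by the same factor.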

Could you paste your design matrix? From your post it is unclear whether the samples came from the same 3 (2 for pollen) trees or from 20 different trees. If they come from different trees then, as h.mon suggested, you should only include the "tissue" factor in your formula.

(Reply by Carlo Yague, 12 weeks ago.)

They come from the same three trees (tree one, tree two, and tree three). Sorry for not being clear; here is my sample table:

id      tree   tissue              age     type        group
eg1MT1  one    early_capsule       early   flowering   one.early_capsule
eg1MT2  two    early_capsule       early   flowering   two.early_capsule
eg1MT3  three  early_capsule       early   flowering   three.early_capsule
eg1WT1  one    mature_flower       mature  flowering   one.mature_flower
eg1WT2  two    mature_flower       mature  flowering   two.mature_flower
eg1WT3  three  mature_flower       mature  flowering   three.mature_flower
eg3MT1  one    mature_capsule      mature  flowering   one.mature_capsule
eg3MT2  two    mature_capsule      mature  flowering   two.mature_capsule
eg3MT3  three  mature_capsule      mature  flowering   three.mature_capsule
egAT1   one    early_flower_pol    early   flowering   one.early_flower_pol
egAT2   two    early_flower_pol    early   flowering   two.early_flower_pol
egAT3   three  early_flower_pol    early   flowering   three.early_flower_pol
egL1    one    mature_leaf         mature  vegetative  one.mature_leaf
egL2    two    mature_leaf         mature  vegetative  two.mature_leaf
egL3    three  mature_leaf         mature  vegetative  three.mature_leaf
egNPT1  one    early_flower_unpol  early   flowering   one.early_flower_unpol
egNPT2  two    early_flower_unpol  early   flowering   two.early_flower_unpol
egNPT3  three  early_flower_unpol  early   flowering   three.early_flower_unpol
egP1    one    mature_pollen       mature  flowering   one.mature_pollen
egP2    two    mature_pollen       mature  flowering   two.mature_pollen

(Reply by Stef, 12 weeks ago.)
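Tallying this table shows why combining tree and tissue into one "group" factor looks like having no replicates: each tissue has 3 libraries (2 for pollen), but every tree × tissue combination holds exactly one. A small Python sketch of that tally, with the rows copied from the table (only id, tree, and tissue are kept).

```python
from collections import Counter

# (id, tree, tissue) for the 20 libraries in the sample table.
SAMPLES = [
    ("eg1MT1", "one", "early_capsule"), ("eg1MT2", "two", "early_capsule"),
    ("eg1MT3", "three", "early_capsule"), ("eg1WT1", "one", "mature_flower"),
    ("eg1WT2", "two", "mature_flower"), ("eg1WT3", "three", "mature_flower"),
    ("eg3MT1", "one", "mature_capsule"), ("eg3MT2", "two", "mature_capsule"),
    ("eg3MT3", "three", "mature_capsule"), ("egAT1", "one", "early_flower_pol"),
    ("egAT2", "two", "early_flower_pol"), ("egAT3", "three", "early_flower_pol"),
    ("egL1", "one", "mature_leaf"), ("egL2", "two", "mature_leaf"),
    ("egL3", "three", "mature_leaf"), ("egNPT1", "one", "early_flower_unpol"),
    ("egNPT2", "two", "early_flower_unpol"), ("egNPT3", "three", "early_flower_unpol"),
    ("egP1", "one", "mature_pollen"), ("egP2", "two", "mature_pollen"),
]

per_tissue = Counter(tissue for _, _, tissue in SAMPLES)
per_cell = Counter((tree, tissue) for _, tree, tissue in SAMPLES)

print(per_tissue)                              # 3 libraries per tissue, 2 for mature_pollen
print(len(per_cell), set(per_cell.values()))   # 20 tree x tissue cells, each holding 1 library
```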

OK, now I understand why tissue*tree does not work. tissue*tree is equivalent to tissue + tree + tissue:tree (the interaction). To estimate the interaction term, you would need at least two true replicates of each tissue-tree combination. With your data, the best you can do is a tissue + tree model, without the interaction term.

(Reply by Carlo Yague, 12 weeks ago.)
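The rank argument can be checked numerically. The sketch below builds treatment-coded model matrices (first level as reference, which we assume matches R's default coding, though not necessarily its column order) for four candidate designs on the 20-sample layout, then compares the number of columns with the matrix rank. Samples minus rank gives the residual degrees of freedom left over for estimating within-group variability; zero means the model has no replicates to learn from. The design labels are just strings for printing.

```python
import numpy as np

TISSUES = ["early_capsule", "mature_flower", "mature_capsule", "early_flower_pol",
           "mature_leaf", "early_flower_unpol", "mature_pollen"]
TREES = ["one", "two", "three"]

# Same layout as the sample table: every tissue from every tree, except no
# mature_pollen library from tree three (20 samples in total).
samples = [(tree, tissue) for tissue in TISSUES for tree in TREES
           if not (tissue == "mature_pollen" and tree == "three")]
n = len(samples)


def treatment_dummies(values, levels):
    """0/1 columns for every level except the first, which acts as the reference."""
    return np.array([[float(v == level) for level in levels[1:]] for v in values])


intercept = np.ones((n, 1))
tissue = treatment_dummies([s[1] for s in samples], TISSUES)  # n x 6
tree = treatment_dummies([s[0] for s in samples], TREES)      # n x 2
# tissue:tree interaction = product of each tissue column with each tree column.
interaction = np.column_stack([tissue[:, i] * tree[:, j]
                               for i in range(tissue.shape[1])
                               for j in range(tree.shape[1])])  # n x 12
group = treatment_dummies(samples, sorted(set(samples)))       # n x 19

designs = {
    "~ tissue": np.hstack([intercept, tissue]),
    "~ tree + tissue": np.hstack([intercept, tree, tissue]),
    "~ tree * tissue": np.hstack([intercept, tree, tissue, interaction]),
    "~ group": np.hstack([intercept, group]),
}

summary = {}
for name, X in designs.items():
    rank = int(np.linalg.matrix_rank(X))
    summary[name] = (X.shape[1], rank, n - rank)
    print(f"{name:16s} columns={X.shape[1]:2d} rank={rank:2d} "
          f"full_rank={rank == X.shape[1]} residual_df={n - rank}")
```

The interaction design has 21 columns for 20 samples (and the pollen:three column is all zeros), so it cannot be full rank. The combined group design is full rank but leaves 0 residual degrees of freedom, while ~ tissue and ~ tree + tissue both leave degrees of freedom for estimating variability.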
h.mon (Brazil) wrote (12 weeks ago):

From the description of your experiment, I think your design should include only tissue; why are you including tissue*tree? Having only two replicates for pollen is not ideal, but it should not cause errors.

Also, I am concerned that my counts are not normalized well-enough by calculating FPKM values. My reads are single-ended. Is it possible to generate RPKM values using ballgown or DESeq2?

Your counts won't be properly normalized with either FPKM or RPKM (and for single-end reads, RPKM is the same as FPKM). A better within-sample normalization is TPM, which Ballgown calculates. For DESeq2 you don't need any of these normalizations: it expects raw counts as input.
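To make the RPKM/FPKM versus TPM difference concrete, here is a minimal sketch with the standard formulas: RPKM scales by total reads first and then by length, while TPM normalizes by length first and then rescales so every sample sums to one million. The counts and lengths are toy numbers, not data from this experiment, and neither value is what DESeq2 takes as input.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads (equal to FPKM for single-end data)."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale the sample to sum to 1e6."""
    per_kb = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(per_kb)
    return [x / scale * 1e6 for x in per_kb]


# Toy numbers for three genes in one sample (not real data).
counts = [100, 300, 600]
lengths = [1000, 2000, 3000]
print(rpkm(counts, lengths))      # [100000.0, 150000.0, 200000.0]; the total differs between samples
print(sum(tpm(counts, lengths)))  # 1e6 (up to float rounding) for every sample
```

Because TPM totals are identical across samples, a TPM value reads as a proportion of the sample's transcripts, which RPKM/FPKM values do not guarantee.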

Last, is STAR considered a better aligner than Hisat2?

I think so, but not by enough to warrant realigning your reads.


OK. I will stick with DESeq2, then, because I am not sure whether Ballgown does RPKM. I think it only normalizes with FPKM or gene coverage.

(Reply by Stef, 12 weeks ago.)
swbarnes2 (United States) wrote (12 weeks ago):

20 libraries

So that's 7 trees, six tissues from each, with one of the pollen samples dropping out for some reason, right?

However, I am not sure how to design the formula for DESeq2 since I only have two replicates for pollen.

That's not the problem. Your design is just "tissue". It's all you can do. You can't model differences between trees with only a single tissue sample per tree. If you'd taken 4 flowers of each type from each tree, then you could.

And pretty much no one uses FPKM anymore. DESeq2 takes raw gene counts. It will do its own normalization.
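DESeq2's own normalization is the median-of-ratios method: each gene's geometric mean across samples serves as a pseudo-reference, and a sample's size factor is the median of its count-to-reference ratios. Below is a minimal numpy reimplementation of that idea, for illustration only. In R you would simply call estimateSizeFactors (or DESeq) on your DESeqDataSet; this sketch does not reproduce the package's other options, and the toy matrix is invented.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Size factors in the style of DESeq2's median-of-ratios normalization.

    counts: genes x samples array of raw (unnormalized) counts.
    Genes with a zero count in any sample are dropped, because their
    geometric mean across samples would be zero.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)  # per-gene pseudo-reference
    # Per sample: median over genes of log(count / geometric mean), back-transformed.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


# Toy matrix: sample 2 was sequenced exactly twice as deep as sample 1.
toy = [[10, 20],
       [30, 60],
       [5, 10],
       [0, 7]]   # dropped: zero count in sample 1
sf = median_of_ratios_size_factors(toy)
print(sf)                               # roughly [0.707, 1.414]
print(np.asarray(toy, float) / sf)      # counts divided by each sample's size factor
```

Using the median rather than the total makes the size factors robust to a handful of very highly expressed or strongly changing genes, which is one reason raw counts, not FPKM, are the right input.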
