Question

RNA-seq analysis between 2 closely related strains of the same species

6.8 years ago

GiantSilverSoy ▴ 130

Hi, I am working on an RNA-seq analysis of a wild type and a gamma-irradiated mutant of a non-model organism. The aim is to identify differentially expressed (DE) genes between them. What I have done so far is generate two separate de novo assemblies, identify the sequences common to both at 90% identity or higher using cd-hit-est-2d, and use that common set as the mapping reference for the DE analysis.
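As a rough illustration of the matching step described above, here is a toy Python sketch that pairs contigs from the two assemblies when they reach a percent-identity cutoff. It is not how cd-hit-est-2d works internally (cd-hit aligns sequences and uses a short-word filter); the ungapped, position-by-position comparison and the names `ungapped_identity` and `shared_contigs` are hypothetical simplifications.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical bases when a and b are laid side by side from
    position 0, measured over the shorter sequence (no gaps, no offsets)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)  # zip stops at the shorter sequence
    return matches / n


def shared_contigs(asm_wt: dict, asm_mut: dict, min_identity: float = 0.9) -> list:
    """Return every (wt_id, mut_id) pair whose toy identity reaches the cutoff.
    Brute force over all pairs; only suitable for tiny illustrative inputs."""
    pairs = []
    for wt_id, wt_seq in asm_wt.items():
        for mut_id, mut_seq in asm_mut.items():
            if ungapped_identity(wt_seq, mut_seq) >= min_identity:
                pairs.append((wt_id, mut_id))
    return pairs


if __name__ == "__main__":
    # Hypothetical contigs from the two separate assemblies.
    wt = {"wt_1": "ACGTACGTAC", "wt_2": "GGGGCCCCAA"}
    mut = {"mut_1": "ACGTACGTAA", "mut_2": "TTTTTTTTTT"}
    print(shared_contigs(wt, mut))  # [('wt_1', 'mut_1')]: 9 of 10 bases identical
```

Raising `min_identity` from 0.90 to 0.95 drops the only pair in this example, which is the kind of sensitivity to the cutoff discussed in the answer.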

My question is whether my current workflow is fine to continue with, or whether there is a generally accepted workflow for a case like mine. What do you think? Thanks.

score 3 · Answer 1 · 2017-07-05

6.8 years ago

Kristoffer Vitting-Seerup ★ 4.0k

I'm assuming you don't have a reference genome, given that this is a non-model organism (if you do have one, you should take a different approach).

The other possible approach would be to do a de novo assembly of the pooled data from all samples and then quantify that single assembly in each of your samples.

Both approaches have problems. The drawback of your solution is that it relies on the % identity cutoff, and you might get 1:many or many:many relationships that are hard to untangle. The drawback of my suggestion is that you might assemble something that is not present in the actual samples.
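To make the 1:many / many:many problem concrete: if the identity matches are treated as edges between contigs of the two assemblies, each connected group of contigs can be labelled by how many contigs it contains on each side, and only the 1:1 groups give an unambiguous pairing. This is a minimal Python sketch; the contig IDs are hypothetical and the match pairs would come from whatever identity search was used.

```python
from collections import defaultdict


def match_groups(pairs):
    """Group (wt_id, mut_id) matches into connected components and label each
    component by how many contigs it contains on each side (e.g. '1:many')."""
    # Tag each ID with its assembly so IDs from the two sides never collide.
    adj = defaultdict(set)
    for w, m in pairs:
        adj[("wt", w)].add(("mut", m))
        adj[("mut", m)].add(("wt", w))

    seen = set()
    groups = []
    for start in adj:
        if start in seen:
            continue
        # Depth-first search collects every contig reachable from `start`.
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        wt = sorted(i for side, i in component if side == "wt")
        mut = sorted(i for side, i in component if side == "mut")
        kind = f"{'1' if len(wt) == 1 else 'many'}:{'1' if len(mut) == 1 else 'many'}"
        groups.append((kind, wt, mut))
    return groups


if __name__ == "__main__":
    pairs = [("wt_1", "mut_1"), ("wt_2", "mut_2"), ("wt_2", "mut_3")]
    for kind, wt, mut in match_groups(pairs):
        print(kind, wt, mut)
```

Any group that is not 1:1 has no single obvious "same gene" counterpart in the other strain, so its counts cannot be compared directly without extra decisions.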

I think I like mine a little better, because you can be certain that you are quantifying the same transcript/gene in both samples.
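A toy sketch of why a single pooled reference helps: every sample is counted against the same set of transcript IDs, so the per-sample counts line up row by row without any cross-assembly matching. The reference here is hand-written and stands in for the pooled de novo assembly; the exact-substring counting and the function names are hypothetical simplifications. Real quantifiers such as Salmon or RSEM assign ambiguous reads probabilistically instead of discarding them.

```python
def count_reads(reference: dict, reads: list) -> dict:
    """Toy quantification: a read counts toward a transcript only if it is an
    exact substring of exactly one transcript; ambiguous or unmapped reads are dropped."""
    counts = {tid: 0 for tid in reference}
    for read in reads:
        hits = [tid for tid, seq in reference.items() if read in seq]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


def count_matrix(reference: dict, samples: dict) -> dict:
    """Quantify every sample against the same reference, so rows share IDs by construction."""
    per_sample = {name: count_reads(reference, reads) for name, reads in samples.items()}
    return {tid: {name: per_sample[name][tid] for name in samples} for tid in reference}


if __name__ == "__main__":
    # Hypothetical pooled assembly and reads from each strain.
    reference = {"tx_1": "ACGTACGTTT", "tx_2": "GGGCCCAAAT"}
    samples = {"wt": ["ACGTA", "CGTTT", "GGGCC"], "mut": ["GGGCC", "CCAAA", "TTTTT"]}
    print(count_matrix(reference, samples))
```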

Comparing expression when you map different samples to different assemblies is just too hard, so I'd second making a single co-assembly. But an alternative to assembling all the reads together would be:

1) Assemble the control.
2) Map irradiated sample to control assembly, with fairly loose tolerance for mapping to account for polymorphisms.
3) Assemble unmapped reads.
4) Combine the assemblies and use that as your reference.

That might reduce the number of spurious assembled sequences that are just due to polymorphisms between the samples.
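Steps 2 to 4 of the list above can be sketched in Python, assuming the control assembly is already available as a dict of contig ID to sequence. "Loose tolerance" is modelled here as an ungapped match with up to `max_mismatches` mismatches; a real aligner would also allow indels. Assembling the unmapped reads (step 3) is left to an actual assembler, so `novel_contigs` is a placeholder input. All names are hypothetical.

```python
def hamming_hit(read: str, contig: str, max_mismatches: int) -> bool:
    """True if the read matches some window of the contig, without gaps,
    with at most max_mismatches mismatches."""
    k = len(read)
    for start in range(len(contig) - k + 1):
        mismatches = 0
        for a, b in zip(read, contig[start:start + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window is already too divergent
        if mismatches <= max_mismatches:
            return True
    return False


def split_reads(control_assembly: dict, reads: list, max_mismatches: int = 2):
    """Step 2: map irradiated reads loosely to the control assembly.
    Returns (mapped, unmapped); the unmapped reads are the input to step 3."""
    mapped, unmapped = [], []
    for read in reads:
        if any(hamming_hit(read, c, max_mismatches) for c in control_assembly.values()):
            mapped.append(read)
        else:
            unmapped.append(read)
    return mapped, unmapped


def combined_reference(control_assembly: dict, novel_contigs: list) -> dict:
    """Step 4: control contigs plus contigs assembled from the unmapped reads."""
    combined = dict(control_assembly)
    for i, seq in enumerate(novel_contigs, start=1):
        combined[f"novel_{i}"] = seq
    return combined


if __name__ == "__main__":
    control = {"ctrl_1": "ACGTACGTACGT"}
    reads = ["ACGTACGT", "ACGAACGT", "TTTTTTTT"]  # exact, 1 mismatch, unrelated
    mapped, unmapped = split_reads(control, reads, max_mismatches=2)
    print(mapped, unmapped)
```

The mismatch tolerance is the knob that decides whether a polymorphic read counts as "the same gene" or ends up in the extra assembly: with `max_mismatches=0` the one-mismatch read above would be sent to step 3.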

6.8 years ago by Brian Bushnell 20k