Question

Unigene count increases with the number of samples in de novo assembly

2.2 years ago

yuki_okadai_yy • 0

I am analyzing eukaryotic RNA-seq data using de novo assembly.

I initially analyzed the sequencing reads from four samples of one eukaryotic species, and later added two more samples of the same species (n=6 in total).

As a result, the number of unigenes increased to 1.5 times the initial count.

It seems very unnatural.

Does anyone know anything about this phenomenon?

The software I used is as follows:

OS: Linux Mint 21.2 Cinnamon

Trinity v2.13.2

Salmon v1.4.0

Corset v1.09

Genome RNA-seq Assembly • 721 views

Updated 2.2 years ago by dthorbur ★ 3.1k • written 2.2 years ago by yuki_okadai_yy • 0

It is difficult to say whether this is "unnatural". Depending on the information contributed by the two additional libraries, getting more "unigenes" may not be a bad thing. You could also have ended up with a lot more "junk" if the new libraries threw the assembler off course.

You will need to try and annotate the results to see if they make sense or are nonsense.
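
As a rough sketch of that check, one could compute the fraction of unigenes (clusters) that have at least one annotated transcript. The code below assumes two tab-separated inputs that are not described in the question: a transcript-to-cluster table (transcript ID, then cluster ID) and a tabular similarity-search result whose first column is the query transcript ID. The function names and the toy data are hypothetical.

```python
import io


def load_clusters(handle):
    """Map transcript ID -> cluster ID from a two-column, tab-separated table."""
    mapping = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def annotated_fraction(tx2cluster, hits_handle):
    """Fraction of clusters with at least one transcript that has a hit.

    Assumes the first tab-separated column of each hit line is the query
    transcript ID (an assumption about the search output format).
    """
    hit_clusters = set()
    for line in hits_handle:
        query = line.rstrip("\n").split("\t")[0]
        if query in tx2cluster:
            hit_clusters.add(tx2cluster[query])
    total = len(set(tx2cluster.values()))
    return len(hit_clusters) / total if total else 0.0


# Toy data standing in for real files (hypothetical IDs).
clusters_txt = "t1\tCluster-0\nt2\tCluster-0\nt3\tCluster-1\n"
hits_txt = "t2\tsome_protein\t90.0\n"

tx2cluster = load_clusters(io.StringIO(clusters_txt))
fraction = annotated_fraction(tx2cluster, io.StringIO(hits_txt))
print(f"annotated clusters: {fraction:.0%}")
```

Running this on both the 4-sample and the 6-sample results would show whether the extra unigenes are annotated at a similar rate or are mostly unannotated, which would hint at "junk".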

2.2 years ago by GenoMax 154k

I have found that most de novo assemblers I've used are extremely noisy, and it takes a lot of polishing and merging of overlapping annotations before you get a reasonable dataset to work with.

Also, without knowing the actual numbers, it's hard to say whether 1.5x is indeed an oddly large increase. Were the initial samples taken at the same time, from the same population and treatment? If the later samples came from a different population or treatment, that could easily explain a large difference in the captured expression profiles. I've seen this exact phenomenon occur in a developmental time series dataset.
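
One way to put numbers on this, sketched under assumptions: take a per-cluster read count table from the combined 6-sample run and tally how many clusters have reads only in the two added samples. The layout assumed here (tab-separated, a header row of sample names, cluster ID in the first column) is a guess at a typical count table, not a confirmed description of any tool's output, and the sample names S1 to S6 are hypothetical.

```python
import io


def clusters_by_support(counts_handle, new_samples, min_reads=1):
    """Tally clusters by which sample group supports them.

    new_samples: set of column names for the samples added later.
    A cluster is 'supported' by a group if any sample in that group
    has at least min_reads reads.
    """
    header = next(counts_handle).rstrip("\n").split("\t")
    tally = {"shared": 0, "new_only": 0, "old_only": 0, "unsupported": 0}
    for line in counts_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        values = [float(x) for x in fields[1:]]
        # Works whether or not the header labels the cluster ID column.
        names = header[-len(values):]
        in_new = any(v >= min_reads for n, v in zip(names, values) if n in new_samples)
        in_old = any(v >= min_reads for n, v in zip(names, values) if n not in new_samples)
        if in_new and in_old:
            tally["shared"] += 1
        elif in_new:
            tally["new_only"] += 1
        elif in_old:
            tally["old_only"] += 1
        else:
            tally["unsupported"] += 1
    return tally


# Toy table standing in for a real count file (hypothetical values).
counts_txt = (
    "cluster\tS1\tS2\tS3\tS4\tS5\tS6\n"
    "Cluster-0\t10\t5\t3\t8\t12\t9\n"
    "Cluster-1\t0\t0\t0\t0\t7\t4\n"
    "Cluster-2\t2\t1\t0\t3\t0\t0\n"
    "Cluster-3\t0\t0\t0\t0\t0\t15\n"
)

tally = clusters_by_support(io.StringIO(counts_txt), new_samples={"S5", "S6"})
print(tally)
```

If most of the additional unigenes fall into `new_only`, the increase is being driven by transcripts seen only in the added libraries, which fits the different-population or different-treatment explanation; a large `new_only` group with very low counts would point more toward noise.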

2.2 years ago by dthorbur ★ 3.1k