Sample number dependent unigene increase with De Novo assembly
2.2 years ago

I am analyzing eukaryotic RNA-seq data using de novo assembly.

I initially assembled the sequencing reads from four samples of one eukaryotic species, and later added two more samples of the same species (n=6 in total).

As a result, the number of unigenes increased to 1.5 times the initial count.

This seems unnaturally large to me.

Does anyone know anything about this phenomenon?

The software I used is as follows:

OS: Linux Mint 21.2 Cinnamon

Trinity v2.13.2

Salmon v1.4.0

Corset v1.09

Genome RNA-seq Assembly • 721 views

It is difficult to say whether this is "unnatural". Depending on the information contributed by the two additional libraries, getting more "unigenes" may not be a bad thing. You could also have ended up with a lot more "junk" if the new libraries threw the assembler off course.

You will need to annotate the results to see whether they make sense or are nonsense.


I have found that most de novo assemblers I've used are extremely noisy, and it takes a lot of polishing and merging of overlapping annotations before you get a reasonable dataset to work with.

Also, without knowing the actual numbers, it's hard to say whether 1.5x is an oddly large increase. Were the initial samples taken at the same time, from the same population and treatment? Were the later samples from a different population or treatment? Either could easily explain a large difference in the captured expression profiles. I've seen this exact phenomenon in a developmental time series dataset.
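
One way to check whether the extra unigenes come mainly from the two added libraries is to tally, per cluster, which sample group actually contributes reads. The snippet below is only a rough sketch under some assumptions: it expects a tab-separated count table with one row per cluster and one column per sample (similar in spirit to a Corset count table, but the exact layout of your file may differ), and the function name, the sample names and the `min_reads` threshold are all made up for illustration.

```python
import csv
import io
from collections import Counter


def classify_clusters(tsv_text, original, added, min_reads=1):
    """Tally clusters by which sample group supports them.

    tsv_text  : tab-separated table; first column is the cluster ID,
                remaining columns are per-sample read counts
                (assumed layout, adjust to your own file).
    original  : column names of the initial samples (hypothetical names).
    added     : column names of the samples added later.
    min_reads : summed reads a group needs to count as "supporting".
    """
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    tally = Counter(shared=0, original_only=0, added_only=0, unsupported=0)
    for row in body:
        values = [float(v) for v in row[1:]]
        # Works whether or not the header names the ID column.
        samples = header[-len(values):]
        counts = dict(zip(samples, values))
        in_original = sum(counts[s] for s in original) >= min_reads
        in_added = sum(counts[s] for s in added) >= min_reads
        if in_original and in_added:
            tally["shared"] += 1
        elif in_original:
            tally["original_only"] += 1
        elif in_added:
            tally["added_only"] += 1
        else:
            tally["unsupported"] += 1
    return dict(tally)


# Toy table with three clusters and three samples (made-up numbers).
example = "\tS1\tS2\tS3\nC1\t10\t0\t5\nC2\t0\t0\t7\nC3\t3\t2\t0\n"
summary = classify_clusters(example, original=["S1", "S2"], added=["S3"])
print(summary)
```

If most of the increase lands in the "added only" group, the new libraries are probably capturing transcripts the first four did not express (or contributing noise), and those clusters are the ones worth annotating first.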
