2.2 years ago
yuki_okadai_yy
I am analyzing eukaryotic RNA-seq results using de novo assembly.
I initially analyzed the sequencing reads from four samples of the same eukaryote species, and later added two samples of the same species (n=6 in total).
As a result, the number of unigenes increased to 1.5 times the initial count.
It seems very unnatural.
Does anyone know anything about this phenomenon?
The software I used is as follows:
OS: Linux Mint 21.2 Cinnamon
Trinity v2.13.2
Salmon v1.4.0
Corset v1.09
It is difficult to say if this is "unnatural". Depending on the information contributed by the two additional libraries, getting more "Unigenes" may not be a bad thing. You could also have ended up with a lot more "junk" if the new libraries threw the assembler off course.
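One rough way to see what the two added libraries contributed is to split the clusters by which samples support them. The sketch below is only an illustration: it assumes a tab-delimited per-cluster counts table with a header row of sample names (the kind of table Corset is used to produce), and the sample names, the `min_reads` cutoff, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, Set


def parse_counts(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Parse a tab-delimited table: header of sample names, then one row per cluster.

    Assumed layout (an assumption, check your own file): first column is the
    cluster ID, remaining columns are read counts per sample.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if len(rows) < 2:
        return {}
    header, body = rows[0], rows[1:]
    n_samples = len(body[0]) - 1
    # Works whether or not the header has a label above the ID column.
    samples = header[-n_samples:]
    return {
        row[0]: {s: float(v) for s, v in zip(samples, row[1:])}
        for row in body
    }


def classify_clusters(
    counts: Dict[str, Dict[str, float]],
    new_samples: Set[str],
    min_reads: float = 10,  # hypothetical cutoff for "too few reads to judge"
) -> Dict[str, int]:
    """Count clusters supported only by old samples, only by new ones, or by both."""
    summary = {"old_only": 0, "new_only": 0, "shared": 0, "low_support": 0}
    for per_sample in counts.values():
        old = sum(v for s, v in per_sample.items() if s not in new_samples)
        new = sum(v for s, v in per_sample.items() if s in new_samples)
        if old + new < min_reads:
            summary["low_support"] += 1
        elif old == 0:
            summary["new_only"] += 1
        elif new == 0:
            summary["old_only"] += 1
        else:
            summary["shared"] += 1
    return summary
```

A large `new_only` group would suggest the added libraries captured transcripts the first four did not, while a large `low_support` group would point more toward noise. Neither number replaces annotation.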
You will need to try and annotate the results to see if they make sense or are nonsense.
I have found that most de novo assemblers I've used are extremely noisy, and it takes a lot of polishing and merging of overlapping annotations before you get a reasonable dataset to work with.
Also, without knowing the numbers, it's hard to say if 1.5x is indeed an oddly large increase. Were the initial samples taken at the same time from the same population and treatment? Were the later samples from a different population or treatment? Either difference could easily explain a large change in the captured expression profiles. I've seen this exact phenomenon occur in a developmental time series dataset.
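To put actual numbers on the comparison, a few summary statistics from each assembly FASTA (the four-sample one and the six-sample one) would help. The following is a minimal sketch, not a replacement for dedicated assembly-statistics tools; the function names and the `short_cutoff` threshold are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths: List[int], short_cutoff: int = 300) -> Dict[str, float]:
    """Sequence count, total bases, N50, and fraction of sequences below short_cutoff."""
    count = len(lengths)
    short = sum(1 for n in lengths if n < short_cutoff)
    return {
        "sequences": count,
        "total_bases": sum(lengths),
        "n50": n50(lengths),
        "short_fraction": short / count if count else 0.0,
    }
```

For example, `summarize(read_fasta_lengths(open("assembly_n4.fasta")))` and the same call on the six-sample file (both file names are placeholders) give numbers that can be compared side by side. If the extra sequences are mostly short, that leans toward fragmentation or noise; if the length distribution is similar, the new libraries may simply have captured more expressed genes.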