Re-assembling old RNA-seq reads vs. using an existing transcriptome assembly
7 hours ago
Mostafa • 0

Hello everyone, I am working on a non-model plant species whose RNA-seq data were generated a few years ago by another group, and I have access to both the raw reads and the previously assembled transcriptome submitted to TSA. I now want to mine ORFs for downstream analyses, and I am trying to determine whether it is better to rely on the existing assembly or to generate a new one from the raw data, especially considering that current assembly pipelines have improved since the original work. I would also like to know if there are any reasons, beyond the availability of updated tools, that would justify creating a new assembly. I have no computational limitations and would appreciate any guidance or experience that can help inform this decision. Thank you.

non-model-organism de-novo-assembly RNA-seq • 130 views
5 hours ago
dthorbur ★ 3.2k

There are a few factors to consider:

The quality of the existing reference.
If there is a high-quality, chromosome-level assembly, you would need to be skilled and have good data to replicate this quality. This is no easy feat even with modern tools, especially if your plant has a complex genome (e.g., high levels of ploidy). If the existing assembly is highly fragmented, creating one of similar quality is considerably less work. I realise this will be a transcriptome assembly, but in my experience transcriptomes derived from genome assemblies are better: they are better at discerning between isoforms and duplications, for example.

Evolutionary distance between the sample populations and the reference.
Research has shown that mapping efficiency and downstream inferences are significantly affected by the evolutionary distance between the sampled population and the reference (for clarity, I am an author on that paper). If your samples are significantly different from the current reference, it may be worth generating a new one, but you'd have to weigh the extra work against the expected improvement.

Sequencing strategy of these "old" samples.
To generate a new good-quality transcriptome assembly, especially if you only have RNA-seq reads, you'd need high-quality long-read sequencing.

4 hours ago
GenoMax 154k

"I have access to both the raw reads and the previously assembled transcriptome submitted to TSA."

and

"I am trying to determine whether it is better to rely on the existing assembly or to generate a new one from the raw data, especially considering that current assembly pipelines have improved since the original work."

It depends on how old the original transcriptome is and what strategy was used to create the assembly. If the assembly was created using one of the standard programs (e.g. Trinity), then you are not likely to get drastically better results without using new/additional data (unless the original work was not done to a certain standard).

"I have no computational limitations"

If that extends to the time you will spend redoing the assembly, then go ahead and try reassembly. You can then compare the old assembly with the new one and see whether you manage to get some improvement.
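For a first-pass comparison of old vs. new, something like the sketch below could work. It only computes basic contiguity numbers (transcript count, total bases, longest transcript, N50) from a FASTA file, and the function names are made up for illustration. Contiguity alone says little about correctness, so treat it as a sanity check; a completeness assessment with a dedicated tool such as BUSCO would be more informative for ORF mining.

```python
import io
from typing import Dict, Iterable, List


def read_fasta_lengths(handle: Iterable[str]) -> List[int]:
    """Return the length of every sequence in a FASTA stream, in file order."""
    lengths: List[int] = []
    current = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect basic contiguity statistics for one assembly."""
    return {
        "n_transcripts": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Toy in-memory records stand in for the two assemblies; with real data
    # you would pass open("old_assembly.fasta") etc. (hypothetical file names).
    old = io.StringIO(">t1\nACGTAC\n>t2\nAAA\n")
    new = io.StringIO(">t1\nACGTACGTAA\n>t2\nAAAC\n")
    for label, handle in (("old", old), ("new", new)):
        print(label, summarize(read_fasta_lengths(handle)))
```

Note that a higher N50 in a transcriptome can also come from chimeric or redundant transcripts, so it is worth pairing these numbers with read mapping rates back to each assembly.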

If the genome sequence has since become available, another option is to predict genes directly from it and use the resulting gene models as a benchmark for your existing transcriptome.
