Question: Transcript assembly with nanopore long reads
I'm looking for transcripts in a poorly annotated retrovirus, using Oxford Nanopore reads. Currently, I'm trying to first "correct" the trimmed read sequences with the software canu, and I plan to do the assembly using the PASA pipeline (which uses the Trinity module).

I'm completely new to long-read processing and wanted to ask whether there are better workflows for this purpose, or whether there is an alternative. I also tried StringTie, but I only got transcripts that were one exon long.

Hannover Medical School
I'm not an expert on ONT transcriptomics, but I would suggest the following, in addition to using canu:

  • correction with short reads, if possible. The tool FMLRC had a good write-up recently
  • read correction using long reads with canu. Phasing information will be lost, but that's likely not an issue
  • read mapping of both corrected and uncorrected reads to the virus
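
In case it helps, the canu correction and mapping steps above could be sketched roughly like this. It only builds and prints the command lines, it does not run anything. The file names, the `virus` prefix and the output directory are placeholders I made up. The read-type flag depends on the canu version (older 1.x releases used `-nanopore-raw`, newer ones use `-nanopore`), so check `canu --help` for your installation. I'm also assuming the corrected reads end up in `<prefix>.correctedReads.fasta.gz`, and that minimap2's `splice` preset is a reasonable choice for spliced long-read alignment.

```python
import shlex


def canu_correct_cmd(reads, genome_size, prefix="virus",
                     outdir="canu_correct", read_flag="-nanopore"):
    """Build a canu command that only runs the read-correction stage.

    read_flag is version dependent: "-nanopore-raw" on older canu 1.x.
    prefix and outdir are placeholder names, not anything canu requires.
    """
    return ["canu", "-correct", "-p", prefix, "-d", outdir,
            f"genomeSize={genome_size}", read_flag, reads]


def minimap2_splice_cmd(reference, reads):
    """Build a minimap2 command for spliced alignment with SAM output."""
    return ["minimap2", "-ax", "splice", reference, reads]


if __name__ == "__main__":
    # Placeholder inputs: replace with your own files and genome size.
    raw_reads = "trimmed_reads.fastq"
    reference = "virus_genome.fasta"
    # Assumed location of canu's corrected reads for the defaults above.
    corrected = "canu_correct/virus.correctedReads.fasta.gz"

    print(shlex.join(canu_correct_cmd(raw_reads, "230k")))
    # Map both the uncorrected and the corrected reads, as suggested.
    for reads in (raw_reads, corrected):
        print(shlex.join(minimap2_splice_cmd(reference, reads)), ">",
              shlex.quote(reads + ".sam"))
```

Mapping both read sets lets you compare how many reads align, and whether splice junctions are still visible, before and after correction.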

How big is your virus, by the way?

I'm working with HCMV which is around 230 kb long.

Thanks, I will try to convince my colleague to add some short-read data. I am more familiar with those and will then give FMLRC a try.

Long-read data will be more informative on transcript structure. See how far you get with your long-read data; I've got a lot out of canu's corrected reads.

However, SNVs and homopolymers will be an issue.
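
To get a feeling for where homopolymers might bite, here is a small sketch (my own illustration, not part of canu or any other tool) that lists homopolymer runs in a sequence and also collapses them, so two reads that differ only in run length compare as equal. The `min_len=4` threshold is an arbitrary assumption.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of at least min_len bases."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def compress_homopolymers(seq):
    """Collapse every run of identical bases to a single base."""
    return "".join(base for base, _ in groupby(seq.upper()))


if __name__ == "__main__":
    reference = "ACGTTTTTAGGGGC"
    read = "ACGTTTTAGGGGGC"  # same sequence, but run lengths differ
    print(homopolymer_runs(reference))
    print(compress_homopolymers(reference) == compress_homopolymers(read))
```

Regions flagged this way are the ones where I would be most cautious about calling indels or SNVs from nanopore reads alone.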

Also, maybe you can try using public data for short read correction.

OK, thanks, I didn't even think about publicly available short-read data. But since the particular strain is also annotated in Ensembl, there might be some data out there.

