Plant transcriptome analysis and annotation
7 weeks ago
martta95 ▴ 10

When mapping RNA-Seq data to the reference genome using StringTie, I find many new transcripts that match the reference genome perfectly but have a longer length at the 5' end. How should such transcripts be annotated correctly? Should I use BLAST to find the most closely related species in this case? Are there any alternatives to StringTie that simultaneously map to the reference genome and assemble new transcripts with a less strict threshold than a 1 bp mismatch?

plants sRNA-Seq annotation analysis expression • 697 views

I find many new transcripts that match the reference genome perfectly but have a longer length at the 5' end

As in the sequence does not match the reference at all?


The sequence matches the reference transcripts 100%, but the transcripts identified in the study are longer than the reference ones. Moreover, I find a lot of transcripts that match the reference.


I find many new transcripts that match the reference genome

and

The sequence matches the reference transcripts 100%

So are you working with reference genome or reference transcriptome sequences?

If you are working with a transcriptome (and if the reference you are working with is close to your species), then it is possible that StringTie is incorporating additional sequence at the 5' end that may or may not be real. Follow the other two comments and the suggestions therein. Ultimately, an experiment may need to be done to prove those extensions are real.
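
Before going to the bench, it can help to count how many of the "new" transcripts are really just 5' extensions of a known model. Below is a minimal Python sketch of that check, assuming you have already parsed the exon coordinates out of the reference and assembled GTF files. The function names and the tuple layout are hypothetical, not part of StringTie or any other tool.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # 1-based inclusive (start, end), as in GTF


def intron_chain(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return introns as (upstream exon end, downstream exon start) pairs."""
    ordered = sorted(exons)
    return [(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def five_prime_extension(ref_exons: List[Exon],
                         asm_exons: List[Exon],
                         strand: str) -> Optional[int]:
    """Return how many bases the assembled transcript adds at the 5' end.

    Returns None when the intron chains differ, i.e. when the assembled
    transcript is not simply a longer version of the reference model.
    A negative value means the assembled transcript is shorter at the 5' end.
    """
    if intron_chain(ref_exons) != intron_chain(asm_exons):
        return None
    ref_start = min(s for s, _ in ref_exons)
    ref_end = max(e for _, e in ref_exons)
    asm_start = min(s for s, _ in asm_exons)
    asm_end = max(e for _, e in asm_exons)
    # Single-exon models have empty intron chains, so also require overlap.
    if asm_end < ref_start or asm_start > ref_end:
        return None
    if strand == "-":
        # On the minus strand the 5' end is the highest genomic coordinate.
        return asm_end - ref_end
    return ref_start - asm_start


# Toy coordinates, invented for illustration only.
reference = [(1000, 1200), (1500, 1800)]
assembled = [(900, 1200), (1500, 1800)]
print(five_prime_extension(reference, assembled, "+"))
```

Transcripts that come back with a positive number share every splice junction with the reference and differ only in where the first exon starts, which is the case discussed above.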


I am not an assembly guy at all, so I always take the easy route and ask: "Do you really need to know about new transcripts? Does your analysis care, realizing that new transcripts are uncharacterised, unvalidated and not annotated functionally?" Or do you simply need expression counts for downstream analysis? If the latter, then use STAR or salmon (or alternatives), map the data against the genome/transcriptome annotations, get your count matrix and call it a day.
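
For the "get your count matrix" step, here is a rough Python sketch, assuming salmon was used and each sample has a `quant.sf` table with its usual `Name` and `NumReads` columns. The helper names are hypothetical, and the toy tables are made up. In a real analysis an established importer is probably the safer choice; this only shows the shape of the result.

```python
import csv
from typing import Dict, Iterable, List


def read_num_reads(lines: Iterable[str]) -> Dict[str, float]:
    """Read transcript -> estimated read count from quant.sf-style lines."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def build_count_matrix(per_sample: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """Return transcript -> counts, one value per sample in insertion order.

    Transcripts missing from a sample are filled with 0.0.
    """
    samples = list(per_sample)
    transcripts = sorted({t for counts in per_sample.values() for t in counts})
    return {t: [per_sample[s].get(t, 0.0) for s in samples] for t in transcripts}


# Toy tables standing in for two samples' quant.sf files (invented values).
HEADER = "Name\tLength\tEffectiveLength\tTPM\tNumReads"
sample1 = [HEADER, "txA\t1000\t800.0\t10.0\t25.0", "txB\t500\t300.0\t5.0\t4.0"]
sample2 = [HEADER, "txA\t1000\t800.0\t12.0\t30.0", "txC\t700\t500.0\t2.0\t7.0"]

matrix = build_count_matrix({
    "sample1": read_num_reads(sample1),
    "sample2": read_num_reads(sample2),
})
for transcript, counts in matrix.items():
    print(transcript, counts)
```

With real data you would pass an open file handle for each `quant.sf` to `read_num_reads` instead of the toy lists.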


Which reference genome are you working on? If it is for a non-model organism, you can talk this over with the curator (if there is one). If you are the curator, or there is no one, you can deal with it however you want, but it is tricky. There are many contrasting annotation approaches.

A few options:

  • Convert your new positions to sequence, and remap to the reference genome with e.g. PASA.
  • Use rnaSPAdes to do de novo assemblies of your RNA-seq data, get transcripts, map them back to the reference with e.g. PASA, and check the signal.
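
To illustrate what "convert your new positions to sequence" means, here is a minimal Python sketch. It assumes the chromosome sequence is already loaded as a string and the exon coordinates are 1-based inclusive, as in GTF. The function names are hypothetical; a dedicated tool such as gffread would normally do this for a whole annotation.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spliced_sequence(chrom_seq: str,
                     exons: List[Tuple[int, int]],
                     strand: str) -> str:
    """Join exon sequences in genomic order and orient them by strand.

    Exon coordinates are 1-based inclusive (GTF convention), so an exon
    (start, end) corresponds to the Python slice [start - 1:end].
    """
    ordered = sorted(exons)
    seq = "".join(chrom_seq[start - 1:end] for start, end in ordered)
    return reverse_complement(seq) if strand == "-" else seq


# Toy chromosome and exons, invented for illustration only.
chrom = "AAACCCGGGTTT"
print(spliced_sequence(chrom, [(1, 3), (7, 9)], "+"))
print(spliced_sequence(chrom, [(1, 3), (7, 9)], "-"))
```

The resulting transcript sequences are what you would then feed to an aligner or to PASA for remapping.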

I am working with Hordeum vulgare and using the Morex V3 reference genome. In the first stage, I used StringTie for mapping. As a result, I obtained sequences mapped to the reference genome and a group of sequences recognized as new, most of which had changes in the last exon and were longer than the reference transcripts. I am interested in determining their function and the potential proteins that would be produced, as the identified differences may be related to varietal variability.
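
As a first look at whether a 5' extension could change the encoded protein, one option is to compare the longest open reading frame in the reference transcript with that in the extended one. The following is a rough Python sketch under simple assumptions: forward strand only, ATG start, standard stop codons. The function name and the toy sequences are hypothetical; a dedicated ORF predictor such as TransDecoder would be more appropriate for real data.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end) of the longest ATG-to-stop ORF, 0-based half-open.

    Only the forward strand is scanned, and the stop codon is included in
    the interval. Returns (0, 0) when no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in this reading frame
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


# Toy sequences, invented for illustration only.
reference_tx = "GGATGAAACCCTAAGG"
extended_tx = "ATGC" + reference_tx  # same transcript with 4 extra 5' bases

for name, tx in (("reference", reference_tx), ("extended", extended_tx)):
    start, end = longest_orf(tx)
    print(name, start, end, tx[start:end])
```

If the extension introduces an upstream in-frame ATG, as in the toy example, the predicted protein gains residues at the N-terminus; if the longest ORF is unchanged, the extension only lengthens the 5' UTR.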
