We have recently generated two de novo transcriptome assemblies for two different but related species. The assembled transcripts look good in terms of quality metrics, completeness, and agreement with the previously sequenced genome and its annotation. We were able to identify novel genes and previously unannotated transcripts. To identify alternatively spliced transcripts, we are currently running PASA (Program to Assemble Spliced Alignments).
After running PASA, I compared the Trinity transcripts, the existing annotation and the PASA annotation in IGB. I found a case where the Trinity transcripts are fully supported by the previously curated annotation as well as by the RNA-Seq data, yet there is no corresponding PASA annotation. PASA reports only one fragment; the remaining parts do not appear even in the valid or failed alignment .gff3 files generated by PASA [Figure]. This raises a few questions that I cannot answer:
- Why does the PASA annotation differ here, when the gene model is supported by both manual curation and RNA-Seq read depth? And which annotation is correct?
- PASA uses BLAT and GMAP to align transcripts to the genome, and we also used BLAT ourselves to align the transcripts to the genome. Why, then, do the two BLAT alignments differ?
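To narrow down the second question, one option is to recompute simple alignment metrics from our own BLAT output (PSL format) and see whether the missing transcripts would fall below typical validation cutoffs. The sketch below assumes standard headerless 21-column PSL input; the function names and the thresholds (90% aligned, 95% identity) are hypothetical placeholders. PASA's alignAssembly configuration exposes comparable cutoffs (MIN_PERCENT_ALIGNED, MIN_AVG_PER_ID), but its real validation also checks splice boundaries and is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; adjust to match your own PASA config.
MIN_PERCENT_ALIGNED = 90.0
MIN_PERCENT_IDENTITY = 95.0


@dataclass
class PslHit:
    q_name: str
    t_name: str
    t_start: int
    t_end: int
    percent_aligned: float
    percent_identity: float
    block_count: int


def parse_psl_line(line: str) -> PslHit:
    """Parse one 21-column PSL record (no header) into summary metrics."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
    q_size = int(f[10])
    # blockSizes is a comma-separated list, usually with a trailing comma.
    block_sizes = [int(x) for x in f[18].rstrip(",").split(",") if x]
    aligned = sum(block_sizes)
    # Simplified identity: matched bases over matched + mismatched bases.
    # Gaps are ignored, so this is not BLAT's own scoring.
    denom = matches + rep_matches + mismatches
    identity = 100.0 * (matches + rep_matches) / denom if denom else 0.0
    return PslHit(
        q_name=f[9],
        t_name=f[13],
        t_start=int(f[15]),
        t_end=int(f[16]),
        percent_aligned=100.0 * aligned / q_size if q_size else 0.0,
        percent_identity=identity,
        block_count=int(f[17]),
    )


def passes(hit: PslHit) -> bool:
    """Apply the hypothetical cutoffs defined above."""
    return (hit.percent_aligned >= MIN_PERCENT_ALIGNED
            and hit.percent_identity >= MIN_PERCENT_IDENTITY)
```

Running each line of a headerless PSL file (for example, BLAT output produced with `-noHead`) through `parse_psl_line` and `passes` would let the transcripts missing from PASA's valid and failed .gff3 files be checked one by one against whichever cutoffs PASA was actually run with.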
Data used for transcriptome assembly = 100 bp paired-end reads, non-strand-specific
Attached figure description:
Dark blue = newly assembled transcript annotation [generated by aligning the assembled transcripts to the genome with BLAT].
Orange = existing curated annotation.
Red = PASA annotation
Blue shades = valid alignments from BLAT and GMAP, respectively.
Read depth = green for control, dark red for knock-out.
I look forward to your replies. Any suggestions or views would be very helpful.
PS: I have also posted this question on SEQanswers.