Has anyone used Corset (https://github.com/Oshlack/Corset/wiki) and Lace to build SuperTranscripts from contigs? (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1284-1) I found this approach very interesting and wanted to try it on my RNA-seq data. From almost 150000 contigs, Corset and Lace created 3158 SuperTranscripts.
I wanted to run blastx against a bacteria database, but I think that might be the wrong approach... SuperTranscripts are long sequences, each combining several transcripts into one gene-level sequence. Running BLAST on such long sequences might take forever (?). The longest sequence is 284565 bp.
So, my question is: what do you think about running these steps?
- search for ORFs in the SuperTranscripts (e.g. hmmer2go getorf),
- then extract those ORFs as nucleotide sequences,
- then align them with blastx against the nr bacteria database.
That way, I will have nucleotide transcripts (each one representing one gene) and also their aa sequences, ready for annotation.
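
To make the ORF step concrete, here is a minimal Python sketch of what it would produce. It is only an illustration under stated assumptions, not a replacement for hmmer2go getorf: the function names (`find_orfs`, `translate`, `revcomp`) and the `min_aa` length cutoff are hypothetical, an ORF is defined here simply as ATG to the next in-frame stop codon, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no alternative table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(nt):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def find_orfs(seq, min_aa=100):
    """Yield (strand, start, end, nt, aa) for ATG..stop ORFs in all six frames.

    start/end are 0-based, end-exclusive, and refer to the strand that was
    scanned (for '-', that is the reverse-complemented sequence).
    min_aa is a hypothetical cutoff on the protein length (stop excluded).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOPS:
                    nt = s[start:i + 3]      # nucleotide ORF, stop codon included
                    aa = translate(nt[:-3])  # protein, stop codon excluded
                    if len(aa) >= min_aa:
                        yield strand, start, i + 3, nt, aa
                    start = None
```

Each SuperTranscript would be passed through `find_orfs`, and the `nt` and `aa` strings written out as two FASTA files: the nucleotide one as the blastx query and the aa one for annotation.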
I am curious what your experiences with the SuperTranscript approach are.
Thanks in advance,