analyst, 15 months ago
    Dear all,
I have run standalone blastx using default options for my list of transcripts and got output like this:
MSTRG.285.1 KAG7595701.1    68.976  332 9   3   194 1189    26  263 5.59e-112   342
MSTRG.285.1 KAG7595701.1    90.698  43  3   1   30  158 1   42  7.53e-13    81.3
MSTRG.285.1 CAD5311629.1    68.675  332 10  3   194 1189    26  263 5.14e-111   339
MSTRG.285.1 CAD5311629.1    90.698  43  3   1   30  158 1   42  6.85e-13    81.3
MSTRG.285.1 XP_020868191.1  67.683  328 54  3   206 1189    18  293 3.48e-107   331
MSTRG.285.1 KAG7591185.1    68.339  319 49  3   230 1186    39  305 3.10e-104   324
MSTRG.285.1 CAH8251728.1    67.812  320 51  3   230 1189    39  306 3.22e-103   321
MSTRG.285.1 KAG7653704.1    68.125  320 50  3   230 1189    39  306 3.33e-103   321
MSTRG.285.1 EFH68782.1  67.812  320 51  3   230 1189    38  305 3.44e-103   321
MSTRG.285.1 OAP14613.1  86.387  191 0   1   617 1189    92  256 5.16e-70    234
Please advise which filters I should apply to identify novel transcripts.
Thanks
Define "novel". If your queries are showing good "hits" (like some above) to something in the database then they are not "novel" by definition of the word.
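To make that concrete, here is a minimal sketch that lists the queries with no hit at or below an e-value cutoff. It assumes the default 12-column tabular output shown in the question (e-value in column 11). The `1e-5` cutoff and the ID `MSTRG.999.1` are illustrative assumptions, not values from this thread.

```python
def queries_with_good_hits(blast_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue."""
    matched = set()
    for line in blast_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        qseqid, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            matched.add(qseqid)
    return matched


def candidates_without_hits(all_query_ids, blast_lines, max_evalue=1e-5):
    """Return query IDs that have no hit passing the e-value cutoff."""
    return sorted(set(all_query_ids) - queries_with_good_hits(blast_lines, max_evalue))


# Two rows copied from the question; MSTRG.999.1 is a made-up ID with no hits.
example = [
    "MSTRG.285.1 KAG7595701.1    68.976  332 9   3   194 1189    26  263 5.59e-112   342",
    "MSTRG.285.1 KAG7595701.1    90.698  43  3   1   30  158 1   42  7.53e-13    81.3",
]
print(candidates_without_hits(["MSTRG.285.1", "MSTRG.999.1"], example))
```

Note that queries with no hits at all never appear in the tabular output, so the full list of transcript IDs has to come from your FASTA file.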
So good hits are the ones with the lowest e-value, right?
Should I also take query coverage into account? If so, please explain how I can calculate query coverage from the output file above.
This is the command that I used:
Is there a source that explains which percent identity threshold to use for calling transcripts novel? For example, does identity below 80% or 90% indicate novelty?
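On the query coverage question: the default tabular output does not include the query length, so coverage cannot be computed from that file alone. One option is to ask BLAST+ for it directly with a custom format such as `-outfmt "6 std qlen qcovs"`. Otherwise, a rough sketch like the one below merges the query intervals (columns 7 and 8) of all HSPs for each query/subject pair and divides by the query length. The `query_lengths` dictionary is an assumption here: you would fill it from your transcript FASTA, and the length of 1200 nt is made up for illustration.

```python
from collections import defaultdict


def merged_length(intervals):
    """Count positions covered by 1-based inclusive intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def query_coverage(blast_lines, query_lengths):
    """Return {(query, subject): percent of the query covered by its HSPs}."""
    hsps = defaultdict(list)
    for line in blast_lines:
        f = line.split()
        if len(f) < 12:
            continue  # skip blank or malformed lines
        qstart, qend = int(f[6]), int(f[7])
        # blastx reports qstart > qend for hits on the reverse strand
        hsps[(f[0], f[1])].append((min(qstart, qend), max(qstart, qend)))
    return {
        pair: 100.0 * merged_length(ivs) / query_lengths[pair[0]]
        for pair, ivs in hsps.items()
    }


rows = [
    "MSTRG.285.1 KAG7595701.1    68.976  332 9   3   194 1189    26  263 5.59e-112   342",
    "MSTRG.285.1 KAG7595701.1    90.698  43  3   1   30  158 1   42  7.53e-13    81.3",
]
# Hypothetical query length in nucleotides; the real value is not given in the post.
coverage = query_coverage(rows, {"MSTRG.285.1": 1200})
print(coverage[("MSTRG.285.1", "KAG7595701.1")])
```

With blastx the query coordinates are in nucleotides, so the query length must be in nucleotides as well.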
I encourage you to have a look at OrthoFinder: https://github.com/davidemms/OrthoFinder
Not only will you know which transcripts are new, you will also get a full classification for every transcript.
Thank you biofalconch!
I am looking for novel lncRNA transcripts from Arabidopsis thaliana data.
That's a little bit harder, since lncRNAs tend to evolve differently. Maybe something like what's proposed in this paper would be of use?
https://www.nature.com/articles/s41598-022-18254-0
They apply several filters to make sure the transcripts don't code for proteins, rather than trying to find lncRNAs by comparison with other species.