Question

Finding novel transcripts

15 months ago

analyst ▴ 70

Dear all,

I have run standalone blastx using default options for my list of transcripts and got output like this:

MSTRG.285.1 KAG7595701.1    68.976  332 9   3   194 1189    26  263 5.59e-112   342
MSTRG.285.1 KAG7595701.1    90.698  43  3   1   30  158 1   42  7.53e-13    81.3
MSTRG.285.1 CAD5311629.1    68.675  332 10  3   194 1189    26  263 5.14e-111   339
MSTRG.285.1 CAD5311629.1    90.698  43  3   1   30  158 1   42  6.85e-13    81.3
MSTRG.285.1 XP_020868191.1  67.683  328 54  3   206 1189    18  293 3.48e-107   331
MSTRG.285.1 KAG7591185.1    68.339  319 49  3   230 1186    39  305 3.10e-104   324
MSTRG.285.1 CAH8251728.1    67.812  320 51  3   230 1189    39  306 3.22e-103   321
MSTRG.285.1 KAG7653704.1    68.125  320 50  3   230 1189    39  306 3.33e-103   321
MSTRG.285.1 EFH68782.1  67.812  320 51  3   230 1189    38  305 3.44e-103   321
MSTRG.285.1 OAP14613.1  86.387  191 0   1   617 1189    92  256 5.16e-70    234

Which filters should I apply to identify novel transcripts?

Thanks

blastx transcripts • 1.5k views

ADD COMMENT • link updated 15 months ago by biofalconch ★ 1.3k • written 15 months ago by analyst ▴ 70

Define "novel". If your queries are showing good "hits" (like some above) to something in the database then they are not "novel" by definition of the word.

ADD REPLY • link 15 months ago by GenoMax 154k
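
If "novel" is taken to mean "no hit in the database at the chosen cutoff", one way to list such queries is to compare the IDs in the query FASTA against the IDs that appear in the tabular output. This is only a minimal sketch: the function names are made up for illustration, and it assumes the default `-outfmt 6` column order (query ID in column 1, e-value in column 11).

```python
def read_fasta_ids(fasta_lines):
    """Return the sequence IDs (first word after '>') from FASTA lines."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.append(words[0])
    return ids


def queries_with_hits(blast_lines, max_evalue=1e-10):
    """Collect query IDs that have at least one hit at or below max_evalue.

    Assumes default outfmt 6 columns: qseqid is column 1, evalue is column 11.
    """
    hit_ids = set()
    for line in blast_lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 12:
            continue  # skip comments and malformed lines
        if float(fields[10]) <= max_evalue:
            hit_ids.add(fields[0])
    return hit_ids


def queries_without_hits(fasta_lines, blast_lines, max_evalue=1e-10):
    """Return query IDs from the FASTA that have no qualifying hit."""
    hit_ids = queries_with_hits(blast_lines, max_evalue)
    return [q for q in read_fasta_ids(fasta_lines) if q not in hit_ids]
```

With real files this could be called as `queries_without_hits(open("file.fa"), open("output_blastx.txt"))`. Note that absence of a blastx hit only says the transcript has no detectable protein similarity in the database, not that it is unannotated.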

Good hits meaning those with the lowest e-value, right?

Should I also take query coverage into account? If so, how can I calculate query coverage from the output file above?

This is the command that I used:

blastx -query file.fa -db nr -strand plus -out output_blastx.txt -evalue 1E-10 -outfmt 6

ADD REPLY • link 15 months ago by analyst ▴ 70
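
On the query coverage question: the default `-outfmt 6` table does not include the query length, so coverage cannot be computed from that file alone. A rough sketch of one approach, assuming the default column order (qstart in column 7, qend in column 8) and a `query_lengths` dictionary that you build yourself from the FASTA file, is to merge the aligned query intervals for each query/subject pair and divide by the query length. The function names here are hypothetical.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total number of positions covered by a list of 1-based inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous block (if any) and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping HSP: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def query_coverage(blast_lines, query_lengths):
    """Percent of each query covered by HSPs, per (query, subject) pair.

    Assumes default outfmt 6 columns; query_lengths maps query ID to its
    length in nucleotides (blastx reports qstart/qend in nucleotides).
    """
    spans = defaultdict(list)
    for line in blast_lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 12:
            continue
        qstart, qend = int(fields[6]), int(fields[7])
        spans[(fields[0], fields[1])].append((min(qstart, qend), max(qstart, qend)))
    return {
        pair: 100.0 * merged_length(ivs) / query_lengths[pair[0]]
        for pair, ivs in spans.items()
    }
```

Alternatively, rerunning blastx with extra output columns, for example `-outfmt "6 std qlen qcovs"`, should make BLAST report the query length and per-subject query coverage directly, which avoids the post-processing.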

Is there a source that suggests which percent-identity threshold to use for calling transcripts novel? For example, would transcripts with less than 80% or 90% identity count as novel?

ADD REPLY • link 15 months ago by analyst ▴ 70

I encourage you to have a look at OrthoFinder: https://github.com/davidemms/OrthoFinder

Not only will you know which transcripts are new, you will also get a full classification for every transcript.

ADD REPLY • link 15 months ago by biofalconch ★ 1.3k

Thank you biofalconch!

I am looking for novel lncRNA transcripts from Arabidopsis thaliana data.

ADD REPLY • link 15 months ago by analyst ▴ 70

That's a little bit harder, since lncRNAs tend to evolve differently. Maybe something like what's proposed in this paper would be of use?

https://www.nature.com/articles/s41598-022-18254-0

They go through various filters to make sure the transcripts don't code for proteins, and don't rely much on finding lncRNAs through comparison with other species.

ADD REPLY • link 15 months ago by biofalconch ★ 1.3k
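
To give an idea of what a first-pass non-coding filter can look like, here is a minimal sketch. It is not the pipeline from the linked paper: the thresholds (at least 200 nt long, longest ORF under 100 amino acids) are commonly used conventions taken here as assumptions, the function names are made up, and only plus-strand ORFs that start with ATG and end at a stop codon are counted. Dedicated coding-potential tools would normally follow a filter like this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq):
    """Length in codons (stop excluded) of the longest ATG..stop ORF.

    Scans the three plus-strand frames only; ORFs that run off the end
    of the sequence without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        orf_len = None  # None means we are not inside an ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_len is None:
                if codon == "ATG":
                    orf_len = 1
            elif codon in STOP_CODONS:
                best = max(best, orf_len)
                orf_len = None
            else:
                orf_len += 1
    return best


def is_lncrna_candidate(seq, min_len=200, max_orf_aa=100):
    """True if the transcript is long enough and lacks a long ORF.

    min_len and max_orf_aa are assumed defaults, not values from the thread.
    """
    return len(seq) >= min_len and longest_orf_aa(seq) < max_orf_aa
```

Transcripts passing this check are only candidates; combining it with the absence of significant blastx hits is one plausible way to narrow the list further.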