Question: Three questions about RNA-seq and protein domain analysis
utsafar20 wrote (2.5 years ago):

I am working on saffron, and my goal is to find candidate resistance genes in it. Since the saffron genome has not been sequenced, I used saffron RNA-seq data instead. I assembled the reads de novo with Trinity in Galaxy. Then, again in Galaxy, I ran tblastn with an E-value cut-off of 1e-11 (0.00000000001) and a minimum query coverage per HSP of 70%, and found contigs similar to 112 reference plant resistance proteins.
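If the tblastn hits are exported as a tabular file, the two cut-offs above can be re-applied or tuned outside Galaxy. Below is a minimal Python sketch, assuming (hypothetically) that tblastn was run with -outfmt "6 qseqid sseqid pident length evalue bitscore qcovhsp", so that the query is a reference resistance protein and the subject is a Trinity contig. The demo IDs are made up.

```python
"""Filter tblastn tabular hits by E-value and per-HSP query coverage.

Assumes (hypothetically) the output columns:
  qseqid sseqid pident length evalue bitscore qcovhsp
"""
from typing import Dict, Iterable, List, Set

COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "qcovhsp"]


def parse_hits(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn tab-separated BLAST lines into dicts keyed by column name."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hits.append(dict(zip(COLUMNS, line.split("\t"))))
    return hits


def candidate_contigs(
    hits: Iterable[Dict[str, str]], max_evalue: float = 1e-11, min_qcov: float = 70.0
) -> Dict[str, Set[str]]:
    """Map each reference protein (query) to the contigs (subjects) passing both cut-offs."""
    result: Dict[str, Set[str]] = {}
    for h in hits:
        if float(h["evalue"]) <= max_evalue and float(h["qcovhsp"]) >= min_qcov:
            result.setdefault(h["qseqid"], set()).add(h["sseqid"])
    return result


if __name__ == "__main__":
    demo = [
        "RPS2\tTRINITY_DN1_c0_g1_i1\t45.2\t880\t1e-150\t450\t95",  # passes both
        "RPS2\tTRINITY_DN2_c0_g1_i1\t30.1\t200\t1e-20\t80\t22",    # coverage too low
        "RPM1\tTRINITY_DN3_c0_g2_i1\t28.0\t700\t5e-9\t60\t75",     # E-value too high
    ]
    print(candidate_contigs(parse_hits(demo)))  # {'RPS2': {'TRINITY_DN1_c0_g1_i1'}}
```

Keeping the result keyed by reference protein makes it easy to see which of the 112 references picked up several contigs (e.g. Trinity isoforms of one gene).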

First question: What do you think of my approach? Do you have better ideas for finding these genes in saffron RNA-seq data?

I extracted the longest ORF of each hit contig and used Pfam to compare the domains in those ORFs with the domains in the reference resistance proteins. Some ORFs have more domains than their most similar reference proteins.
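For reference, "longest ORF" here is taken to mean the longest stretch from an ATG to an in-frame stop codon across the six reading frames of a contig. A simplified Python sketch under that assumption (standard genetic code, complete ORFs only); partial ORFs that run off the contig end, which are common in de novo assemblies, are ignored here, so a dedicated ORF predictor is the safer choice for real data.

```python
"""Find the longest complete ORF (ATG ... stop) across the six frames of a contig."""
from typing import Tuple

STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(contig: str, min_len: int = 30) -> Tuple[str, str]:
    """Return (strand, orf_nucleotides) for the longest ATG-to-stop ORF, or ("", "")."""
    best_strand, best_orf = "", ""
    contig = contig.upper()
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for frame in range(3):
            start = None  # position of the first ATG in the current open stretch
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    orf = seq[start:i + 3]  # include the stop codon
                    if len(orf) >= min_len and len(orf) > len(best_orf):
                        best_strand, best_orf = strand, orf
                    start = None  # look for the next ATG after this stop
    return best_strand, best_orf
```

The ORF is returned as nucleotides; it still has to be translated before domain searches or blastp.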

Second question: How can I search for domains in all 700 ORFs and 112 reference genes in one step rather than one by one?

Third question: How can I be confident in my annotations when some ORFs have additional domains that their similar reference proteins lack?

Thank you all.

cschu181 wrote (2.5 years ago):

1.) I think your approach is valid, but I would try the following to possibly improve the results. Translate your extracted ORFs into proteins and run blastp against the plant resistance proteins. In addition, scan your sequences with pfam_scan.pl or hmmscan (from the HMMER package) using Pfam-A as the database, then check the output for NB-ARC, LRR, and TIR/CC domains (you may need to check the exact domain names in the output). You can also run NLR-Parser (Steuernagel et al., 2015, Bioinformatics (http://bioinformatics.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=25586514)).
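When many sequences are scanned at once, the per-domain table written by hmmscan --domtblout can be summarised per protein. This is a minimal sketch, not part of NLR-Parser or pfam_scan.pl. It assumes the standard --domtblout layout (22 fixed columns followed by a free-text description), a hypothetical per-domain independent E-value cut-off of 1e-5, and Pfam family names NB-ARC, TIR and LRR_* (verify these against your own output). CC domains are not detected, since they may not appear as a Pfam hit at all.

```python
"""Summarise hmmscan --domtblout output per query and flag NLR-like architectures."""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MAX_IEVALUE = 1e-5  # hypothetical per-domain (independent E-value) cut-off


def parse_domtblout(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int, str]]]:
    """Return {query: [(ali_from, ali_to, domain_name), ...]} sorted by position."""
    domains = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        fields = line.split(maxsplit=22)  # 22 fixed columns, then the description
        target, query = fields[0], fields[3]  # hmmscan: target = Pfam HMM, query = protein
        i_evalue = float(fields[12])
        ali_from, ali_to = int(fields[17]), int(fields[18])  # coordinates on the protein
        if i_evalue <= MAX_IEVALUE:
            domains[query].append((ali_from, ali_to, target))
    return {q: sorted(d) for q, d in domains.items()}


def nlr_class(domain_names: List[str]) -> str:
    """Crude NLR label from the Pfam domain names found in one protein (no CC detection)."""
    has_nb = "NB-ARC" in domain_names
    has_lrr = any(name.startswith("LRR") for name in domain_names)
    has_tir = "TIR" in domain_names
    if not has_nb:
        return "no NB-ARC"
    prefix = "TIR-" if has_tir else ""
    return prefix + ("NB-LRR" if has_lrr else "NB")
```

Sorting each protein's domains by start position gives its domain architecture in N- to C-terminal order, which is what you need for comparing ORFs with reference proteins.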

2.) Merge your sequence files (the translated ORFs and the reference protein sequences) into one file, or use Galaxy's multiple-input-files option (most tools should have it).
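If you prefer to merge the files yourself, here is a small Python sketch. It concatenates protein FASTA files and prefixes each header with a label, so every domain hit can be traced back to its source file. File names and labels are hypothetical.

```python
"""Merge several protein FASTA files into one, tagging each header with its source."""
from typing import Iterable, Iterator, Tuple


def relabel_fasta(lines: Iterable[str], label: str) -> Iterator[str]:
    """Yield FASTA lines, prefixing every header with '<label>_' and dropping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + label + "_" + line[1:]
        elif line:
            yield line


def merge_fasta(sources: Iterable[Tuple[str, str]], out_path: str) -> int:
    """Concatenate (path, label) FASTA files into out_path; return the number of records."""
    n_records = 0
    with open(out_path, "w") as out:
        for path, label in sources:
            with open(path) as handle:
                for line in relabel_fasta(handle, label):
                    n_records += line.startswith(">")  # count headers
                    out.write(line + "\n")
    return n_records
```

For example, merge_fasta([("saffron_orfs.faa", "orf"), ("reference_R.faa", "ref")], "all_proteins.faa") followed by a single hmmscan --domtblout all.domtblout Pfam-A.hmm all_proteins.faa run (Pfam-A.hmm must first be indexed with hmmpress).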

3.) If you find additional domains, especially C-terminal to (3' of) the (CC|TIR)_NB-ARC_LRR domain group, this could indicate a domain-fusion event (see also Sarris et al., 2016, BMC Biology (https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-016-0228-7)). It is also theoretically possible that some other domain is present in place of the CC|TIR domain (I don't have a reference for this right now).
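To triage the ORFs with extra domains, one option is to list, for each ORF, the domains missing from its best reference protein and where they sit relative to the NB-ARC/LRR core. A minimal sketch, assuming each architecture is a list of (start, end, Pfam name) tuples in protein coordinates; the comparison is by domain name only, so extra copies of a domain the reference already has (e.g. more LRR repeats) are not reported. The example data are made up.

```python
"""Report ORF domains absent from the reference and their position relative to the NLR core."""
from typing import List, Tuple

Domain = Tuple[int, int, str]  # (start, end, Pfam name) in protein coordinates


def is_nlr_core(name: str) -> bool:
    """Treat NB-ARC and any LRR_* family as part of the NB-ARC/LRR core."""
    return name == "NB-ARC" or name.startswith("LRR")


def extra_domains(orf: List[Domain], reference: List[Domain]) -> List[Tuple[str, str]]:
    """Return [(name, location)] for ORF domains whose name does not occur in the reference."""
    ref_names = {name for _, _, name in reference}
    core = [(start, end) for start, end, name in orf if is_nlr_core(name)]
    core_start = min(s for s, _ in core) if core else None
    core_end = max(e for _, e in core) if core else None
    report = []
    for start, end, name in sorted(orf):
        if name in ref_names:
            continue  # domain also found in the reference protein
        if core_end is None:
            where = "no NB-ARC/LRR core"
        elif start > core_end:
            where = "C-terminal to core"  # candidate fusion position
        elif end < core_start:
            where = "N-terminal to core"
        else:
            where = "overlaps core"
        report.append((name, where))
    return report


if __name__ == "__main__":
    orf = [(10, 140, "TIR"), (150, 400, "NB-ARC"), (500, 560, "LRR_8"), (700, 760, "WRKY")]
    ref = [(5, 135, "TIR"), (145, 395, "NB-ARC"), (480, 540, "LRR_8")]
    print(extra_domains(orf, ref))  # [('WRKY', 'C-terminal to core')]
```

Such a report only flags candidates; whether an extra domain is a genuine fusion or a chimeric contig from the assembly still needs checking, e.g. by read coverage across the domain junction.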
