Searching For Specific Protein Domains In RNA-Seq
13.8 years ago

Hello,

I'm trying to find specific protein domains in RNA-Seq contigs to identify previously unknown isotypes. The tool which seems to be most promising for this task is rpsblast.

So my questions are:

  1. Do you know of an already established pipeline for this task?
  2. If not, my idea is to reduce the amount of sequence data by first running a BLAST search against homologues with loose settings and then running CD-Search on the remaining contigs. Is that a good approach, or are there alternatives?
  3. Are there any (automatic) pre- or postprocessing steps you would recommend?
  4. Are there any ideas on how to extend contigs with broken domains? BLASTing against the reads would be an option, but maybe there is a tool available for this.
rna annotation protein pipeline • 4.6k views

Can you give details on approximately how many contigs you have, whether you have a reference sequence, and what kind of protein domains you are looking for? That might help to give a better answer.

13.8 years ago
Michael 54k

Since you are looking for protein domains, this sounds like a job for a Pfam search or, better still, InterProScan.

I assume you have no reference sequence. RNA-seq reads should be assembled into contigs to reduce the query size. The programs need protein sequences, so you have to translate your contigs in all six reading frames.

You can use transeq from the EMBOSS suite for translation. Both tools are available as web-services and for local installation.

Some more or less vague ideas to help reduce compute load:

  • Align the reads to a reference of a closely related organism if you don't have a reference genome
  • BLAST against EST databases
  • If you get a good hit by this method you can remove the contigs from further analysis
  • Apply RepeatMasker and dust (this has most likely been done already)
  • If you have longer contigs you can probably restrict the search to all ORFs instead of full 6-frame translation
  • Restrict the InterPro search to those tools and models that are relevant
  • If your reads are strand specific you could maybe go with 3 sense reading frames instead of all 6
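The idea of restricting the search to ORFs can be sketched roughly as follows. This assumes you already have a translated frame in which stop codons are marked with `*`, and it treats an "ORF" simply as a stop-free stretch of a minimum length. The function name and the `min_len` default are illustrative choices, not recommendations.

```python
def orfs_from_translation(protein, min_len=50):
    """Yield (start, peptide) for stop-free stretches of at least min_len residues.

    `start` is the 0-based position of the stretch within the translated frame.
    """
    start = 0
    for segment in protein.split("*"):
        if len(segment) >= min_len:
            yield start, segment
        # Skip past this segment and the stop codon that ended it.
        start += len(segment) + 1


# Toy example with a deliberately small threshold.
found = list(orfs_from_translation("MKT*AAAAAA*GG", min_len=3))
print(found)
```

Only the stretches that pass the length filter would then be sent to the domain search, which can cut the query size considerably for long contigs.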

Thank you for your answer. As mentioned, I already have the contigs. My plan is to use a conserved domain search (either against CDD, Pfam, SMART, or any of the 10 other databases). InterProScan seems to be a good alternative for a small number of potentially matching contigs, but is hardly applicable in my case.


Hi Michael, I don't completely understand why not. Maybe you have a large number of contigs? If the web server has some restrictions you can still install the tool locally. The download link is on the same site. Of course this will require 'in-house' compute resources of some sort. You can start with a fraction of your data to estimate how long it will take. Using a BLAST 'pre-filter' to sort out the low-hanging fruit is possibly a good idea.
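A BLAST pre-filter like this could be sketched as below, assuming the BLAST results were written in the standard 12-column tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 is the e-value. The function name, the e-value cutoff, and the toy contig IDs are placeholders. Whether you keep the contigs with hits or the ones without depends on which strategy you follow.

```python
import csv
import io


def contigs_with_hits(blast_tsv, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue."""
    hits = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, evalue = row[0], float(row[10])
        if evalue <= max_evalue:
            hits.add(qseqid)
    return hits


# Toy input standing in for a real BLAST output file.
example = io.StringIO(
    "contig1\tsubjA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t370\n"
    "contig2\tsubjB\t35.0\t60\t39\t0\t1\t60\t10\t69\t0.5\t30.1\n"
)
all_contigs = {"contig1", "contig2", "contig3"}
explained = contigs_with_hits(example)
remaining = all_contigs - explained
print(sorted(explained), sorted(remaining))
```

With a real file you would pass an open file handle instead of the `io.StringIO` object.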


Good point, I thought it was a web-only application. I'll look into it.
