I'm trying to find specific protein domains in RNA-Seq contigs in order to identify previously unknown isotypes. The most promising tool for this task seems to be rpsblast.
So my questions are:
- Do you know of an already established pipeline for this task?
- If not, my idea is to reduce the amount of sequence data first: run a BLAST search against known homologues with loose settings, and then run CD-Search only on the contigs that pass this filter. Is that a good approach, or are there alternatives?
- Are there any (automatic) pre- or postprocessing steps you would recommend?
- Are there any ideas on how to extend contigs that contain broken (truncated) domains? BLASTing against the reads would be one option, but maybe there is already a tool for this.
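The two-stage filter from the second question could be sketched roughly as follows. This assumes both searches were run with BLAST+ tabular output (`-outfmt 6`, default 12 columns): a first, loose search of known homologue proteins against the contigs (for example with `tblastn`), then a domain search of the surviving contigs against CDD (for example with `rpstblastn`, the translated-query variant of rpsblast). The e-value thresholds, the toy records, and the domain ID `gnl|CDD|000000` are hypothetical placeholders, not recommended values.

```python
import csv
import io
from typing import Iterator, List, Optional, Set, TextIO, Tuple

# Index of the e-value column in BLAST+ tabular output (-outfmt 6, default columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore).
EVALUE_COL = 10


def hits_below(tsv: TextIO, max_evalue: float, id_col: int,
               subjects: Optional[Set[str]] = None) -> Set[str]:
    """Collect the IDs in column `id_col` that have a hit with e-value <= max_evalue.

    If `subjects` is given, only rows whose sseqid (column 1) is in that set count.
    """
    ids: Set[str] = set()
    for row in csv.reader(tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        if subjects is not None and row[1] not in subjects:
            continue
        if float(row[EVALUE_COL]) <= max_evalue:
            ids.add(row[id_col])
    return ids


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of the header line."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    contigs = io.StringIO(">c1 len=9\nATGAAA\nTGA\n>c2\nATGCCC\n>c3\nATGGGG\n")

    # Stage 1 (loose): homologue proteins searched against the contigs,
    # e.g. with tblastn, so the contig ID is the sseqid (column 1).
    stage1 = io.StringIO(
        "protA\tc1\t40.0\t3\t1\t0\t1\t3\t1\t9\t0.5\t20.0\n"
        "protA\tc3\t30.0\t2\t1\t0\t1\t2\t1\t6\t50\t10.0\n"
    )
    candidates = hits_below(stage1, max_evalue=10.0, id_col=1)
    kept = {name: seq for name, seq in read_fasta(contigs) if name in candidates}
    print(sorted(kept))  # ['c1']

    # Stage 2 (strict): surviving contigs searched against CDD, e.g. with
    # rpstblastn, so the contig ID is the qseqid (column 0).
    # "gnl|CDD|000000" is a hypothetical placeholder for the domain of interest.
    stage2 = io.StringIO(
        "c1\tgnl|CDD|000000\t55.0\t3\t1\t0\t1\t9\t1\t3\t1e-08\t60.1\n"
        "c1\tgnl|CDD|111111\t35.0\t3\t2\t0\t1\t9\t1\t3\t1e-20\t80.0\n"
    )
    with_domain = hits_below(stage2, max_evalue=1e-5, id_col=0,
                             subjects={"gnl|CDD|000000"})
    print(sorted(with_domain))  # ['c1']
```

The loose threshold in stage 1 only discards contigs with no plausible homologue at all; the stricter threshold in stage 2 is what decides whether the domain of interest is present. Both values are judgment calls that would need tuning on real data.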