Searching for conserved domains in a chromosome/scaffold
14 months ago
rubic ▴ 240

Hi,

I'd like to find all possible hits for a certain NCBI conserved domain (which is expressed as a PSSM) in a chromosome or scaffold of a certain genome. I know that the common way to search for conserved domain hits in a nucleotide sequence is to translate it in all six reading frames and RPS-BLAST each translation against the Conserved Domain Database (CDD). But that seems reasonable only for short nucleotide queries, not for a sequence of chromosome length.

I'm thinking this might already have been done by NCBI, in which case my question is whether anyone knows which files have that information. If I have to run the search myself, I'm guessing some variation of TBLASTN (where the query is the PSSM) would do. In that case, any suggestions on how to achieve that?

Thanks

blast domain CDD PSSM rpsblast • 665 views
14 months ago
Mensur Dlakic ★ 19k

The BLAST+ suite has rpstblastn, which is a combination of rps-blast and tblastn. That means it takes a DNA query, translates it in all six reading frames, and compares the translations to a database of profiles.
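For anyone who wants to script this, here is a minimal Python sketch of a standalone rpstblastn call. It assumes BLAST+ is installed and on the PATH and that a local RPS-BLAST-formatted profile database is available. The file names and the database name in the usage note are placeholders, and the E-value cutoff of 0.01 is just an illustrative default, not a recommendation from this thread.

```python
import shutil
import subprocess
from pathlib import Path


def build_rpstblastn_cmd(query_fasta, rps_db, out_tsv, evalue=0.01):
    """Return the argument list for a standalone rpstblastn run."""
    return [
        "rpstblastn",
        "-query", str(query_fasta),  # nucleotide FASTA (chromosome or scaffold)
        "-db", str(rps_db),          # local RPS-BLAST profile database (assumed to exist)
        "-evalue", str(evalue),      # illustrative cutoff, adjust as needed
        "-outfmt", "6",              # tabular output, one hit per line
        "-out", str(out_tsv),
    ]


def run_rpstblastn(query_fasta, rps_db, out_tsv, evalue=0.01):
    """Run rpstblastn and return the path of the tabular output file."""
    if shutil.which("rpstblastn") is None:
        raise RuntimeError("rpstblastn not found on PATH; install BLAST+ first")
    cmd = build_rpstblastn_cmd(query_fasta, rps_db, out_tsv, evalue)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(out_tsv)
```

Something like `run_rpstblastn("scaffold.fa", "Cdd", "hits.tsv")` would then write the hits to `hits.tsv`, where `scaffold.fa` and `Cdd` stand in for your own query file and database path.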

If you go to the CD-search web site, either protein or DNA sequence is acceptable as input.


It seems that rpstblastn only handles queries up to 200 kb in length, so I think I'll need to PSI-BLAST the chromosomes/scaffolds with the CDD profiles.


Not sure where you got this 200 kb limit from. Maybe it applies when submitting to the CD-search web site?

Standalone rpstblastn has no such limit. I just searched using a 2.8 Mb bacterial genome and it went fine.
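Since the original question was about one specific domain, a whole-genome run can be narrowed down afterwards. Below is a rough Python sketch that assumes rpstblastn was run with `-outfmt 6` and its default 12 columns, and that keeps only the rows whose subject ID equals the domain of interest. The `DomainHit` type and `parse_hits` function are made up for illustration. Hits on the reverse strand are assumed to be reported with the query start greater than the query end, so it is worth checking that against your own output.

```python
from typing import Iterable, List, NamedTuple, Optional


class DomainHit(NamedTuple):
    query: str      # chromosome/scaffold ID
    domain: str     # subject (profile) ID as written by BLAST
    start: int      # leftmost nucleotide coordinate on the query
    end: int        # rightmost nucleotide coordinate on the query
    strand: str     # "+" or "-", inferred from coordinate order
    evalue: float
    bitscore: float


def parse_hits(lines: Iterable[str], domain_id: Optional[str] = None) -> List[DomainHit]:
    """Parse default -outfmt 6 lines, optionally keeping a single domain."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a default 12-column tabular row
        qseqid, sseqid = fields[0], fields[1]
        if domain_id is not None and sseqid != domain_id:
            continue
        qstart, qend = int(fields[6]), int(fields[7])
        strand = "+" if qstart <= qend else "-"
        hits.append(DomainHit(qseqid, sseqid, min(qstart, qend), max(qstart, qend),
                              strand, float(fields[10]), float(fields[11])))
    return hits
```

With an output file from such a run, `parse_hits(open("hits.tsv"), domain_id="...")` would return only the hits for that profile. The ID has to match the subject ID string exactly as it appears in the second column of your own output.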