Blast a CDD PSSM against a genome
10 weeks ago
rubic ▴ 210

Hi,

I'm trying to search an NCBI conserved domain against a large genome.

I downloaded NCBI's CDD PSSM files and indexed the genome both as a nucl dbtype and as a prot dbtype. Now I'm trying to run psi-blast from the command line with one of the PSSM files (CHL00001.smp) against my indexed genome, and I'm getting these warnings:

FastaReader: Hyphens are invalid and will be ignored around line 16147
FASTA-Reader: Ignoring invalid residues at position(s): On line 16147: 1, 3-18, 20-22, 25-26, 28-29
FASTA-Reader: Ignoring invalid residues at position(s): On line 16148: 1, 3-4, 6-8, 10, 12-13

The same thing happens if I use deltablast, blastp, or tblastn.

I'm assuming the PSSM file is not in a format that blast accepts (though that seems weird, since this PSSM file is from NCBI).

Any idea?

pssm CDD blast • 246 views
10 weeks ago
Mensur Dlakic ★ 11k

It is difficult to know exactly what you have tried without seeing a specific command, but this might work for you.

First, I suggest you format your database as a nucleotide database, which is what it appears to be:

makeblastdb -in your_db_file_name -dbtype nucl

Next, tblastn should be able to read those checkpoint/PSSM files as long as the first couple of lines of that file look like this:

PssmWithParameters ::= {
  pssm {
    isProtein TRUE,
    numRows 28,
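A quick way to confirm that a file starts with this header is to test its first line before handing it to BLAST. The snippet below is only a sketch: the function name `is_pssm_checkpoint` is made up for illustration, and it checks nothing beyond the opening `PssmWithParameters ::= {` line.

```shell
# Hypothetical helper: succeeds (exit status 0) if the first line of the
# given file starts a text ASN.1 PssmWithParameters record, fails otherwise.
is_pssm_checkpoint() {
  head -n 1 "$1" | grep -q '^PssmWithParameters ::= {'
}

# Example use (uncomment and adjust the file name):
# if is_pssm_checkpoint CHL00001.smp; then
#   echo "looks like a PSSM checkpoint file"
# else
#   echo "first line is not a PssmWithParameters header"
# fi
```

If the check fails, the file is probably not a text-format checkpoint, and the header is worth inspecting by hand with `head`.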

The command:

tblastn -in_pssm CHL00001.smp -db your_db_file_name -evalue 1e-5 -out tblastn_results.txt
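The FASTA-Reader warnings in the question show that BLAST was parsing some input as FASTA. One possible cause (an assumption, since the original command was not posted) is that the .smp file was passed with -query rather than -in_pssm. The sketch below prints the two steps with the file names filled in so they can be reviewed before running; the function name `print_search_commands` and the default output name are illustrative only.

```shell
# Hypothetical helper: print the database-formatting and search commands
# for a given genome FASTA ($1), PSSM checkpoint file ($2), and optional
# output file ($3). Nothing is executed; pipe the output to `sh` to run it.
# File names containing spaces would need quoting before being run.
print_search_commands() {
  printf 'makeblastdb -in %s -dbtype nucl\n' "$1"
  printf 'tblastn -in_pssm %s -db %s -evalue 1e-5 -out %s\n' \
    "$2" "$1" "${3:-tblastn_results.txt}"
}

print_search_commands your_db_file_name CHL00001.smp
```

Note that the PSSM goes to -in_pssm and no -query is given, so the checkpoint file is never read as a FASTA sequence.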