Running Blastall Before MCL for Paralog Clustering
3.9 years ago

Hello,

I am trying to use MCL to do paralog clustering of 13 genomes for comparative genomics. I have been using the protocol "Using MCL to Extract Clusters from Networks" as a reference. The protocol says that I have to run blastall -p blastp with the -m8 parameter on my protein fasta files before I can run MCL. It states that instructions for this are in the supplementary material, but I have not been able to find them. I am confused about how to do this step, as it has not worked for me so far. How do I do this so I can move on to running MCL? Thank you in advance.

-Brittany

Comparative Genomics MCL Blast • 1.0k views
3.9 years ago
Mensur Dlakic ★ 27k

"I am confused about how to do this step, as it has not worked for me so far."

What exactly did you try? Did it fail because of an incorrect BLAST installation, or because you are not sure how to formulate the command? If the latter, try this:

blastall -p blastp -i seq.fas -d seq.fas -m8 -o seq.cblast -e 1e-5

This assumes that all your sequences are in seq.fas and that you want to search them against themselves. Before running the search, you must format seq.fas as a database with the formatdb command, or point -d at a different, already formatted database. Once you have the output in seq.cblast, you can follow this protocol.
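Before moving on, it can help to confirm that the search really produced tabular output. With -m8, each hit is one line of 12 tab-separated fields. The following is a minimal sketch of such a check; the function name check_m8 is a hypothetical helper, not a BLAST or MCL command.

```shell
# check_m8 is a hypothetical helper name used only for this illustration.
# It prints the number of lines that do NOT have exactly 12 tab-separated
# fields, which is what blastall -m8 writes for each hit.
check_m8() {
    awk -F '\t' 'NF != 12 { bad++ } END { print bad + 0 }' "$1"
}

# Typical use (uncomment once seq.cblast exists):
# check_m8 seq.cblast
```

A result of 0 suggests the file is plain -m8 output. Anything else usually means a different output format was written or that error text ended up in the file.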


Thank you for your response! It was not working because of how I formulated the command. Do I run each of the 13 protein fasta files separately? And also, what database am I supposed to be blasting them to in order to run MCL?


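From what I understand, your goal is to cluster paralogs from 13 different genomes. That means you do not search each file separately: concatenate the protein FASTA files of all genomes, run an all-vs-all BLAST search of that combined file against itself, and finally do the MCL clustering.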

Concatenation (only 3 of the 13 file names are typed out here):

cat genome_01.fas genome_02.fas genome_03.fas > all_genomes.fas
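One thing worth checking after concatenation is whether any sequence ID occurs in more than one file, because the tabular BLAST output and the later clustering identify each protein only by its ID. The following is a minimal sketch of such a check; the function name dup_ids is a hypothetical helper, and it assumes the ID is the first word of each FASTA header line.

```shell
# dup_ids is a hypothetical helper name used only for this illustration.
# It prints every sequence ID (first word after '>') that appears more
# than once in the given FASTA file; no output means all IDs are unique.
dup_ids() {
    grep '^>' "$1" | cut -c 2- | awk '{ print $1 }' | sort | uniq -d
}

# Typical use (uncomment once all_genomes.fas exists):
# dup_ids all_genomes.fas
```

If this prints anything, one option is to prefix the IDs in each genome file with a genome-specific tag before concatenating.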

Formatting the database for legacy BLAST (not BLAST+, though it may work with it):

formatdb -i all_genomes.fas -p T -t "My 13 genomes"

BLASTing:

blastall -p blastp -i all_genomes.fas -d all_genomes.fas -m8 -o seq.cblast -e 1e-5

After that, follow the protocol from my previous message. You may need to increase the E-value cutoff from 1e-5; I am not sure what value is appropriate for paralog detection.
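The usual first step after the search is to reduce the tabular output to the three columns that matter for clustering. In -m8 output the query ID, subject ID, and E-value are columns 1, 2, and 11. The following is a minimal sketch; the function name m8_to_abc and the file names seq.abc, seq.mci, and seq.tab are illustrative assumptions. The commented mcxload and mcl lines are recalled from the MCL documentation and should be treated as an assumption to verify, not as a tested recipe.

```shell
# m8_to_abc is a hypothetical helper name used only for this illustration.
# It reduces BLAST -m8 output to "query<TAB>subject<TAB>E-value" lines.
m8_to_abc() {
    cut -f 1,2,11 "$1"
}

# Typical use (uncomment once seq.cblast exists):
# m8_to_abc seq.cblast > seq.abc
#
# Follow-on steps, recalled from the MCL documentation; check every option
# against the mcxload and mcl manual pages for your installed version:
# mcxload -abc seq.abc --stream-mirror --stream-neg-log10 -stream-tf 'ceil(200)' -o seq.mci -write-tab seq.tab
# mcl seq.mci -I 1.4 -use-tab seq.tab
```

The inflation value passed with -I controls how fine-grained the clusters are, so it is worth trying more than one value and comparing the results.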


Thank you so much! Your reply was very helpful!
