Question: Running Blastall Before MCL for Paralog Clustering
brittanymlebert wrote (6 weeks ago):

Hello,

I am trying to use MCL to do paralog clustering of 13 genomes for comparative genomics. I have been using the protocol "Using MCL to Extract Clusters from Networks" as a reference. The protocol says that I have to run blastall -p blastp with the -m8 parameter on my protein FASTA files before I can run MCL. It states that instructions for this are in the supplementary material, but I have not been able to find them. I am confused about how to do this step, as it has not worked for me so far. How do I do this so I can move on to running MCL? Thank you in advance.

-Brittany

Mensur Dlakic (USA) wrote (6 weeks ago):

"I am confused about how to do this step, as it has not worked for me so far."

What exactly did you try? Did it fail because BLAST is not installed correctly, or because you don't know how to formulate the command? If the latter, try this:

blastall -p blastp -i seq.fas -d seq.fas -m8 -o seq.cblast -e 1e-5

This assumes that all your sequences are in seq.fas and that you want to search them against themselves. Before running the search, you must format seq.fas as a database with the formatdb command, or point -d to a different database. Once you have the output in seq.cblast, you can follow this protocol.


Thank you for your response! It was not working because of how I formulated the command. Do I run each of the 13 protein FASTA files separately? And also, what database am I supposed to blast them against in order to run MCL?

written 6 weeks ago by brittanymlebert

From what I understand, your goal is to cluster paralogs from 13 different genomes. That means concatenating all genomes into one file, doing an all-vs-all BLAST search on it, and finally running MCL clustering.

Concatenation (typing only 3 genome names):

cat genome_01.fas genome_02.fas genome_03.fas > all_genomes.fas
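
With all 13 files, a shell glob saves typing, and it is worth checking that sequence IDs are still unique after concatenation, since they identify each protein in the BLAST output. A minimal sketch, assuming the files follow the genome_NN.fas naming used above; the temporary directory and the toy sequences are made up so the snippet runs on its own:

```shell
# Toy setup (hypothetical data) so the sketch is self-contained.
# With real data, skip this part and cd into the directory with the FASTA files.
workdir=$(mktemp -d)
cd "$workdir"
printf '>g1_p1\nMKT\n>g1_p2\nMLS\n' > genome_01.fas
printf '>g2_p1\nMKV\n' > genome_02.fas
printf '>g3_p1\nMRT\n>g3_p2\nMAA\n' > genome_03.fas

# Concatenate every genome file; the output name does not match the glob.
cat genome_*.fas > all_genomes.fas

# Count sequences in the combined file.
n_seqs=$(grep -c '^>' all_genomes.fas)
echo "sequences: $n_seqs"

# List IDs (first word of each header) that occur more than once.
# An empty result means the IDs are unique across genomes.
dup_ids=$(grep '^>' all_genomes.fas | cut -d' ' -f1 | sort | uniq -d)
echo "duplicate IDs: ${dup_ids:-none}"
```

If any input file lacks a final newline, cat will glue the next header onto the last sequence line, so a quick look at the sequence count is a cheap sanity check.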

Formatting the database for legacy BLAST (this is not the BLAST+ syntax, though something equivalent may work with BLAST+):

formatdb -i all_genomes.fas -p T -t "My 13 genomes"

BLASTing:

blastall -p blastp -i all_genomes.fas -d all_genomes.fas -m8 -o seq.cblast -e 1e-5
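
For anyone who only has BLAST+ installed, the two legacy commands above have direct counterparts: makeblastdb replaces formatdb, blastp replaces blastall -p blastp, and -outfmt 6 is the tabular format that -m8 produced. This is a sketch rather than a tested part of the protocol; the guard and the blast_status variable are my own additions so that nothing runs unless BLAST+ and the input file are present:

```shell
# BLAST+ counterparts of the formatdb/blastall commands.
# Runs only if BLAST+ is installed and all_genomes.fas is in the current directory.
if command -v makeblastdb >/dev/null 2>&1 && [ -f all_genomes.fas ]; then
    # Build a protein database from the concatenated FASTA file.
    makeblastdb -in all_genomes.fas -dbtype prot -title "My 13 genomes"
    # All-vs-all protein search with 12-column tabular output.
    blastp -query all_genomes.fas -db all_genomes.fas -outfmt 6 -evalue 1e-5 -out seq.cblast
    blast_status="ran"
else
    blast_status="skipped"
    echo "BLAST+ or all_genomes.fas not found; nothing was run"
fi
```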

After that, follow the protocol from my previous message. You may need to increase the E-value cutoff from 1e-5; I am not sure what value is appropriate for paralog detection.
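
To give an idea of what the protocol does with the BLAST output first: the -m8 file has 12 tab-separated columns, and MCL's label input only needs the query ID, subject ID and E-value (columns 1, 2 and 11). A small sketch of that extraction on toy rows; the IDs and scores below are invented for illustration and are not real BLAST results:

```shell
# Toy -m8 rows (hypothetical values), tab-separated, 12 columns:
# query, subject, %identity, aln length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, e-value, bit score
toy=$(mktemp)
printf 'p1\tp1\t100.00\t120\t0\t0\t1\t120\t1\t120\t1e-80\t250\n' >  "$toy"
printf 'p1\tp2\t85.00\t118\t17\t1\t1\t118\t3\t120\t2e-60\t190\n' >> "$toy"
printf 'p2\tp1\t85.00\t118\t17\t1\t3\t120\t1\t118\t3e-60\t189\n' >> "$toy"

# Keep query, subject and E-value: the three-column label format.
abc=$(cut -f 1,2,11 "$toy")
echo "$abc"

# Count hits where a sequence is not simply matching itself.
non_self=$(awk -F'\t' '$1 != $2' "$toy" | wc -l | tr -d ' ')
echo "non-self hits: $non_self"
```

With real data the extraction would be cut -f 1,2,11 seq.cblast > seq.abc, and the protocol's mcxload and mcl steps take over from there.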

modified 6 weeks ago • written 6 weeks ago by Mensur Dlakic

Thank you so much! Your reply was very helpful!

written 5 weeks ago by brittanymlebert