Question: OrthoMcl - Orthologs.txt is empty. All output is present in inParalogs.txt
3.4 years ago, bioinfoSeeker (United Kingdom) wrote:

Hi,

I am trying to use OrthoMCL to compare 10 different strains of a bacterial species. I installed MySQL, OrthoMCL and MCL locally and then went through the pipeline.

The problem is that all the entries are in inParalogs.txt, while orthologs.txt is empty.

The input for the pipeline was the amino acid FASTA files (taken from the RAST output for the assembled contigs).

I process each file through the pipeline individually.

I ran orthomclAdjustFasta, orthomclFilterFasta, makeblastdb, blastall, orthomclBlastParser, orthomclLoadBlast, orthomclPairs etc. They all ran to completion without any errors.

My adjusted FASTA header looks like this:

>sampleID|Protein_id

The makeblastdb and blastall commands I used for each of my samples are:

makeblastdb -in mySample.goodProteins.fasta -dbtype prot -out mySample_blastDB

blastall -p blastp -i mySample.goodProteins.fasta -d mySample_blastDB -o mySample_blast.csv -e 1 -m 8 -a 2 -v 1000 -b 1000

Then I ran orthomclBlastParser to produce similarSequences.txt for each file.
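A possible sanity check at this stage: OrthoMCL only calls orthologs between proteins from different taxa, so it can help to count how many hits in the tabular BLAST output pair proteins from different samples. The sketch below assumes the tab-separated `-m 8` output and the `sampleID|Protein_id` header format shown above; the function name `count_hit_types` is made up for illustration.

```shell
# Count within-sample and between-sample hits in BLAST tabular (-m 8) output.
# Assumption: query and subject IDs look like sampleID|Protein_id.
# Self-hits (a protein matching itself) are counted as within-sample hits.
count_hit_types() {
    awk -F'\t' '
        !/^#/ && NF >= 2 {
            split($1, q, "|")   # q[1] = sample ID of the query
            split($2, s, "|")   # s[1] = sample ID of the subject
            if (q[1] == s[1]) within++; else between++
        }
        END { printf "within=%d between=%d\n", within, between }
    ' "$@"
}
```

It could be called as `count_hit_types mySample_blast.csv`. If `between` comes out as 0, the search contained no cross-sample comparisons, which would leave orthomclPairs with nothing to report as orthologs.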

I would appreciate any help in understanding why my orthologs.txt is empty while inParalogs.txt has about 21,000 rows of data.

Kind regards,

Brindha.

 

modified 2.3 years ago by Biostar • written 3.4 years ago by bioinfoSeeker

Typically, you want to include an outgroup in the clustering, so consider how closely related the strains are. Also, I'm not sure mixing blast+ database and legacy blast tools is a good idea, though it may work.

written 3.4 years ago by SES

Thanks for your quick response @SES. 

Regarding the outgroup, should it be a related species, or can it be a totally random bacterial species?

Also, could you expand on what you mean by "I'm not sure mixing blast+ database and legacy blast tools is a good idea, though it may work"? I was following one of the online tutorials, which used makeblastdb and blastall. I tried earlier with formatdb, which another tutorial suggested, but it didn't work with blastall, so I used makeblastdb.

I look forward to hearing your suggestions.

written 3.4 years ago by bioinfoSeeker

The outgroup should be closely related, not random. By mixing the tools I mean this: formatdb and blastall work together (both are from legacy BLAST), and makeblastdb and blastp work together (both are from BLAST+). By mixing them you are using programs from different toolkits that are not designed to work together (they may, but it is not advisable).
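For reference, a BLAST+-only version of the commands from the question might look like the sketch below. It prints the commands for inspection rather than running them. The flag mapping (`-e` to `-evalue`, `-m 8` to `-outfmt 6`, `-a` to `-num_threads`, `-v`/`-b` to `-max_target_seqs`) is an assumed translation of the legacy options, and `blastplus_cmds` is a hypothetical helper name.

```shell
# Print BLAST+ counterparts of the legacy commands for one sample (dry run).
# Assumed flag mapping: -e -> -evalue, -m 8 -> -outfmt 6, -a -> -num_threads,
# -v/-b -> -max_target_seqs (closest equivalent for tabular output).
blastplus_cmds() {
    sample=$1
    echo "makeblastdb -in ${sample}.goodProteins.fasta -dbtype prot -out ${sample}_blastDB"
    echo "blastp -query ${sample}.goodProteins.fasta -db ${sample}_blastDB -out ${sample}_blast.csv -evalue 1 -outfmt 6 -num_threads 2 -max_target_seqs 1000"
}

blastplus_cmds mySample
```

Once the printed commands look right, they can be pasted into the terminal or piped to `sh`.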

modified 3.4 years ago • written 3.4 years ago by SES

Yes. I used another strain from the same phylum, but a different family, as the outgroup. The orthologs file is still empty, while all the info is in the inparalogs file. I am not sure why this is the case. (These strains, by the way, are from a published comparative genomics paper whose authors managed to identify orthologs using OrthoMCL and Synergy2. I am not sure why I cannot replicate it.)

I shall try blastp with makeblastdb and see if it makes a difference.

(Separately, I also ran into trouble with Amphora2 that Synergy2 uses, and I have posted another question in this forum regarding that.)

Thanks much.

written 3.4 years ago by bioinfoSeeker