Question: Maker annotation failure
3.9 years ago by
steven70
United States
steven70 wrote:

Hello, I am using the latest version of MPI maker: http://www.yandell-lab.org/software/maker.html

My MAKER jobs have been running since the 16th, and some time during the 19th they stopped making progress. They all stopped after a series of four RETRY messages followed by four FAILED messages; they never reach DIED_SKIPPED_PERMANENT.

I ran MAKER on an assembled genome of ~200,000,000 base pairs, with ~30,000 EST and protein sequences from the same order.

errors: http://pastebin.com/GX72Cimf

assembly • 2.0k views
modified 3.9 years ago by Lesley Sitter460 • written 3.9 years ago by steven70

You need to provide more information: what commands did you run, what error message did you get, and which subprogram within MAKER fails? With the amount of information you've given, I don't think anyone can help you.

written 3.9 years ago by arnstrm1.7k

200,000,000 scaffolds?! Is each read its own scaffold? This sounds a bit fishy to me.

written 3.9 years ago by matt.sarrasin70

Sorry...I meant that to read "base pairs", nice catch

written 3.9 years ago by steven70

From the errors, you might want to check:
(1) that all the BLAST executables are in your path (maker_exe.ctl is auto-configured during installation);
(2) that the EST sequences are DNA and the protein sequences are amino acids;
(3) that all your input sequences have unique IDs (preferably short); if not, recode them to plain numbers;
(4) that repeat masking is enabled (you'll never be able to complete predictions without masking all the repeats).
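A quick way to check point (3) and, if needed, recode the IDs is sketched below. This is only a minimal sketch: the function names `dup_ids` and `renumber_fasta` and the file names are made up for illustration and are not part of MAKER, and it assumes plain multi-line FASTA where the ID is the first word after `>`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MAKER); assumes plain FASTA input.

# dup_ids FILE: print each sequence ID (first word after '>') that occurs
# more than once. Each duplicated ID is reported a single time.
dup_ids() {
    awk '/^>/ { id = substr($1, 2); if (seen[id]++ == 1) print id }' "$1"
}

# renumber_fasta IN OUT MAP: replace every header with a running number and
# write "new<TAB>old header" to MAP so the original names can be recovered.
renumber_fasta() {
    awk -v map="$3" '
        /^>/ { n++; print n "\t" substr($0, 2) > map; print ">" n; next }
        { print }
    ' "$1" > "$2"
}

# Toy demonstration on a throwaway file.
tmp=$(mktemp -d)
printf '>seq1 first\nACGT\n>seq2\nGGCC\n>seq1 again\nTTAA\n' > "$tmp/in.fa"
dup_ids "$tmp/in.fa"                                  # reports seq1
renumber_fasta "$tmp/in.fa" "$tmp/out.fa" "$tmp/map.tsv"
cat "$tmp/out.fa"                                     # headers are now >1, >2, >3
```

Keeping the map file means the numeric IDs in the final annotation can be translated back to the original accessions afterwards.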

written 3.9 years ago by arnstrm1.7k

What exactly is repeat masking and which options would I want to change?

I'm going to try MAKER with the newest version of BLAST tomorrow. I downloaded all of the ESTs/proteins directly from the respective NCBI databases using Biopython, so I don't think there are any issues there; I confirmed they were all in FASTA format with a quick script. I will also try recoding the IDs to numbers, since there may be duplicates (species sequences also appearing among the order sequences).

modified 3.9 years ago • written 3.9 years ago by steven70

Hi, a couple of questions to make it easier to help you.

In which format did you provide the ESTs and proteins (FASTA, FASTQ, something else)? Can you maybe give an example?
head -n 20 est_file > example_est.txt
head -n 20 protein_file > example_prot.txt
In my experience MAKER is really picky about the headers it can handle.

For example, I see a line that says "Title is very long: 1038 characters (max is 1000)", so maybe your headers are really huge.

 

What does your maker_opts.ctl file look like?  Maybe there is a wrong path somewhere or you forgot to turn on some setting.

 

Also, to answer your question about repeat masking: RepeatMasker finds low-complexity repeats, transposons, etc. and masks them with N's before annotation. This way you get an annotation file that includes the repeats, and it reduces computation time when annotating the rest of the genome.
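To make "masking" a bit more concrete: a masked genome is the same FASTA file with the repeat bases either replaced by N (hard-masking) or written in lowercase (soft-masking). The toy sketch below only converts an already soft-masked sequence into a hard-masked one; it does not detect repeats (that is RepeatMasker's job), and the function name `hardmask_lowercase` is made up for illustration.

```shell
#!/bin/sh
# Toy illustration only: turn soft-masked (lowercase) bases into N.
# Header lines are passed through unchanged.
hardmask_lowercase() {
    awk '/^>/ { print; next } { gsub(/[acgtn]/, "N"); print }' "$1"
}

# Demo: the lowercase stretch stands in for a repeat found by a masker.
tmp=$(mktemp -d)
printf '>scaffold_demo\nACGTacgtacgtGGCC\n' > "$tmp/soft.fa"
hardmask_lowercase "$tmp/soft.fa"    # sequence line becomes ACGTNNNNNNNNGGCC
```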

Have you looked at the GMOD training? It explains all of that stuff in detail.
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014

modified 3.9 years ago • written 3.9 years ago by Lesley Sitter460

The ESTs and proteins were in FASTA format. I used usearch to create centroid FASTA files after downloading all available sequences from NCBI with Biopython. Additionally, I grep'd out all of the headers, and none immediately looked 1000 characters long, but I did not confirm this programmatically. Here is an example of each:

ests: http://pastebin.com/NAJ2fHY4

proteins: http://pastebin.com/9T6m0UGi

opts file: http://pastebin.com/803dEXRL

I replaced the paths for privacy, but they are all valid paths. For altEST, I separated two paths with a comma. I am currently running Maker with JUST the species nucleotide sequence file to see if it completes without errors (protein2genome and est2genome turned off). Thanks for the reply

modified 3.9 years ago • written 3.9 years ago by steven70
3.9 years ago by
Lesley Sitter460
Netherlands
Lesley Sitter460 wrote:

Hi,

And your scaffolds also don't have very long headers?
Maybe try:
grep '>' fasta_file.fa | wc -L
This should give the length of your longest header (-L is a GNU wc option, and the count includes the leading ">").
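If the longest header turns out to be too long, it helps to see which headers are the offenders. The sketch below lists them with their lengths. The default limit of 1000 is taken from the "Title is very long ... (max is 1000)" message in the error log; whether BLAST counts exactly the same characters is an assumption, and `long_headers` is just an illustrative name.

```shell
#!/bin/sh
# long_headers FILE [LIMIT]: print "length<TAB>header" for every FASTA header
# whose text (without the leading '>') is longer than LIMIT (default 1000).
long_headers() {
    awk -v max="${2:-1000}" '
        /^>/ { len = length($0) - 1; if (len > max) print len "\t" $0 }
    ' "$1"
}

# Demo file: one short header and one 1038-character header.
tmp=$(mktemp -d)
{
    printf '>short_id some description\nACGT\n'
    awk 'BEGIN { printf ">"; for (i = 0; i < 1038; i++) printf "x"; print "" }'
    printf 'GGCC\n'
} > "$tmp/headers.fa"

long_headers "$tmp/headers.fa" | cut -f1    # prints 1038
```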


One problem I had (I don't remember whether it was in a pre-processing step or in MAKER/BLAST itself) was that gi headers couldn't be processed properly. The "|" character, blank spaces and "*" gave errors.
Maybe make a small subset of your genome assembly (for example only one chromosome/scaffold/contig) and test whether it works with EST and protein data that do not have these characters in the headers.

sed 's/[^=>]*|*|//' file_in.fa > file_out.fa    # Strip the header prefix up to and including the last |
sed '/^$/d' file_in.fa > file_out.fa            # Remove blank lines
sed 's/\*//g' file_in.fa > file_out.fa          # Remove the character *
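For the single-scaffold test subset suggested above, something like the following would cut out just the first record of the assembly. It is a minimal sketch that assumes a plain FASTA file; `first_record` and the file names are illustrative only.

```shell
#!/bin/sh
# first_record FILE: print only the first FASTA record
# (its header line plus all of its sequence lines).
first_record() {
    awk '/^>/ { if (++n > 1) exit } n == 1' "$1"
}

# Demo on a two-scaffold toy assembly.
tmp=$(mktemp -d)
printf '>scaf1\nACGT\nTTGG\n>scaf2\nCCCC\n' > "$tmp/genome.fa"
first_record "$tmp/genome.fa" > "$tmp/subset.fa"
cat "$tmp/subset.fa"    # contains scaf1 only
```

Running MAKER on such a subset finishes much faster, so header or path problems show up in minutes rather than days.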

 

One last remark: it might be a problem that you have set two paths for the alt_est files.
Have you tried concatenating both FASTA files into one and giving just one path? I never read anywhere that MAKER is able to handle multiple paths in its variables, but that might just be something I missed because I never needed to do it.
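If you try the concatenation, one thing to watch for is a missing newline at the end of the first file, which makes plain `cat` glue the last sequence line onto the next header. A small sketch that avoids this (the name `merge_fasta` and the file names are made up for illustration):

```shell
#!/bin/sh
# merge_fasta OUT IN1 [IN2 ...]: concatenate FASTA files into OUT.
# "awk 1" prints every input line followed by a newline, so a file that
# lacks a trailing newline cannot run into the next file's header.
merge_fasta() {
    out=$1
    shift
    awk 1 "$@" > "$out"
}

# Demo: the first file deliberately has no trailing newline.
tmp=$(mktemp -d)
printf '>a\nACGT' > "$tmp/est1.fa"
printf '>b\nGGCC\n' > "$tmp/est2.fa"
merge_fasta "$tmp/combined.fa" "$tmp/est1.fa" "$tmp/est2.fa"
cat "$tmp/combined.fa"
```

After merging, it is worth re-checking that the IDs are still unique across the combined file.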


Let me know if anything worked; if not, I can't think of anything else that could be wrong here, sorry.

 

written 3.9 years ago by Lesley Sitter460

So, I believe I was getting all of the errors because of an older BLAST version - I updated BLAST and now there are no errors in the output file. My only concern is this warning, do you think this will cause any problems with tblastx?

/common/opt/bioinformatics/ncbi-blast-2.2.31+/bin/tblastx: /lib64/libz.so.1: no version information available (required by /common/opt/bioinformatics/ncbi-blast-2.2.31+/bin/tblastx)

modified 3.9 years ago • written 3.9 years ago by steven70

It seems to be an outdated zlib problem; you probably just have to update zlib and the warning will be gone, if I'm reading it correctly.

http://sourceforge.net/p/samtools/mailman/message/25005099/

(ignore the fact that the comments are about samtools, just read comments below it)

written 3.9 years ago by Lesley Sitter460

thanks, well i got the "Maker has completed !" message so looks like everything worked out

written 3.9 years ago by steven70