Question: Maker Pipeline blast results
5.2 years ago by
United States
I am hoping to use maker on a small cluster (~6 compute nodes) to annotate a fairly fragmented de novo assembly that has some longer contigs. We have maker installed, but so far even though every program runs, RepeatMasker seems to be the only program finding matches. Namely, blastx and exonerate don't find any alignment matches even though they seem to be set up correctly in the maker control file.

What I was wondering is whether this is an artifact of the fragmented assembly or some sort of setup error. I find the former hard to believe, considering I got at least 2-3 blast hits for each longer contig in the entire assembly using Galaxy megablast. I think the error lies in the fact that I get 0 hits, but I am not sure why:


/usr/bin/blastx -db /tmp/maker_sHnU1b/chickenproteomeuniprot%2Efasta.mpi.10.9 -query /tmp/maker_sHnU1b/0/scaffold_1035.0 -num_alignments 10000 -num_descriptions 10000 -evalue 1e-06 -dbsize 300 -searchsp 500000000 -num_threads 1 -seg yes -soft_masking true -lcase_masking -show_gis -out /home/zgayk/MakerExample2/Gaviaimmerheader.maker.output/Gaviaimmerheader_datastore/38/7C/scaffold_1035//theVoid.scaffold_1035/0/scaffold_1035.0.chickenproteomeuniprot%2Efasta.blastx.temp_dir/chickenproteomeuniprot%2Efasta.mpi.10.9.blastx


deleted:0 hits

collecting blastx reports

flattening protein clusters

prepare section files

processing the chunk divide

preparing evidence clusters for annotations

Preparing evidence for hint based annotation

clustering transcripts into genes for annotations

Processing transcripts into genes

choosing best annotation set

Choosing best annotations

processing chunk output

processing contig output

examining contents of the fasta file and run log


Essentially, the .gff file produced for each contig is empty. If anyone knows how to fix this, I would be very appreciative.
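To confirm how many of the per-contig .gff files are truly empty, a quick scan of the output directory can help. This is only a minimal sketch: the function names are hypothetical, and it assumes that an "empty" file means one with no feature lines (only `#` comment lines, or nothing at all) before any embedded `##FASTA` section.

```python
import os


def count_gff_features(path):
    """Count feature lines in a GFF file, skipping comments and any embedded FASTA."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # sequence data follows; no more features after this
            if line.strip() and not line.startswith("#"):
                count += 1
    return count


def summarize_gff_dir(root):
    """Return {gff_path: feature_count} for every .gff file found under root."""
    summary = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".gff"):
                full_path = os.path.join(dirpath, name)
                summary[full_path] = count_gff_features(full_path)
    return summary
```

Calling `summarize_gff_dir("Gaviaimmerheader.maker.output")` (the directory name taken from the command above, assuming it is run from the parent directory) and listing the paths whose count is 0 would show whether every contig is affected or only some.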

Zach Gayk

Could you tell us:

- What is your N50?

- What value did you set for the min_contig parameter in maker_opts.ctl?

- What kind of proteins (which database?) are you trying to align to your genome?

- What kind of genome are you trying to annotate? Bird? Fungus?


As noted in maker_opts.ctl, trying to annotate a sequence shorter than 10 kb is often useless.
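To see how many sequences would survive a 10 kb cutoff before running MAKER, something like the following could be used. It is a rough sketch with hypothetical helper names, and it assumes a plain, uncompressed FASTA file.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""  # keep only the ID before any whitespace
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records, min_len=10_000):
    """Keep only the (name, sequence) records with at least min_len bases."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]
```

For example, `with open("assembly.fasta") as fh: kept = filter_by_length(read_fasta(fh))` (the file name is a placeholder) gives the sequences that pass the cutoff; `len(kept)` shows how much is left to annotate.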

5.2 years ago by
United States
zgayk90 wrote:

Hello, the assembly is fragmented:

Minimum     Number            Number            Total             Total             Scaffold
Scaffold    of                of                Scaffold          Contig            Contig  
Length      Scaffolds         Contigs           Length            Length            Coverage
--------    --------------    --------------    --------------    --------------    --------
    All          5,237,924         5,238,436       767,438,425       767,326,331      99.99%
     50          3,616,441         3,616,953       710,236,525       710,124,431      99.98%
    100          2,146,720         2,147,232       604,271,394       604,159,300      99.98%
    250            743,885           744,397       394,016,485       393,904,391      99.97%
    500            247,247           247,755       223,350,732       223,238,838      99.95%
   1 KB             62,044            62,409        98,533,822        98,431,583      99.90%
 2.5 KB              5,725             5,731        18,713,830        18,710,728      99.98%
   5 KB                231               231         1,310,589         1,310,589     100.00%

The assembly is from a bird: the common loon (Gavia immer). I used the chicken (Gallus gallus) proteome as protein data, along with chicken cDNA for EST evidence. I put the minimum contig length at 500. The contig N50 is 814 bp.
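For anyone who wants to recompute the N50 from the contig lengths, here is a minimal sketch. It uses the common definition (the length L such that contigs of length L or longer contain at least half of all assembled bases); the function name is just illustrative, and assembly tools may differ slightly in how they handle ties or minimum-length cutoffs.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the total bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

The input would be the length of each contig in the assembly, for instance collected while parsing the FASTA file.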


Most of the assembly is in small contigs shorter than 1 kb, and I was only going to use MAKER as a trial. I thought it might be possible to get valid annotations for at least the longer contigs, but if you think this is not feasible, let me know. The assembly was produced using ABySS with paired-end (PE) read data and a k-mer size of 32. Then, because it was still so fragmented, I aligned the contigs to the available red-throated loon genome, and the result is what is shown above. I am not sure why the assembly remains this fragmented (we basically have no scaffolds), although it could be because the group that did the sequencing used only one PE library (8 kb). If there are any suggestions as to why the assembly remains so fragmented, I would be very interested. Are we too limited by having one insert library?
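To put a number on how much of the assembly sits in longer scaffolds, the totals from the table above can be turned into fractions. This is just a small arithmetic sketch; the dictionary and function names are illustrative, and the values are copied from the "Total Scaffold Length" column.

```python
# Total scaffold length (bp) at each minimum scaffold length (bp),
# copied from the table above; key 0 corresponds to the "All" row.
TOTAL_SCAFFOLD_LENGTH = {
    0: 767_438_425,
    50: 710_236_525,
    100: 604_271_394,
    250: 394_016_485,
    500: 223_350_732,
    1_000: 98_533_822,
    2_500: 18_713_830,
    5_000: 1_310_589,
}


def fraction_at_or_above(min_len):
    """Fraction of all scaffold bases found in scaffolds of at least min_len bp."""
    return TOTAL_SCAFFOLD_LENGTH[min_len] / TOTAL_SCAFFOLD_LENGTH[0]


if __name__ == "__main__":
    for cutoff in sorted(TOTAL_SCAFFOLD_LENGTH):
        print(f"{cutoff:>6} bp: {fraction_at_or_above(cutoff):.2%}")
```

By these figures, roughly 13% of the bases are in scaffolds of 1 KB or more, and well under 1% are in scaffolds of 5 KB or more.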







5.2 years ago by
Juke34 wrote:

Given the size of your contigs, your MAKER result is not surprising. Moreover, the genes in that kind of genome are quite long.

I think you should focus on the assembly before attempting any annotation. You need to increase the size of your contigs significantly! I suggest you try other assembly tools, but I'm not an expert in this field.

good luck :)

Does anyone have any ideas as to why the assembly is so fragmented, and specifically that no scaffolds are being produced? If I'm to improve the assembly I'll need to identify whether the current result is due to low quality data for making long contigs (only one insert size) or perhaps an error in the assembly process. I realize it is hard to determine from a distance, but any help would be appreciated.



