And your scaffolds also don't have very long headers?
grep '^>' fasta_file.fa | wc -L
This should give the length of your longest header.
One problem I had (I don't remember whether it was in the pre-processing step or in MAKER/BLAST itself) was that gi headers couldn't be processed properly: the "|" character, blank spaces and "*" all gave errors.
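A quick way to see whether a file is affected is to count the headers that contain any of these characters. This is only a sketch: headers_demo.fa and its contents are made up for illustration, so swap in your own EST or protein file.

```shell
# Hypothetical demo file: two headers with problem characters, one clean header.
printf '>gi|1|ref|X| protein one\nMKV\n>clean_id\nMAA\n>has*star\nMCC\n' > headers_demo.fa

# Count header lines that contain a pipe, an asterisk, or any whitespace.
grep '^>' headers_demo.fa | grep -c '[|*[:space:]]'
# prints 2
```

Dropping the -c from the second grep lists the offending headers instead of counting them.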
Maybe make a small subset of your genome assembly (for example, only one chromosome/scaffold/contig) and test whether it works with EST and protein data that do not have these characters in the headers.
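One possible way to cut out such a subset is to keep only the first record of the assembly with awk. The file names and sequences below are invented for the example, and taking the first record is just an assumption; any single scaffold would do.

```shell
# Hypothetical two-scaffold assembly, used only for demonstration.
printf '>scaffold_1\nACGT\nGGCC\n>scaffold_2\nTTTT\n' > genome_demo.fa

# Count header lines as they appear and print lines only while the count is 1,
# i.e. the first header plus its sequence lines.
awk '/^>/ {n++} n == 1' genome_demo.fa > subset_demo.fa

cat subset_demo.fa
```

On a real assembly you would replace genome_demo.fa with your own file and run the test on subset_demo.fa.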
sed '/^>/ s/|/_/g' file_in.fa > file_out.fa # Replace every | in header lines with _
sed '/^$/d' file_in.fa > file_out.fa # Remove blank lines
sed 's/\*//g' file_in.fa > file_out.fa # Remove the character * (without deleting the rest of the line)
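If you would rather clean everything in one pass, something like the following might work. It is a sketch under my own assumptions: pipes become underscores, headers are cut at the first whitespace so only the ID remains, and asterisks are stripped from sequence lines. prot_demo.fa is a made-up example file.

```shell
# Hypothetical protein file with a gi-style header, a stop-codon asterisk and a blank line.
printf '>gi|123|ref|XP_1.1| some protein\nMKV*\n\n>seq2 desc\nMAA\n' > prot_demo.fa

# 1) drop blank lines
# 2) headers: cut everything from the first whitespace onwards
# 3) headers: turn each | into _
# 4) sequence lines: remove every *
sed -e '/^$/d' \
    -e '/^>/s/[[:space:]].*$//' \
    -e '/^>/s/|/_/g' \
    -e '/^>/!s/\*//g' \
    prot_demo.fa > prot_clean.fa

cat prot_clean.fa
```

Cutting the header at the first whitespace throws away the description, so check afterwards that the remaining IDs are still unique.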
One last remark I can make: it might be a problem that you have set two paths for the alt_est files.
Have you tried concatenating both FASTA files into one and adding just one path? I never read anywhere that MAKER can handle multiple paths in its variables, but I may have missed that because I never needed to do it.
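Concatenating is just a cat, roughly as below. The file names are placeholders I made up, and the duplicate check at the end is my own addition, since merging two files can introduce repeated IDs.

```shell
# Two hypothetical alternative-EST files; each must end with a newline,
# otherwise cat glues the last sequence line onto the next header.
printf '>est_a1\nACGT\n' > alt_est_a_demo.fa
printf '>est_b1\nGGCC\n' > alt_est_b_demo.fa

# Merge both files into one FASTA.
cat alt_est_a_demo.fa alt_est_b_demo.fa > alt_est_combined_demo.fa

# List any header that occurs more than once (no output means all IDs are unique).
grep '^>' alt_est_combined_demo.fa | sort | uniq -d
```

Then give MAKER only the path to the combined file.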
Let me know if any of this worked. If not, I can't see anything else wrong here, sorry.