4.9 years ago by
And your scaffolds also don't have very long headers?
grep '>' fasta_file.fa | wc -L
This should give the length of your longest header.
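If you also want to see which header is the longest, something like the following might help. It is only a sketch: `longest_header` is a made-up helper name, and the reported length includes the leading `>`, just like the `wc -L` count.

```shell
# Hypothetical helper: print the length and the text of the longest FASTA header.
longest_header() {
    awk '/^>/ { if (length($0) > max) { max = length($0); line = $0 } }
         END  { if (max) printf "%d\t%s\n", max, line }' "$1"
}
```

Usage would be `longest_header fasta_file.fa`, which prints the length, a tab, and the header itself.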
One problem I had (I don't remember whether it was in the pre-processing step or in MAKER/BLAST itself) was that gi headers couldn't be processed properly. The | character, blank spaces and * gave errors.
Maybe make a small subset of your genome assembly (for example only one chromosome/scaffold/contig) and test whether it works when the EST and protein data don't have these characters in their headers.
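One way to cut out such a test subset is to keep only the first record of the assembly. This is a minimal sketch, assuming a plain multi-FASTA file; `first_record` is a hypothetical helper name, not something that ships with MAKER.

```shell
# Hypothetical helper: print only the first record (header + sequence) of a FASTA file.
first_record() {
    awk '/^>/   { n++ }        # count header lines
         n == 1 { print }      # print everything belonging to the first record
         n > 1  { exit }       # stop as soon as the second record starts
        ' "$1"
}
```

For example, `first_record genome.fa > test_scaffold.fa` (both file names are placeholders).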
sed 's/[^=>]*|*|//' file_in.fa > file_out.fa # Strip the header text up to the last | (e.g. the gi|...| prefix)
sed '/^$/d' file_in.fa > file_out.fa # Remove blank lines
sed 's/\*$//' file_in.fa > file_out.fa # Remove a * at the end of a line
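The clean-up could also be done in a single pass, so the steps don't overwrite each other's output. The function below is only a sketch under my own assumptions: `clean_fasta` is a hypothetical name, it keeps just the ID before the first blank in each header, and it replaces | and * with underscores. I can't confirm that MAKER needs exactly this, and truncating headers can create duplicate IDs, so check for those afterwards.

```shell
# Hypothetical helper: write a cleaned copy of a FASTA file to stdout.
clean_fasta() {
    awk '
        /^$/ { next }                # drop empty lines
        /^>/ {
            sub(/[ \t].*$/, "")      # keep only the ID before the first blank
            gsub(/[|*]/, "_")        # replace | and * in the header with _
            sub(/_+$/, "")           # trim trailing underscores
            print
            next
        }
        { sub(/\*$/, ""); print }    # strip a trailing * from sequence lines
    ' "$1"
}
```

Usage: `clean_fasta file_in.fa > file_out.fa`.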
One last remark: it might be a problem that you have set two paths for the alt_est files.
Have you tried concatenating both FASTA files into one and giving just one path? I have never read anywhere that MAKER can handle multiple paths in its variables, but I might simply have missed that because I never needed to do it.
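Concatenating could look roughly like this. Plain `cat` usually works, but if the first file lacks a final newline the next header gets glued onto the last sequence line, so this sketch pushes every file through awk first. `merge_fasta` is a hypothetical helper name.

```shell
# Hypothetical helper: merge several FASTA files into one output file.
# Usage: merge_fasta merged.fa first.fa second.fa ...
merge_fasta() {
    dest=$1
    shift
    for f in "$@"; do
        # awk ends every line it prints with a newline,
        # even when the last line of the input file has none
        awk '{ print }' "$f"
    done > "$dest"
}
```

After that, alt_est would point at the single merged file. Check that the two inputs don't share sequence IDs, since the merged file would then contain duplicates.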
Let me know if any of this works. If not, I can't see anything else wrong here, sorry.