I have a multi-FASTA file and was hoping to extract the gene coding regions with Glimmer's multi-extract. I have already run the glimmer3 script and got two files, a .predict and a .detail. However, when I try to use multi-extract it just gives me an error. multi-extract asks me for this:
USAGE: multi-extract [options] <sequence-file> <coords>
Read multi-fasta-format <sequence-file> and extract from it the
subsequences specified by <coords>. By default, <coords>
is the name of a file containing lines of the form
<id> <tag> <start> <stop> [<frame>] ...
<id> is the identifier for the subsequence
<tag> is the tag of the sequence in <sequence-file> from which
to extract the entry
The glimmer3 package itself doesn't say where the <coords> file is supposed to come from, so I assume it is the .predict file. (A Bio-Linux website suggested that the long-orfs output would do, but long-orfs doesn't seem to work with multi-FASTA input anyway: it only extracts the ORFs from the first contig in my file.) But the .predict file doesn't have the right structure. For a start, it doesn't even include an <id> column. It looks something like this:
>contig-7
orf00002 1741 461
orf00003 3381 1747
>Wcontig-7000023
>Wcontig-11112
orf00001 426 2648
orf00002 2710 4581
orf00003 4569 5480
orf00004 6990 6133
orf00006 9180 7108
orf00007 10201 9209
orf00008 11663 10203
orf00009 12489 11680
orf00010 13153 12473
orf00011 14382 13225
orf00013 14715 15968
orf00014 19868 16410
>Wcontig-1674000002
orf00001 2995 637
orf00002 2497 1166
orf00003 2984 2529
Does anybody know if I'm doing something terribly wrong, or do I have to run some commands on the file to make it match the format multi-extract expects?
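If the real gap is that each gene line in the .predict file lacks the contig it belongs to, one option is to rewrite the file into the <id> <tag> <start> <stop> layout from the usage message. The Python sketch below is an illustration under stated assumptions, not part of Glimmer: it assumes each '>' line in the .predict file starts a new contig whose first word matches the sequence tag in the FASTA file, and that each gene line begins with the ORF id, start and stop. The script name predict_to_coords.py and the <tag>_<orf-id> naming scheme are hypothetical choices, and the optional frame column is left out.

```python
import sys
from typing import Iterable, Iterator, Optional


def predict_to_coords(lines: Iterable[str]) -> Iterator[str]:
    """Yield '<id> <tag> <start> <stop>' lines from glimmer3 .predict lines.

    Assumptions (hypothetical, not confirmed by Glimmer's docs): a '>' line
    starts a new contig, its first word is the tag multi-extract should match,
    and each gene line begins with '<orf-id> <start> <stop>'. Any further
    columns, such as frame and score, are ignored.
    """
    tag: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty '>' header line")
            tag = header[0]  # first word of the header used as the tag
            continue
        if tag is None:
            raise ValueError(f"gene line before any '>' header: {line!r}")
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"expected '<orf-id> <start> <stop>': {line!r}")
        orf_id, start, stop = fields[0], fields[1], fields[2]
        # ORF numbering restarts in every contig (orf00001 appears more than
        # once), so prefix the contig tag to keep each <id> unique.
        # Start/stop are passed through unchanged, so start > stop is kept.
        yield f"{tag}_{orf_id} {tag} {start} {stop}"


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python predict_to_coords.py run.predict > run.coords",
              file=sys.stderr)
    else:
        with open(sys.argv[1]) as handle:
            for coords_line in predict_to_coords(handle):
                print(coords_line)
```

Hypothetical usage, with placeholder file names: python predict_to_coords.py run.predict > run.coords, then multi-extract genome.fasta run.coords. Prefixing the contig name matters because, as the listing shows, ORF numbering restarts at orf00001 in each contig. Reverse-strand genes keep start > stop exactly as in the .predict file, so it is worth checking on one known gene that multi-extract handles that the way you expect.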