Augustus: parsing locus names and gene names from a GenBank file
3.2 years ago
mwerseb1 • 0

Hi All,

I am following the Augustus tutorial in Current Protocols in Bioinformatics (https://currentprotocols.onlinelibrary.wiley.com/doi/full/10.1002/cpbi.57). I am working through Protocol 6 (Removing Redundant Gene Structures) and have run into an issue at Step 4, creating a loci.lst file. When I run the provided Perl code snippet (see below), I get the error message shown further down. I am not familiar with Perl and would greatly appreciate any help getting this section to work.

Code Snippet:

cat bonafide.gb | perl -ne ’
if ( $_ =~ m/LOCUS\s+(\S+)\s/ ) {
$txLocus = $1;
} elsif ( $_ =~ m/\/gene=\"(\S+)\"/ ) {
$txInGb3{$1} = $txLocus
}
if( eof() ) {
foreach ( keys %txInGb3 ) {
print "$_\t$txInGb3{$_}\n";
}
}’ > loci.lst

The bonafide.gb file looks like the excerpt below. From it I need to pull out the locus name "h2tg000001l_432666-437116" and the gene name "h2tg000001l_t_gene1_mRNA1".

LOCUS       h2tg000001l_432666-437116   4451 bp  DNA
FEATURES             Location/Qualifiers
     source          1..4451
     mRNA            join(1286..1450,1766..1909,2591..3166)
                     /gene="h2tg000001l_t_gene1_mRNA1"
     CDS             join(1286..1450,1766..1909,2591..3166)
                     /gene="h2tg000001l_t_gene1_mRNA1"
BASE COUNT     599 a   460 c  353 g   518 t   2521 n
ORIGIN
        1 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
       61 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
      121 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn

Error message:

Unrecognized character \xE2; marked by <-- HERE after <-- HERE near column 1 at -e line 1.
./make_loci_list.sh: line 4: syntax error near unexpected token `('
./make_loci_list.sh: line 4: `if ( $_ =~ m/LOCUS\s+(\S+)\s/ ) {'

Any help in either debugging or any way to get a similar result with bash would be incredibly helpful.

Many Thanks!

perl Augustus • 692 views
3.2 years ago
h.mon 35k

Did you copy/paste the Perl code directly from the PDF? The problem is that the snippet you copied contains an invalid character (two, actually): the ’ enclosing the Perl code. It should be a plain ASCII single quote, '.
