I have around 1000 GBFF (GenBank flat file) files and I want to extract the following from each:
- Accession number e.g.
- tax ID e.g.
- all the tRNA anticodons (zero or multiple per file) e.g.
/anticodon=(pos:complement(1141190..1141192),aa:Met,seq:cat) (I actually just want the
(note that the latter sometimes spills across two lines).
and compile the output into a table with one row per file.
Example of file: https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP011636
grep -e "LOCUS" -e "taxon:" -e "anticodon=" Klebsiella_oxytoca.gbff > output.txt
This sort of works, but grep fails when the anticodon qualifier (the third item above) spills onto two lines, and I also need to collect the output from all the files into one table. Thanks for any help!
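For reference, here is a minimal sketch of one way this could be done in Python with only the standard library. It rests on several assumptions that are not confirmed above: the accession is taken from the first ACCESSION line, the tax ID from the first `taxon:` cross-reference, and each anticodon qualifier is read by balancing parentheses so that a value wrapped onto a second line is rejoined. The function names (`extract_anticodons`, `summarize`, `write_table`) and the output columns are hypothetical, chosen only for illustration.

```python
import csv
import re
from pathlib import Path

# Assumed patterns for the standard GenBank flat-file layout.
ACCESSION_RE = re.compile(r"^ACCESSION\s+(\S+)", re.MULTILINE)
TAXON_RE = re.compile(r"taxon:(\d+)")
ANTICODON_KEY = "/anticodon="


def extract_anticodons(text):
    """Return every /anticodon= value, rejoining values wrapped across lines."""
    results = []
    start = text.find(ANTICODON_KEY)
    while start != -1:
        i = start + len(ANTICODON_KEY)
        depth = 0
        chars = []
        # Walk forward until the opening parenthesis is balanced again,
        # dropping the newline and indentation of any continuation line.
        while i < len(text):
            ch = text[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            if not ch.isspace():
                chars.append(ch)
            i += 1
            if depth == 0:
                break
        results.append("".join(chars))
        start = text.find(ANTICODON_KEY, i)
    return results


def summarize(text):
    """Collect accession, tax ID and anticodons from one file's text."""
    accession = ACCESSION_RE.search(text)
    taxon = TAXON_RE.search(text)
    return {
        # Assumption: the first record in a multi-record file is representative.
        "accession": accession.group(1) if accession else "",
        "taxid": taxon.group(1) if taxon else "",
        "anticodons": extract_anticodons(text),
    }


def write_table(paths, out_path):
    """Write one tab-separated row per input file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["file", "accession", "taxid", "anticodons"])
        for path in paths:
            row = summarize(Path(path).read_text())
            writer.writerow([
                Path(path).name,
                row["accession"],
                row["taxid"],
                ";".join(row["anticodons"]),  # empty when the file has no tRNA
            ])
```

It could be run over a folder with something like `write_table(sorted(Path("genomes").glob("*.gbff")), "anticodons.tsv")`, where `genomes` and `anticodons.tsv` are placeholder names. Files with several records (for example a chromosome plus plasmids) would pool all their anticodons into a single row under this sketch.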