I have a multi-FASTA file like this:
>stuff1;[gene1];stuff,morestuff
ATGGAGATAATAGATAGC
>stuff1;[gene2];stuff,morestuff
ATGGAGATAATAGATAGC
>stuff2;[gene1];stuff,morestuff
GTACTACATCGCTAGCACTACT
>stuff2;[gene2];stuff,morestuff
GTAGTCATCAGCTACGACTACT
So there is a newline between each ID and its sequence. I want to extract, e.g., all IDs containing [gene1] together with their sequences: basically, search the IDs for a term and then write the matching IDs and sequences to a new FASTA file named after the search term. It is important that the complete ID is extracted, even though the search term is only a short part of it (in this case, [gene1]).
I tried these two commands:
awk '/[gene1]/' RS='>' input.fasta > output.fasta
grep "[gene1]" input.fasta > output.fasta
But this just gave me all lines after [gene1] in both cases.
When searching for [gene1], I need a new multi-FASTA file like this:
>stuff1;[gene1];stuff,morestuff
ATGGAGATAATAGATAGC
>stuff2;[gene1];stuff,morestuff
GTACTACATCGCTAGCACTACT
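One likely reason both attempts misbehave is that `[gene1]` is read as a regular expression, where square brackets form a bracket expression matching any single one of the characters g, e, n or 1, so nearly every line matches. Below is a minimal sketch of one possible approach that matches the term literally with awk's `index()` function and keeps each matching header together with its sequence lines. The function name `extract_fasta` and the output naming scheme (term without the brackets, plus `.fasta`) are assumptions made for illustration only.

```shell
# extract_fasta TERM INPUT
# Writes every record whose header contains TERM (matched literally, not as
# a regex) to a file named after TERM with the square brackets removed.
extract_fasta() {
    term=$1
    input=$2
    # Assumed naming scheme: "[gene1]" becomes "gene1.fasta"
    out="$(printf '%s' "$term" | tr -d '[]').fasta"
    awk -v term="$term" '
        /^>/ { keep = (index($0, term) > 0) }  # header line: set or clear the flag
        keep                                   # print header and sequence lines while set
    ' "$input" > "$out"
}
```

With this definition, calling `extract_fasta '[gene1]' input.fasta` should write the matching records to `gene1.fasta`. Because the flag is only re-evaluated on header lines, sequences wrapped over several lines stay attached to their header.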