I have a .fasta file resulting from vsearch clustering. The sequences in the .fasta file look like:
The "seqs" field in each sequence header reflects the number of reads from the original input file that make up that cluster consensus.
I now want to remove sequences whose "seqs" value is below a certain threshold (for example, 10). I want to use a conditional statement for this, but I cannot find software that supports it. I checked tools like SeqKit and Seqtk, but these only allow regular-expression filtering. I also find it hard to use bash/awk here because the data is in FASTA format.
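As a starting point, the seqs value can be pulled out of a header with a regular expression. This is a minimal Python sketch that assumes the header stores its fields as semicolon-separated key=value pairs (e.g. >centroid=Zotu1;seqs=12;, where Zotu1 is a made-up cluster name). Adjust the pattern if your headers look different. The helper name seqs_count is hypothetical.

```python
import re
from typing import Optional

# Assumed header layout: semicolon-separated key=value fields, e.g. ">centroid=Zotu1;seqs=12;"
# The (?:^|;) anchor stops a field such as "nseqs=4" from being matched by mistake.
SEQS_RE = re.compile(r"(?:^|;)seqs=(\d+)")


def seqs_count(header: str) -> Optional[int]:
    """Return the integer value of the seqs= field in a FASTA header, or None if absent."""
    match = SEQS_RE.search(header.lstrip(">"))
    return int(match.group(1)) if match else None
```

For example, seqs_count(">centroid=Zotu1;seqs=12;") returns 12, and a header with no seqs= field returns None.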
I'd need something like (in pseudocode):
for sequence in fasta:
    if seqs < value:
        remove sequence
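The pseudocode maps almost directly onto a short Python script. This sketch makes the same header assumption (a seqs=N field among semicolon-separated key=value pairs). It joins sequences that are wrapped over several lines, which is what makes a plain line-by-line awk approach awkward, and keeps only clusters with seqs >= the threshold. The script name filter_seqs.py and the functions read_fasta and filter_by_seqs are hypothetical.

```python
import re
import sys
from typing import Iterator, List, Optional, TextIO, Tuple

# Assumed header layout: semicolon-separated key=value fields, e.g. ">centroid=Zotu1;seqs=12;"
SEQS_RE = re.compile(r"(?:^|;)seqs=(\d+)")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines wrapped over several lines are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()  # also removes Windows-style \r line endings
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_seqs(handle: TextIO, out: TextIO, min_seqs: int) -> None:
    """Write only records whose seqs= value is >= min_seqs; records lacking the field are dropped."""
    for header, seq in read_fasta(handle):
        match = SEQS_RE.search(header[1:])
        if match is None or int(match.group(1)) < min_seqs:
            continue  # the "if seqs < value" branch: skip this cluster
        out.write(f"{header}\n{seq}\n")


# Run only when invoked from the command line with an input file and a threshold.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fasta:
        filter_by_seqs(fasta, sys.stdout, int(sys.argv[2]))
```

It would be run as python filter_seqs.py clusters.fasta 10 > filtered.fasta. Records without a seqs= field are dropped, and each kept sequence is written on a single unwrapped line.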
How could I filter based on a conditional statement? Thanks!