Hi,
Apologies if I do not follow the correct question formatting; this is my first time posting. My question is about Python regular expressions. I have a FASTA file of sequences in the following format:
>NODE_143195_length_100_cov_16076.000000
TTGTGTTGGTTGTTGTGTTGCCTGTCTTGGTGGCGGTTGTGTTGGCTGCTTTCGTGTCAG
TCTCTTCACCGATGTTATGTTGCTCTGTTGTGGCTCCGGC
>NODE_143196_length_100_cov_15891.000000
CTTGTGTTGGTTGTTGTGTTGCCTGTCTTGGTGGCGGTTGTGTTGGCTGCTTTCGTGTCA
GTCTCTTCACCGATGTTATGTTGCTCTGTTGTGGCTCCGG
>NODE_143197_length_100_cov_15696.000000
GCTTGTGTTGGTTGTTGTGTTGCCTGTCTTGGTGGCGGTTGTGTTGGCTGCTTTCGTGTC
AGTCTCTTCACCGATGTTATGTTGCTCTGTTGTGGCTCCG
I am trying to filter by both length and coverage, using the values in the header lines. I want to filter sequences shorter than 5000 bp and with coverage below 100. I have been trying different variations of the following line:
^.+cov=([5-9][0-9][0-9]|([1-9]\d{4}\d*)\..+$
But I cannot seem to make it work. If anyone can help me, it would be greatly appreciated. Thanks
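
Two things stand out in the pattern above: it looks for `cov=` while the headers contain `cov_`, and its parentheses are unbalanced. A possible alternative, offered only as a minimal sketch, is to capture the length and coverage from each header and compare them as numbers in Python instead of encoding numeric ranges in the regex. The names `HEADER_RE`, `passes`, and `filter_fasta` are hypothetical, and the sketch assumes that every header follows the `>NODE_<id>_length_<bp>_cov_<coverage>` layout shown above and that the goal is to keep records with length >= 5000 and coverage >= 100.

```python
import re

# Assumed header layout, taken from the examples in the question:
#   >NODE_<id>_length_<bp>_cov_<coverage>
HEADER_RE = re.compile(r"^>NODE_\d+_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def passes(header, min_length=5000, min_cov=100.0):
    """Return True if the header's length and coverage meet both thresholds."""
    m = HEADER_RE.match(header)
    if m is None:
        # Header does not follow the assumed layout: treat it as not passing.
        return False
    length = int(m.group(1))
    coverage = float(m.group(2))
    return length >= min_length and coverage >= min_cov


def filter_fasta(lines, min_length=5000, min_cov=100.0):
    """Yield header and sequence lines of the records that pass the filter."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: decide once, then apply the decision
            # to all sequence lines that follow until the next header.
            keep = passes(line, min_length, min_cov)
        if keep:
            yield line
```

For a file on disk, `filter_fasta` can be given the open file object directly (for example `for line in filter_fasta(open("contigs.fasta")): ...`, where the file name is a placeholder). If the intent is the opposite selection, or "either threshold" rather than "both", only the comparison in `passes` needs to change.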