I have a file with contents:
>FC20423AAXX_7_1_82_883
GTTAGAGGTTCGAAG
>FC20423AAXX_7_1_198_886
GGCTCAGTGGTCTAGTGGTATGATTCTCGCTT
>FC20423AAXX_7_1_115_888
GGGGGTGTAGGGTGGGGTTGG
>FC20423AAXX_7_1_99_894
GTTCGTATCCCACTTCTGACACCA
My desired output is:
>dme0_count=3
GTTAGAGGTTCGAAG
>dme1_count=8
GGCTCAGTGGTCTAGTGGTATGATTCTCGCTT
>dme2_count=3
GGGGGTGTAGGGTGGGGTTGG
>dme3_count=6
GTTCGTATCCCACTTCTGACACCA
I am looking for a one-liner or script to:
- Replace each header as shown, with dmeX_count=Y (X = a unique number, Y = the number of times the read appears in the file).
- Collapse duplicate reads into a single record, and delete reads with length <15 or >30.
- Delete reads with count <=2.
Until now I've had to do this one step at a time. I'd love to avoid that and use a single awk, sed or perl one-liner...
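
For reference, the three steps above can be sketched in Python as follows. This is only an illustration of the intended logic, not the awk/sed/perl one-liner being asked for. It assumes every record is exactly two lines (a header, then the sequence on a single line), that X is numbered in order of first appearance among the reads that are kept, and that the length limits are inclusive (15 to 30 kept). The function name `collapse_reads` and its parameters are hypothetical, and the demo headers `>copy_a`, `>copy_b` and `>too_short` are made up for the example.

```python
from collections import Counter


def collapse_reads(lines, prefix="dme", min_len=15, max_len=30, min_count=3):
    """Collapse identical reads and return renamed FASTA lines.

    Assumes two-line records: a '>' header followed by one sequence line.
    """
    # Keep only the sequence lines; the original headers are discarded.
    stripped = (ln.strip() for ln in lines)
    seqs = [ln for ln in stripped if ln and not ln.startswith(">")]

    # Counter keeps keys in order of first appearance (Python 3.7+).
    counts = Counter(seqs)

    out = []
    idx = 0  # the X in dmeX_count=Y, assigned only to reads that are kept
    for seq, n in counts.items():
        # Drop reads outside the length window or seen fewer than min_count times.
        if min_len <= len(seq) <= max_len and n >= min_count:
            out.append(f">{prefix}{idx}_count={n}")
            out.append(seq)
            idx += 1
    return out


# Small demo: the first read from the post repeated three times, plus one short read.
demo = [
    ">FC20423AAXX_7_1_82_883", "GTTAGAGGTTCGAAG",
    ">copy_a", "GTTAGAGGTTCGAAG",
    ">copy_b", "GTTAGAGGTTCGAAG",
    ">too_short", "ACGT",
]
print("\n".join(collapse_reads(demo)))
```

With a real file, the same function could be fed an open file handle, for example `collapse_reads(open("reads.fa"))`, where `reads.fa` is a placeholder file name.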