I have a FASTA file like this:
>seqA
AAAAAAAAAA
>seqB
AAAAAAAAAA
>seqC
TTTTTTTTTT
>seqD
CCCCCCCCCC
>seqE
CCCCCCCCCC
>seqF
AAAAAAAAAA
I've recently started learning SeqKit, and I've found that rename can append _N to the header based on the occurrence of the sequence, and that rmdup can remove duplicates. Is it possible to combine these two commands? If not, and I start by appending _N, how do I make sure the record with the highest number is kept when I remove duplicate sequences?
Maybe I'm not explaining myself well, and I'm new to this, but basically my end goal is one record per unique sequence, with the number of occurrences as the header, preferably like this:
>3
AAAAAAAAAA
>1
TTTTTTTTTT
>2
CCCCCCCCCC
And if it's not possible to completely change the header, can the file be sorted by occurrence, highest first? Like this:
>seqA_3
AAAAAAAAAA
>seqD_2
CCCCCCCCCC
>seqC_1
TTTTTTTTTT
Preferably the solution would use SeqKit, or another tool that is relatively light on memory, because my data set is very large.
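To make the goal concrete, here is a rough Python sketch of the counting I have in mind. It is not SeqKit, and the function names (read_fasta, count_unique, format_fasta) are just my own placeholders. It reads the file one record at a time, but it assumes that one dictionary entry per unique sequence fits in memory, which may not hold for my real data.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_unique(lines):
    """Return [(first_header, sequence, count)], highest count first."""
    seen = {}  # sequence -> [header of first occurrence, count]
    for header, seq in read_fasta(lines):
        if seq in seen:
            seen[seq][1] += 1
        else:
            seen[seq] = [header, 1]
    # sorted() is stable, so sequences with equal counts keep first-seen order.
    return sorted(
        ((h, s, n) for s, (h, n) in seen.items()),
        key=lambda rec: rec[2],
        reverse=True,
    )


def format_fasta(records):
    """Write each record as >header_count followed by the sequence."""
    return "".join(f">{h}_{n}\n{s}\n" for h, s, n in records)
```

Passing an open file handle to count_unique and printing format_fasta of the result should give the sorted seqA_3 / seqD_2 / seqC_1 layout for the example file; writing only the count as the header would give the first layout instead.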