5.8 years ago
Gene-ticks
Hi all, I need some help scripting this small problem in either Perl or Python, neither of which I am very familiar with. Is there a way to append a count (like "_Xtimes") to the name of the first occurrence of a duplicated sequence and discard the other duplicates? Example input:
>name_1_1
AGGGTTT
>name_:2:_X
GTTTGAA
>name_:3:_Y
GTTTGAA
Result I want:
>name_1_1
AGGGTTT
>name_:2:_X_2times
GTTTGAA
miRDeep (Perl code) does exactly this. It is quite simple: build a hash with sequences as keys and names as values. If a key already exists, do not assign the new sequence name as its value; keep the first name and update a counter that gets appended to it.
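A minimal Python sketch of the hash idea described above, in case it helps. This is not miRDeep's actual code; the function names `read_fasta` and `collapse_duplicates` are made up for illustration, and it assumes duplicates are exact string matches and that the `_Ntimes` suffix is only wanted when a sequence occurs more than once.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # multi-line sequences are joined
    if name is not None:
        yield name, "".join(chunks)


def collapse_duplicates(lines):
    """Keep the first record of each sequence and tag its name with the copy count."""
    first_name = {}  # sequence -> name of the first record with that sequence
    counts = {}      # sequence -> number of records with that sequence
    for name, seq in read_fasta(lines):
        first_name.setdefault(seq, name)  # only the first name is stored
        counts[seq] = counts.get(seq, 0) + 1

    out = []
    # dicts keep insertion order (Python 3.7+), so output follows first appearance
    for seq, name in first_name.items():
        n = counts[seq]
        header = f"{name}_{n}times" if n > 1 else name
        out.append(f">{header}")
        out.append(seq)
    return out


if __name__ == "__main__":
    example = [
        ">name_1_1", "AGGGTTT",
        ">name_:2:_X", "GTTTGAA",
        ">name_:3:_Y", "GTTTGAA",
    ]
    print("\n".join(collapse_duplicates(example)))
```

To run it on a file, pass the open file handle instead of the list, e.g. `collapse_duplicates(open("reads.fa"))`, where `reads.fa` is a placeholder file name. Everything is held in memory, so this suits small to medium inputs.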