7.7 years ago
tcf.hcdg ▴ 70
I have a FASTA file that contains some duplicate sequences. I want to remove all the duplicates from the file and store the duplicate sequences in a separate file.
Please advise how this can be done.
It's not clear from your post: are you wanting to find duplicate sequences or duplicate sequence identifiers? In other words, which of the two lines do you want to check for duplicates in the set below:
I want to find duplicate sequence identifiers.
OK, so you want to remove any duplicated sequence identifiers and their corresponding sequence information from the FASTA file. Then you want to output those duplicated identifiers to a separate file. Each sequence identifier would only be shown one time, regardless of how many times it's duplicated in the FASTA data. Is that correct?
Yes, absolutely right.
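Given that clarification, here is a minimal Python sketch of one way to do it, not a definitive solution. It rests on a few assumptions that are not stated in the thread: the identifier is taken to be the first whitespace-delimited token after `>`, the first record seen for each identifier is kept and later records with the same identifier are dropped, and blank lines are discarded. The function names and file names are hypothetical.

```python
def split_duplicate_ids(fasta_lines):
    """Return (kept_lines, duplicated_ids) from an iterable of FASTA lines.

    Assumption: the identifier is the first whitespace-delimited token
    after '>' on a header line. The first record for each identifier is
    kept; later records with the same identifier are dropped.
    """
    seen = set()          # identifiers encountered so far
    duplicated = {}       # dict used as an ordered set of duplicated identifiers
    kept = []             # lines (header and sequence) of the records we keep
    keep_current = False  # whether the record currently being read is kept

    for raw in fasta_lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if seq_id in seen:
                # Repeated identifier: skip this record, note the ID once.
                keep_current = False
                duplicated[seq_id] = None
            else:
                seen.add(seq_id)
                keep_current = True
        if keep_current and line:
            kept.append(line)

    return kept, list(duplicated)


def dedup_fasta(in_path, out_path, dup_path):
    """Write the de-duplicated FASTA to out_path and duplicated IDs to dup_path."""
    with open(in_path) as handle:
        kept, duplicated = split_duplicate_ids(handle)
    with open(out_path, "w") as out:
        out.writelines(line + "\n" for line in kept)
    with open(dup_path, "w") as out:
        out.writelines(seq_id + "\n" for seq_id in duplicated)
```

Calling `dedup_fasta("input.fasta", "unique.fasta", "duplicate_ids.txt")` would write the cleaned FASTA and a list in which each duplicated identifier appears once, however many times it was repeated. If you would rather drop every record whose identifier is duplicated, including the first one, you would need a first pass that counts identifiers before writing anything.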