8.7 years ago
tcf.hcdg
Hello
I have a text file containing sequence IDs. The file contains some duplicate IDs, and a few IDs are present more than twice. I want to write the unique IDs to one file and the repeated IDs to another. I would also like to know how many times each repeated ID occurs in the file.
I found the duplicated IDs using the following command:
$ cat id.txt | grep '^>' | sort | uniq -d > dupid.txt
This gives me the duplicated IDs in the "dupid.txt" file, but it does not tell me which IDs are present more than twice or how many times each one occurs. Secondly, how can I find the unique IDs?
Please suggest how this can be handled.
Thanks in advance
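To make the request concrete, here is a minimal sketch of the kind of output I am after, built on the same grep/sort/uniq pipeline as above. The sample file and all output file names are made up for illustration, and it assumes the IDs contain no whitespace. It relies on `uniq -c` (prefix each line with its count) and `uniq -u` (print only lines that are not repeated).

```shell
# Hypothetical sample input; in practice this would be id.txt.
printf '%s\n' '>seq1' '>seq2' '>seq1' '>seq3' '>seq2' '>seq1' > sample_id.txt

# IDs that occur exactly once (uniq needs sorted input).
grep '^>' sample_id.txt | sort | uniq -u > sample_uniq.txt

# IDs that occur more than once, with a tab and the number of occurrences.
# uniq -c prints "count ID"; awk keeps rows whose count is greater than 1.
# Assumes IDs contain no whitespace, so the ID is field 2.
grep '^>' sample_id.txt | sort | uniq -c \
  | awk '$1 > 1 {print $2 "\t" $1}' > sample_dupcounts.txt
```

If "unique" should instead mean one copy of every distinct ID, repeated or not, `sort -u` in place of `sort | uniq -u` would give that list.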