Question: unique sequence IDs from fasta file
tcf.hcdg40 wrote:

Dear all,

I have a FASTA file that contains some duplicate sequences. I want to remove all the duplicates from the file and, secondly, store the duplicate sequences in another file.

Please advise how this can be done.

Thanks

modified 2.3 years ago by kloetzl700 • written 2.3 years ago by tcf.hcdg40

It's not clear from your post: do you want to find duplicate sequences or duplicate sequence identifiers? In other words, which of the two lines below do you want to check for duplicates?

>GeneHeader
AAGTCAGCTGATGCTACGAC
modified 2.3 years ago • written 2.3 years ago by Dan D6.2k

I want to find duplicate sequence identifiers.

written 2.3 years ago by tcf.hcdg40

OK, so you want to remove any duplicated sequence identifiers and their corresponding sequence information from the FASTA file. Then you want to output those duplicated identifiers to a separate file. Each sequence identifier would only be shown one time, regardless of how many times it's duplicated in the FASTA data. Is that correct?

written 2.3 years ago by Dan D6.2k

Yes, absolutely right.

written 2.3 years ago by tcf.hcdg40
kloetzl700 wrote:
$ cat *.fa* | grep '^>' | sort | uniq -d

This prints every header line that occurs more than once (each one is printed a single time). You can then use this list to extract the duplicate sequences from the file with one of the many FASTA manipulation tools available.
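
Going one step further than the one-liner above, the whole task from the question can be sketched in a single awk pass. This is only a rough sketch under a few assumptions that are not stated in the thread: the identifier is taken to be the header text between ">" and the first whitespace, the first record seen for each identifier is kept, and later records with the same identifier are dropped. The function name split_fasta_by_id and the output file names are made up for illustration.

```shell
# Hypothetical helper: split_fasta_by_id INPUT.fa UNIQUE_OUT.fa DUP_IDS_OUT.txt
split_fasta_by_id() {
    # Create (or empty) both output files so they exist even if nothing is written.
    : > "$2"
    : > "$3"
    awk -v uniq_out="$2" -v dup_out="$3" '
        /^>/ {
            # Assumption: the ID is the first whitespace-delimited word, minus ">".
            id = substr($1, 2)
            count[id]++
            # Keep only the first record seen for each ID.
            keep = (count[id] == 1)
            # Report a duplicated ID once, the second time it appears.
            if (count[id] == 2) print id > dup_out
        }
        # Header and sequence lines of kept records go to the de-duplicated file.
        keep { print > uniq_out }
    ' "$1"
}
```

For example, `split_fasta_by_id input.fa unique.fa duplicate_ids.txt` would write the de-duplicated records to unique.fa and one line per duplicated identifier to duplicate_ids.txt. If every copy of a duplicated identifier should be removed instead of keeping the first, the file would need two passes, since a record cannot be known to be unique until the whole file has been read.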

written 2.3 years ago by kloetzl700

Once you have the list of identifiers from uniq, the thread "Extract Sequence From Fasta File Using Ids From A Separate File" describes what to do next.
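
As a rough illustration of that extraction step (not necessarily the method used in the linked thread), the records can be pulled out with awk. The sketch assumes the list holds one bare identifier per line, without the leading ">", and that an identifier is the header text up to the first whitespace. The function name extract_by_ids is made up for illustration.

```shell
# Hypothetical helper: extract_by_ids IDS.txt INPUT.fa  (matching records go to stdout)
extract_by_ids() {
    awk '
        # First file: remember each non-empty ID (first word on the line).
        NR == FNR { if ($1 != "") wanted[$1]; next }
        # Second file: at every header, decide whether this record is wanted.
        /^>/ { keep = (substr($1, 2) in wanted) }
        # Print header and sequence lines of wanted records.
        keep
    ' "$1" "$2"
}
```

For example, `extract_by_ids duplicate_ids.txt input.fa > duplicates.fa` would collect every record whose identifier appears in the list, including all repeated copies.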

written 2.3 years ago by venu4.5k