How to remove redundant sequences from a FASTA file?
5 weeks ago
ANAM • 0

I have a FASTA file containing nucleotide sequences. How can I remove the redundant sequences? I'm trying to access CD-HIT, but the web server is not available. Is there any other tool available for removing redundancy? I would really appreciate any help or suggestions!

redundant cd-hit fasta file sequences • 198 views

seqkit rmdup can remove duplicated sequences in a fasta file.

printf '> A1\nATTG\n> A2\nTTTA\n> A3\nATTG' | seqkit rmdup -sP

> A1
ATTG
> A2
TTTA
[INFO] 1 duplicated records removed
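$ printf '>A1\nATTG\n>A2\nTTTA\n>A3\nATTG\n' | awk '/^>/ {getline seq; print $0, seq}' | sort -uk2,2 | tr -s " " "\n"

This works only if each sequence is on a single line and the headers contain no spaces. The output is ordered by sequence, not by the original record order.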
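If the sequences are wrapped over several lines, a variant along these lines may help. It is only a minimal sketch: the function name `dedup_fasta` and the file name `input.fasta` are made up for illustration, and it assumes exact, case-sensitive matching on the forward strand only. It joins each record's sequence lines, keeps the first record seen for each distinct sequence, and preserves the input order.

```shell
# Hypothetical helper: drop records whose sequence was already seen.
# Assumes exact, case-sensitive matches; reverse complements are not considered.
dedup_fasta() {
  awk '
    # Print the current record unless its sequence has appeared before.
    function emit() {
      if (!(seq in seen)) {
        seen[seq] = 1
        print hdr
        print seq
      }
    }
    # A header line closes the previous record and starts a new one.
    /^>/ { if (hdr != "") emit(); hdr = $0; seq = ""; next }
    # Any other line is part of the current sequence; join wrapped lines.
    { seq = seq $0 }
    # Flush the last record.
    END { if (hdr != "") emit() }
  ' "$@"
}
```

Usage would be something like `dedup_fasta input.fasta > nonredundant.fasta`, or piping FASTA text into `dedup_fasta` on stdin. Each kept sequence is written on a single line, and the whole set of distinct sequences is held in memory, so for very large files a dedicated tool is probably the better choice.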
