Ignore Error "Multiple Sequences Found With Same Name" In Clustalw
11.3 years ago
david ▴ 10

Hi,

I have a Python program that generates a clustalw2 alignment of about 500 sequences from a FASTA file. Each sequence name combines the organism with the substrate specificity of that sequence, so quite a few of the names are identical. I get the error message "Error: Multiple sequences found with same name" and no alignment is generated. Is it possible to ignore this error without having to change all the sequence names?

Cheers, David

biopython clustalw • 5.4k views
11.3 years ago

Sequence names must be unique for ClustalW/X to produce an alignment.

I would rename your 500 sequences with the numbers 0 to 499 and store the original names in a dictionary or a list.

For example:

d = {0: 'Organism1Substrate', 1: 'Organism1Substrate', ..., 499: 'Organism2Substrate'}

or:

l = ['Organism1Substrate', 'Organism1Substrate', 'Organism1Substrate', ...]

Once you have performed the alignment, just replace the numbers with the original names.
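
A minimal sketch of this rename-and-restore round trip, using only plain Python. The function names `rename_fasta` and `restore_names` are made up for illustration, and the sketch assumes the alignment is written back out in FASTA format (other output formats would need their own parsing):

```python
def rename_fasta(lines):
    """Replace each FASTA header with a unique numeric ID.

    Returns the renamed lines and a dict mapping ID -> original header.
    (Hypothetical helper, not part of ClustalW or Biopython.)
    """
    id_to_name = {}
    renamed = []
    for line in lines:
        if line.startswith(">"):
            new_id = str(len(id_to_name))          # "0", "1", "2", ...
            id_to_name[new_id] = line[1:].strip()  # remember the original name
            renamed.append(">" + new_id)
        else:
            renamed.append(line.rstrip("\n"))
    return renamed, id_to_name


def restore_names(aligned_lines, id_to_name):
    """Put the original names back into a FASTA-formatted alignment."""
    restored = []
    for line in aligned_lines:
        if line.startswith(">"):
            restored.append(">" + id_to_name[line[1:].strip()])
        else:
            restored.append(line.rstrip("\n"))
    return restored


# Two sequences that share the same name, as in the question.
records = [">Organism1Substrate", "MKVLA", ">Organism1Substrate", "MKILA"]
renamed, id_to_name = rename_fasta(records)
round_trip = restore_names(renamed, id_to_name)
```

Here `renamed` would be written to a temporary file and passed to clustalw2, and `restore_names` would be applied to the aligned output. Because the mapping is keyed on the new ID rather than the original name, duplicate names survive the round trip.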


+1 for this. In the past I have just grepped the names and added numbers or extra information to make them unique, but I like this idea better.


Agreed. Many phylogenetic programs have problems handling fancy sequence names. The worst case is the PHYLIP format (used by RAxML etc.), which allows only 10 characters per name. So I always rename the sequences as "s1", "s2", "s3"... I don't recommend using 1, 2, 3... because some programs cannot handle purely numerical sequence names.
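
Something like the following could generate such names and check them against the 10-character limit. This is only a sketch: `make_ids` and its parameters are hypothetical, not part of any library.

```python
def make_ids(n, prefix="s", max_len=10):
    """Return n short IDs such as 's1', 's2', ... (hypothetical helper).

    Raises ValueError if any ID would exceed max_len characters,
    e.g. the 10-character name limit mentioned above.
    """
    ids = [f"{prefix}{i}" for i in range(1, n + 1)]
    too_long = [x for x in ids if len(x) > max_len]
    if too_long:
        raise ValueError(f"{len(too_long)} IDs exceed {max_len} characters")
    return ids


ids = make_ids(500)
```

With 500 sequences the longest ID is "s500", well within the limit, and the letter prefix keeps the names from being purely numeric.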

