Question: Making fasta file for clustal
burnsro (Austria) wrote 3.9 years ago:

I have two FASTA files of DNA sequences for upstream promoter regions for two species, and I would like to align them in ClustalW.

I read in the ClustalW manual pages on Ubuntu that "all sequences must be in 1 file, one after another".

I'm trying to understand whether that means I need to merge each orthologous pair, one after the other, into one FASTA file for my two species. Does anyone know how this can be achieved?

Vivek (Denmark) wrote 3.9 years ago:

If you are planning to do a pairwise alignment, each query/target pair needs its own file containing both sequences, like the example below. You could read the existing files with your favorite FASTA parsing module in either BioPerl or Biopython and write each pair of sequences into a new file.

>Query1
TGCCTACTGAGCTGAAACAGT
>Target1
CAGTAACCATGACCTCCCGCAGGACAGCGGAGCC

Here's a thread on splitting fasta files:

How To Split A Multiple Fasta
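
As a rough illustration of the per-pair files described above, here is a minimal Python sketch using only the standard library (a Biopython parser would do the same job). The function names `read_fasta` and `write_pair_files` and the `pair_0001.fasta` naming scheme are made up for this example, and it assumes that record i in the first file is the ortholog of record i in the second file; if your files are ordered differently you would need to match records by name instead.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_pair_files(query_fasta, target_fasta, out_dir):
    """Write one two-record FASTA file per query/target pair.

    Assumption: record i of query_fasta is the ortholog of record i
    of target_fasta (same order, same count).
    """
    queries = read_fasta(query_fasta)
    targets = read_fasta(target_fasta)
    if len(queries) != len(targets):
        raise ValueError("the two files hold different numbers of sequences")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    pairs = zip(queries, targets)
    for i, ((q_name, q_seq), (t_name, t_seq)) in enumerate(pairs, start=1):
        out_path = out_dir / f"pair_{i:04d}.fasta"
        out_path.write_text(f">{q_name}\n{q_seq}\n>{t_name}\n{t_seq}\n")
        written.append(out_path)
    return written
```

Calling `write_pair_files("species1.fasta", "species2.fasta", "pairs")` (hypothetical file names) would leave one small FASTA file per pair in `pairs/`, each of which can be given to ClustalW on its own.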

venu (Germany) wrote 3.9 years ago:

"All sequences must be in 1 file, one after another" means just that: keep all the FASTA sequences in one file, nothing more. When ClustalW asks for the name of the file containing the FASTA sequences, give it the name of the file in which all the sequences are present. You can set all other parameters, such as the multiple or pairwise alignment parameters, before the alignment begins.
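
A minimal sketch of "one file, one after another" in Python, assuming the two input files are already valid FASTA. The names `merge_fasta` and `clustalw_command` are invented for this example. The second function only builds a non-interactive command line and does not run it; the executable name (`clustalw` here, `clustalw2` on some systems) and the options shown should be checked against your installation's built-in help.

```python
from pathlib import Path


def merge_fasta(input_paths, output_path):
    """Concatenate FASTA files into one file, one record after another."""
    parts = []
    for path in input_paths:
        text = Path(path).read_text().strip()
        if text:
            parts.append(text)
    # A newline between files keeps every ">" header at the start of a line,
    # even when an input file lacks a trailing newline.
    Path(output_path).write_text("\n".join(parts) + "\n")
    return output_path


def clustalw_command(infile, outfile):
    """Build (but do not run) a ClustalW command line for a DNA alignment.

    Assumption: the executable is on PATH under the name "clustalw".
    """
    return [
        "clustalw",
        f"-INFILE={infile}",
        "-ALIGN",
        "-TYPE=DNA",
        f"-OUTFILE={outfile}",
    ]
```

The list returned by `clustalw_command("combined.fasta", "combined.aln")` could be passed to `subprocess.run` once ClustalW is installed; running `clustalw` interactively and typing the merged file name at the prompt works just as well.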

Charles Plessy (Japan) wrote 3.9 years ago:

The full quote is:

SEQUENCE INPUT:  all sequences must be in 1 file, one after another. 
7 formats are automatically recognised: NBRF-PIR, EMBL-SWISSPROT,
Pearson (Fasta), Clustal (*.aln), GCG-MSF (Pileup), GCG9-RSF and GDE flat file.
All non-alphabetic characters (spaces, digits, punctuation marks) are ignored
except "-" which is used to indicate a GAP ("." in MSF-RSF).

This gives you a list of possible sequence formats.  Most of them are well documented, in particular the FASTA format. You can also find more examples in the EMBOSS documentation.

That said, as the other answers suggest, the FASTA format may be the easiest for you.
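
To make the character rule in the quoted manual text concrete, here is a small Python sketch of what "non-alphabetic characters are ignored except '-'" implies for a sequence line. It only illustrates the stated rule and is not ClustalW's own parser; the function name `clean_sequence` is made up, and the "." gap character of MSF-RSF files is deliberately not handled.

```python
def clean_sequence(raw):
    """Keep ASCII letters and '-' (gap); drop spaces, digits and punctuation."""
    return "".join(
        ch for ch in raw
        if ch == "-" or (ch.isascii() and ch.isalpha())
    )


# Spaces and position numbers are dropped, the gap character is kept.
print(clean_sequence("1 TGCC TACT-GA 12"))  # prints TGCCTACT-GA
```

In practice this means that numbered or space-separated sequence lines pasted from other tools should still be read correctly, as long as the headers and the gap characters are intact.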
