Question: Sorting a FASTA alignment by the tip order of a phylogenetic tree
3 months ago, amartinez.ull wrote:

Dear all, I am trying to sort the sequences in a FASTA file according to the tip order of a phylogenetic tree that contains the same sequences arranged in a different order. Do you know of a quick way to do this in R or Biopython?

So far, I have only managed to do it by first converting the FASTA file into a .csv, but this is not very efficient...

Thanks a lot, Alejandro


This is hard to answer without the actual phylogenetic tree. I would say you first need to extract the tip labels from the tree into a list, in tip order, so that they match your FASTA headers (this is actually the hard part). Then, with Biopython, you can sort your FASTA sequences according to that list.
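As a minimal sketch of the "hard part" above, the tip labels of a simple Newick string can be pulled out in left-to-right order with a regular expression. The tree below and the function name `newick_tip_order` are made up for illustration. The approach assumes unquoted labels without spaces or bracketed comments, and the left-to-right order in the file may differ from the order a tree viewer draws after ladderizing or rotating nodes. For real data a proper parser is the safer route, for example Biopython's `Bio.Phylo.read(...)` followed by `get_terminals()`.

```python
import re

def newick_tip_order(newick):
    """Return tip labels in the order they appear in a Newick string.

    A tip label directly follows '(' or ','. Internal node labels
    follow ')' and are therefore not captured.
    Assumption: labels are unquoted and contain no whitespace.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

# Hypothetical tree, not taken from the original post.
tree_text = "((seq4:0.1,seq1:0.2):0.3,(seq2:0.1,seq3:0.1):0.2);"
tips = newick_tip_order(tree_text)
print(tips)
```

Writing these labels to a file, one per line, gives an ID list that can then be used to reorder the FASTA file.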

(comment written 3 months ago by Bastien Hervé)

Thanks for the answer. I will try to be more precise... I have this fasta file:

>seq1
agagaga
>seq2
cgcgcggc
>seq3
tctctctc
>seq4
acgacgcg

and I want to sort it according to a text file that lists the IDs in this order:

seq4
seq1
seq2
seq3

Do you know how to do it? Thanks a lot for your time!

(reply written 3 months ago by amartinez.ull, modified 3 months ago by Pierre Lindenbaum)
3 months ago, shenwei356 (China) wrote:

Use a FASTA index.

If the ID list is not long, simply pass the IDs as arguments on the command line; the sequences are written in the order the IDs are given.

samtools faidx seqs.fasta $(paste -s -d " " ids.txt) > result.fasta

Or

seqkit faidx seqs.fasta $(paste -s -d " " ids.txt) > result.fasta

For a large number of IDs:

cat ids.txt | parallel -k seqkit faidx seqs.fasta {} > result.fasta
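If neither tool is at hand, the same idea (look each ID up in an index, then write the records in list order) can be sketched in plain Python. This is only an illustration under a few assumptions: the whole file fits in memory, the ID is the first word of each header, and every ID in the list exists in the FASTA file. The function names `read_fasta` and `reorder_fasta` are made up for this example, and the data is the toy example from the question.

```python
import io

def read_fasta(handle):
    """Parse FASTA text into a dict: ID -> (full header, sequence)."""
    records = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            seq_id = header.split()[0]  # assumption: ID is the first word
            records[seq_id] = [header, []]
        elif seq_id is not None:
            records[seq_id][1].append(line)
    return {k: (h, "".join(parts)) for k, (h, parts) in records.items()}

def reorder_fasta(fasta_handle, ids):
    """Return FASTA text with records in the order given by ids."""
    records = read_fasta(fasta_handle)
    out = []
    for seq_id in ids:
        header, seq = records[seq_id]  # raises KeyError if an ID is missing
        out.append(f">{header}\n{seq}\n")
    return "".join(out)

# Toy data from the question; real use would open seqs.fasta and ids.txt.
fasta_text = ">seq1\nagagaga\n>seq2\ncgcgcggc\n>seq3\ntctctctc\n>seq4\nacgacgcg\n"
ids_text = "seq4\n\nseq1\n\nseq2\n\nseq3\n"
ids = [line.strip() for line in ids_text.splitlines() if line.strip()]

result = reorder_fasta(io.StringIO(fasta_text), ids)
print(result, end="")
```

Each sequence is written on a single line here; for very large files an on-disk index, as in the commands above, scales better than loading everything into a dict.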