Question: Sorting a FASTA alignment by the tip order of a phylogenetic tree
28 days ago, amartinez.ull wrote:

Dear all, I am trying to sort the sequences in a FASTA file according to the tip order of a phylogenetic tree that contains the same sequences, but arranged in a different order. Do you know of a quick way to do this in R or Biopython?

So far, I have only managed to do it by first converting the FASTA file into a .csv, which is not very efficient...

Thanks a lot, Alejandro

alignment • 91 views
modified 28 days ago by shenwei356 • written 28 days ago by amartinez.ull

This is hard to answer without seeing the actual phylogenetic tree. You first need to extract your FASTA headers into a list, ordered as the tips appear in your tree (this is actually the hard part). Then, with Biopython, you can sort your FASTA sequences according to that list.

written 28 days ago by Bastien Hervé
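For the "hard part" mentioned above, here is a minimal sketch of one way to pull the tip labels out of a tree, assuming the tree is stored as a Newick string and that the order of tips in that string is the order you want (a tree viewer that ladderizes or rotates clades may display a different order). The function name `newick_tip_order` and the example tree are hypothetical, and quoted labels containing spaces, commas, or parentheses are not handled. If Biopython is available, `Bio.Phylo.read(...)` followed by `get_terminals()` should give the same kind of list.

```python
import re

def newick_tip_order(newick):
    """Return tip labels in the order they appear in a Newick string."""
    # A tip label directly follows "(" or ","; internal node labels follow ")"
    # and branch lengths follow ":", so neither is captured.
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)

# Hypothetical tree containing the four example sequences.
tree = "((seq4:0.1,seq1:0.2):0.05,(seq2:0.3,seq3:0.1):0.2);"
tips = newick_tip_order(tree)
print("\n".join(tips))  # one ID per line, ready to save as ids.txt
```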

Thanks for the answer. I will try to be more precise... I have this fasta file:

>seq1
agagaga
>seq2
cgcgcggc
>seq3
tctctctc
>seq4
acgacgcg

and I want to sort it according to a text file containing this:

seq4
seq1
seq2
seq3

Do you know how to do it? Thanks a lot for your time!

modified 28 days ago by Pierre Lindenbaum • written 28 days ago by amartinez.ull
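For a small file like this one, the reordering can be sketched in plain Python without any .csv detour. This is only an illustration: the helper names `read_fasta` and `reorder` are made up, the ID is assumed to be the first word of each header line, and header descriptions and line wrapping are not preserved. With Biopython installed, `SeqIO.to_dict` and `SeqIO.write` could probably replace the hand-written parser.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict: ID (first word of header) -> sequence."""
    records, current = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

def reorder(records, ids):
    """Return (id, sequence) pairs in the order given by ids."""
    return [(i, records[i]) for i in ids]  # raises KeyError if an ID is missing

fasta = """>seq1
agagaga
>seq2
cgcgcggc
>seq3
tctctctc
>seq4
acgacgcg
"""
ids = ["seq4", "seq1", "seq2", "seq3"]

sorted_records = reorder(read_fasta(fasta.splitlines()), ids)
for name, seq in sorted_records:
    print(f">{name}\n{seq}")
```

In practice the FASTA lines and the ID list would be read from the two files instead of the inline strings used here.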
28 days ago, shenwei356 (China) wrote:

Use a FASTA index.

If the ID list is not long, simply pass the IDs as command-line arguments:

samtools faidx seqs.fasta $(paste -s -d " " ids.txt) > result.fasta

Or

seqkit faidx seqs.fasta $(paste -s -d " " ids.txt) > result.fasta

For a large number of IDs:

cat ids.txt | parallel -k seqkit faidx seqs.fasta {} > result.fasta
written 28 days ago by shenwei356
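To illustrate the idea behind an index-based lookup, here is a rough Python sketch that records the byte offset of every header and then seeks to each record in the requested order. It is not how samtools or seqkit implement `faidx`; the function names `build_index` and `fetch` are hypothetical, and the sketch assumes unique IDs and a file that ends with a newline.

```python
import io
import os
import tempfile

def build_index(path):
    """Map each record ID to the byte offset of its header line."""
    index = {}
    with open(path, "rb") as fh:
        while True:
            offset = fh.tell()
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                index[line[1:].split()[0].decode()] = offset
    return index

def fetch(path, index, ids, out):
    """Write the records named in ids to out, in that order."""
    with open(path, "rb") as fh:
        for seq_id in ids:
            fh.seek(index[seq_id])
            out.write(fh.readline())       # header line
            for line in iter(fh.readline, b""):
                if line.startswith(b">"):  # next record begins
                    break
                out.write(line)

# Demo on a temporary copy of the example file from the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seqs.fasta")
    with open(path, "wb") as fh:
        fh.write(b">seq1\nagagaga\n>seq2\ncgcgcggc\n>seq3\ntctctctc\n>seq4\nacgacgcg\n")
    out = io.BytesIO()
    fetch(path, build_index(path), ["seq4", "seq1", "seq2", "seq3"], out)
    result = out.getvalue().decode()

print(result, end="")
```

Only the index is held in memory, so the same approach should scale to files that are too large to load whole.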