Sorting a FASTA alignment by the tip order of a phylogenetic tree
6.1 years ago

Dear all, I am trying to sort the sequences in a FASTA file according to the tip order of a phylogenetic tree that contains the same sequences arranged in a different order. Do you know a quick way to do this in R or Biopython?

So far, I have only managed to do it by first converting the FASTA file into a .csv, which is not very efficient...

Thanks a lot, Alejandro

alignment • 2.1k views

Hard to answer without the actual phylogenetic tree. I would say you first need to build a list of your FASTA headers in the order in which they appear as tips in the tree (this is actually the hard part). Then, with Biopython, you can sort your FASTA sequences according to that list.
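A minimal sketch of that "hard part", assuming the tree is available as a plain Newick string with unquoted tip labels. The helper name `newick_tip_order` and the example tree are hypothetical, made up for illustration. It returns the tips in the order they are written in the Newick text, which matches a plotted tree only if the plotting tool does not rotate or ladderize branches. Biopython users could probably get the same list from `Bio.Phylo` via `get_terminals()` instead.

```python
import re
from typing import List


def newick_tip_order(newick: str) -> List[str]:
    """Return tip labels in the order they are written in a Newick string.

    Assumption: labels are unquoted and contain no parentheses, commas,
    colons, semicolons or whitespace.
    """
    # A tip label directly follows "(" or ","; internal node labels
    # follow ")" and are therefore not matched.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


# Hypothetical tree over the four sequence names used in this thread.
tree = "((seq4:0.1,seq1:0.2):0.3,(seq2:0.1,seq3:0.1):0.2);"
tip_order = newick_tip_order(tree)

# One ID per line, ready to be saved as an ID list such as ids.txt.
print("\n".join(tip_order))
```

The resulting list can then be used as the ordering key for the FASTA records.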


Thanks for the answer. I will try to be more precise... I have this fasta file:

>seq1
agagaga
>seq2
cgcgcggc
>seq3
tctctctc
>seq4
acgacgcg

and I want to sort it according to a text file containing this:

seq4
seq1
seq2
seq3

Do you know how to do it? Thanks a lot for your time!

6.1 years ago

Use a FASTA index.

If the ID list is not long, simply pass the IDs on the command line; the sequences are written in the order in which the IDs are given:

samtools faidx seqs.fasta $(paste -s -d " " ids.txt) > result.fasta

Or, with seqkit:

seqkit faidx seqs.fasta $(paste -s -d " " ids.txt) > result.fasta

For a large number of IDs:

cat ids.txt | parallel -k seqkit faidx seqs.fasta {} > result.fasta
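
Since the question asked about Python, the same reordering can also be sketched without external tools. This is a plain-Python illustration, not the Biopython API: the helpers `read_fasta` and `reorder_records` are hypothetical names, and the sketch assumes that sequence IDs are unique and that each ID is the first word of its header line. The data are the example from the question, held in memory instead of read from files.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {id: sequence}; the ID is the first header word."""
    chunks: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    # Join multi-line sequences into a single string per record.
    return {name: "".join(parts) for name, parts in chunks.items()}


def reorder_records(records: Dict[str, str], ids: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (id, sequence) pairs in the order given by ids.

    An ID that is missing from the FASTA raises KeyError.
    """
    return [(name, records[name]) for name in ids]


fasta_text = ">seq1\nagagaga\n>seq2\ncgcgcggc\n>seq3\ntctctctc\n>seq4\nacgacgcg\n"
id_text = "seq4\n\nseq1\n\nseq2\n\nseq3\n"

records = read_fasta(fasta_text.splitlines())
wanted = [line.strip() for line in id_text.splitlines() if line.strip()]
ordered = reorder_records(records, wanted)

# Print the reordered records in FASTA format.
for name, seq in ordered:
    print(f">{name}\n{seq}")
```

For real files, the two in-memory strings would be replaced by open file handles; for large alignments the indexed approach above avoids loading every sequence into memory.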