Concatenate Two .Fasta Files Into One
10.9 years ago
Alice ▴ 320

Hello, Biostars! I have two FASTA files for two different genes and want to combine them into one data matrix. Is there a function in R for that, for example if I have two DNAbin objects for those genes? The sequence IDs are identical in both files. The first file:

>sp1
aacc
>sp2
ggtt

The second file:

>sp1
ggaa
>sp2
ttgg

I want:

>sp1
aaccggaa
>sp2
ggttttgg

Python is also OK, but I'm interested in R.

fasta r • 15k views

Could you comment on the rationale behind what you're trying to do?

In a few words: concatenated sequence matrix -> alignment -> phylogenetic tree

Is this some kind of homework question? I answered the same question 4-5 days ago. See here: C: Combining dna sequences files into one

No, it's for my lab work. Your answer is also helpful, thanks.

10.9 years ago

Just use cbind(A, B) to merge the sequences for DNAbin A and DNAbin B:

A.fa:

>sp1
aacc
>sp2
ggtt

B.fa:

>sp1
ggaa
>sp2
ttgg

In R using DNAbin (as you requested):

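library(ape)
# read.dna() returns a matrix when all sequences in a file have the same length
A <- read.dna("A.fa", format="fasta")
B <- read.dna("B.fa", format="fasta")
# bind the two matrices column-wise, i.e. append B's sites to A's
C <- cbind(A,B)
write.dna(C, "C.fa", format="fasta")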

C.fa:

>sp1
aaccggaa
>sp2
ggttttgg

See help(DNAbin) for more details about options for cbind(), particularly fill.with.gaps and check.names.

I've already tried that. Error: the 'cbind' method for "DNAbin" accepts only matrices

How did you read in the sequences?

read.dna("B.fa", format="fasta") - fail; read.FASTA("B.fasta") - fail

If you get an error message of "fail" or something like that, then you have bigger issues.

By "fail" I mean the same error message in both cases: the 'cbind' method for "DNAbin" accepts only matrices

It would be helpful if you posted a reproducible example. The original examples in your question will work fine.

I think the problem is in the line breaks, i.e. one sequence is split over several lines like:

>sp1
aattgg
aaggtt

and not

>sp1
aattggaaggtt
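
A wrapped file like the first form can be unwrapped before using line-based tools. The following is a minimal Python sketch, not something posted in the thread. The function name `unwrap_fasta` is hypothetical, and it assumes that every header line starts with `>` and everything else is sequence data.

```python
def unwrap_fasta(lines):
    """Join wrapped sequence lines so each record has exactly one sequence line."""
    out = []
    seq_parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record's sequence, if any.
            if seq_parts:
                out.append("".join(seq_parts))
                seq_parts = []
            out.append(line)
        else:
            seq_parts.append(line)
    if seq_parts:
        out.append("".join(seq_parts))
    return out


wrapped = [">sp1", "aattgg", "aaggtt"]
print("\n".join(unwrap_fasta(wrapped)))
```

For the wrapped record above this prints the header followed by the single line aattggaaggtt.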

Worked for me.

10.9 years ago
Haluk ▴ 190

You can do this with awk:

paste A.fa B.fa | awk '{if (NR%2==0) {print $1 $2} else {print $1}}'

Thank you! It works. I have absolutely no experience with awk, so I have one question: does the order of IDs in A.fa have to be the same as in B.fa, or does the concatenation work by comparing IDs between the two files?

The order has to be the same in both files, and each sequence can occupy only one line.
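
If the order of IDs cannot be guaranteed, the sequences can be matched by ID instead of by line position. Since Python was mentioned as acceptable in the question, here is a minimal sketch under stated assumptions: the helper names `parse_fasta` and `concat_by_id` are hypothetical, the ID is taken to be the first word after `>`, and both inputs are expected to contain exactly the same set of IDs.

```python
def parse_fasta(text):
    """Return {id: sequence} in file order; wrapped sequence lines are joined."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line without an ID")
            current = parts[0]  # assumption: ID = first word after '>'
            if current in records:
                raise ValueError(f"duplicate ID: {current}")
            records[current] = ""
        elif current is None:
            raise ValueError("sequence data before the first header")
        else:
            records[current] += line
    return records


def concat_by_id(text_a, text_b):
    """Concatenate sequences that share an ID; output follows the order of text_a."""
    a, b = parse_fasta(text_a), parse_fasta(text_b)
    if set(a) != set(b):
        raise ValueError(f"IDs differ: {sorted(set(a) ^ set(b))}")
    return "".join(f">{name}\n{a[name]}{b[name]}\n" for name in a)


file_a = ">sp1\naacc\n>sp2\nggtt\n"
file_b = ">sp2\nttgg\n>sp1\nggaa\n"  # same IDs as file_a, different order
print(concat_by_id(file_a, file_b), end="")
```

With the example sequences from the question this gives aaccggaa for sp1 and ggttttgg for sp2, even though the second input lists sp2 first. Raising an error on mismatched IDs is a design choice of this sketch; padding missing sequences with gaps would be an alternative.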

OK, thanks, that is really important.

paste -d '\0' File_A File_B | sed 's/^\(>[^>]*\)>.*/\1/' > File_C.fa will also do the same.
