Question: merge multiple fasta sequences in two files into a single file line by line
2.2 years ago by
girijakaushal10 wrote:

Hello,

I need to combine two FASTA files, each containing thousands of sequences, like these:

File1:

>HWI-700823F:57:C97D4ANXX:8:1101:1295:2240 2:N:0:GTGAAACG
NAAGAGGGGAATCAGGAGGGACCGCAAATATGCAGTGCAGCCCCGTGCCGTGTATGCAAC
TGGGGTACACATGTCCCAGAACATAGCCGGGAAGTCAACG
>HWI-700823F:57:C97D4ANXX:8:1101:1587:2235 2:N:0:GTGAAACG
NTCTGCCGCTCTGCGTACAAGCTTGAGAGTTTTTTTGCAGACCTTCTTGCCGGCGAGAGG
CTTAGCTATGGGAGCCAAAGCCATCATCTTCTTCTTCTCT
>HWI-700823F:57:C97D4ANXX:8:1101:1974:2229 2:N:0:NTGAAANN
NCTAAGCATGCTTTGAACTTGATCTTCTCCTTCACGAATGGGAGCGATTGGGATGGTCCT
TACAGATTGCAGTTTCAAGTTCCCAAGGCTTGGCGAAACA

File2:

>HWI-700823F:57:C97D4ANXX:8:1101:1295:2240 1:N:0:GTGAAACG
GTCCCGTGATAATGGAAGTATTTGATTCTCTGCTCCGTCTTGTGCGTTGACTTCCCGGCT
ATGTTCTGGGACATGTGTACCCCAGTTGCATACACGGCAC
>HWI-700823F:57:C97D4ANXX:8:1101:1587:2235 1:N:0:GTGAAACG
CAGAAAGAGAAGAAGAAGATGATGGCTTTGGCTCCCATAGCTAAGCCTCTCGCCGGCAAG
AAGGTCTGCAAAAAAACTCTCAAGCTTGTACGCAGAGCGG
>HWI-700823F:57:C97D4ANXX:8:1101:1974:2229 1:N:0:NTGAAANN
CAACGATCGCCCCCTTCTGCAGACAAGTTACCAACCATGGCACAACTTGTGTCAACAATT
TGTGTGTCCGGAAAGATTGCTCTGTCACACGCGCCTTCT

I want to combine both files line by line, and the expected outcome is:

>HWI-700823F:57:C97D4ANXX:8:1101:1295:2240 2:N:0:GTGAAACG
NAAGAGGGGAATCAGGAGGGACCGCAAATATGCAGTGCAGCCCCGTGCCGTGTATGCAAC
TGGGGTACACATGTCCCAGAACATAGCCGGGAAGTCAACG
>HWI-700823F:57:C97D4ANXX:8:1101:1295:2240 1:N:0:GTGAAACG
GTCCCGTGATAATGGAAGTATTTGATTCTCTGCTCCGTCTTGTGCGTTGACTTCCCGGCT
ATGTTCTGGGACATGTGTACCCCAGTTGCATACACGGCAC
>HWI-700823F:57:C97D4ANXX:8:1101:1587:2235 2:N:0:GTGAAACG
NTCTGCCGCTCTGCGTACAAGCTTGAGAGTTTTTTTGCAGACCTTCTTGCCGGCGAGAGG
CTTAGCTATGGGAGCCAAAGCCATCATCTTCTTCTTCTCT
>HWI-700823F:57:C97D4ANXX:8:1101:1587:2235 1:N:0:GTGAAACG
CAGAAAGAGAAGAAGAAGATGATGGCTTTGGCTCCCATAGCTAAGCCTCTCGCCGGCAAG
AAGGTCTGCAAAAAAACTCTCAAGCTTGTACGCAGAGCGG

That is, I want to combine the files so that the first sequence of the first file is followed by the first sequence of the second file, and so on.

I tried various commands, but I am not able to parse these multi-line FASTA files: the commands take the first line as one sequence and do not give the desired output.

Please help me.

line by line sequence fasta • 1.2k views
ADD COMMENTlink modified 2.2 years ago by Brian Bushnell15k • written 2.2 years ago by girijakaushal10

Hi Pierre,

Should I first linearize my sequences and then use the paste and transform command, or is this one-line command enough to get the desired output?

Thank you.

ADD REPLYlink written 2.2 years ago by girijakaushal10

Please use ADD REPLY/ADD COMMENT to respond to existing posts.

Save the following code in a file called linearize.awk:

/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}

and then run the command as shown by @Pierre.
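For readers who prefer to see the linearization step outside awk, here is a minimal Python sketch of what the one-liner above appears to do: join all sequence lines of a record and emit one `header<TAB>sequence` row per record. The function name `linearize` and the shortened example reads are illustrative assumptions, not part of the original answer.

```python
import io

def linearize(lines):
    """Yield one 'header<TAB>sequence' string per FASTA record."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header + "\t" + "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            # Sequence lines are concatenated without separators.
            chunks.append(line)
    if header is not None:
        yield header + "\t" + "".join(chunks)

# Hypothetical, shortened records standing in for a real FASTA file.
example = io.StringIO(
    ">read1 2:N:0:GTGAAACG\nNAAGAG\nTGGGGT\n"
    ">read2 2:N:0:GTGAAACG\nNTCTGC\n"
)
for row in linearize(example):
    print(row)
```

With each record on a single line, two files can be joined row by row, and the tabs turned back into newlines afterwards.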

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by genomax56k
2.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum112k wrote:

Linearize, paste and transform:

paste <(awk -f linearize.awk file1.fa ) <(awk -f linearize.awk file2.fa  )| tr "\t" "\n"
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Pierre Lindenbaum112k

Oops, fixed paste at the wrong position.

ADD REPLYlink written 2.2 years ago by Pierre Lindenbaum112k
2.2 years ago by
Medhat7.7k
Poland
Medhat7.7k wrote:

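Using Python (I did not test the code). This assumes every record is exactly two lines, a header and one unwrapped sequence line, so wrapped files need to be linearized first:

file_1 = "path/to/file_1.fasta"
file_2 = "path/to/file_2.fasta"
with open("result.fasta", "w") as output, open(file_1) as f_1, open(file_2) as f_2:
    # zip replaces Python 2's izip; each iteration reads one header from each file
    for header_1, header_2 in zip(f_1, f_2):
        # next() pulls the sequence line that follows each header
        output.write("{}{}{}{}".format(header_1, next(f_1), header_2, next(f_2)))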
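For wrapped FASTA like the files in the question, where one sequence spans several lines, a record-aware variant may be safer than pairing raw lines. The following is a minimal sketch, not a tested replacement for the tools in the other answers; `read_records` and `interleave` are hypothetical helper names, and the short reads are made up for illustration.

```python
import io
from itertools import zip_longest

def read_records(handle):
    """Yield each FASTA record as a list of lines (header first)."""
    record = []
    for line in handle:
        if line.startswith(">") and record:
            yield record
            record = []
        if line.strip():
            # Skip blank lines; make sure every kept line ends with a newline.
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield record

def interleave(handle_1, handle_2, output):
    """Write record 1 of file 1, record 1 of file 2, record 2 of file 1, ..."""
    pairs = zip_longest(read_records(handle_1), read_records(handle_2))
    for rec_1, rec_2 in pairs:
        if rec_1 is None or rec_2 is None:
            raise ValueError("the two files hold different numbers of records")
        output.writelines(rec_1)
        output.writelines(rec_2)

# Hypothetical in-memory files; real use would pass open file handles.
f_1 = io.StringIO(">a 2:N:0\nNAAG\nTGGG\n>b 2:N:0\nNTCT\n")
f_2 = io.StringIO(">a 1:N:0\nGTCC\nATGT\n>b 1:N:0\nCAGA\n")
out = io.StringIO()
interleave(f_1, f_2, out)
print(out.getvalue(), end="")
```

Both inputs are read as streams, so memory use stays at one record per file regardless of how many sequences there are.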
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Medhat7.7k
2.2 years ago by
Matt Shirley8.6k
Cambridge, MA
Matt Shirley8.6k wrote:

ADD COMMENTlink written 2.2 years ago by Matt Shirley8.6k

Thanks for this answer, It is much faster.

ADD REPLYlink written 2.1 years ago by Medhat7.7k
2.2 years ago by
Walnut Creek, USA
Brian Bushnell15k wrote:

You've got file 2 and file 1 mixed up; they should be swapped. File 1 should have headers like this:

HWI-700823F:57:C97D4ANXX:8:1101:1295:2240 1:N:0:GTGAAACG

Anyway, you can use the BBMap package's reformat tool like this:

reformat.sh in1=file1.fa in2=file2.fa out=interleaved.fa
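Whichever tool does the interleaving, the result is only meaningful if record N in one file is the mate of record N in the other. As a rough sanity check, and assuming the read ID is the header text before the first space (as in the headers shown in the question), a short Python sketch can compare the two files before merging. The names `read_ids` and `first_mismatch` are hypothetical.

```python
import io
from itertools import zip_longest

def read_ids(handle):
    """Yield the read ID (text between '>' and the first whitespace) per header."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""

def first_mismatch(handle_1, handle_2):
    """Return (position, id_1, id_2) for the first unpaired record, or None."""
    pairs = zip_longest(read_ids(handle_1), read_ids(handle_2))
    for position, (id_1, id_2) in enumerate(pairs, start=1):
        # A missing record on either side shows up as None and counts as a mismatch.
        if id_1 != id_2:
            return position, id_1, id_2
    return None

# Two headers taken from the question, with shortened sequences.
f_1 = io.StringIO(">HWI-700823F:57:C97D4ANXX:8:1101:1295:2240 2:N:0:GTGAAACG\nNAAG\n")
f_2 = io.StringIO(">HWI-700823F:57:C97D4ANXX:8:1101:1295:2240 1:N:0:GTGAAACG\nGTCC\n")
print(first_mismatch(f_1, f_2))
```

A return value of `None` means every header pair shares the same ID; anything else points at the first position where the files fall out of step.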
ADD COMMENTlink written 2.2 years ago by Brian Bushnell15k