Question: How to merge a multi-FASTA file into a single sequence with only one header?
majeedaasim wrote:

I have a multi-FASTA sequence file. I want to merge all the sequences together to create a single-sequence file. I mean that the ">ID" header lines should be removed to create one super sequence. This would take a long time to do manually.

How can this be done in Linux?

Thanks


If I may ask, what do you need this for?

written 8 months ago by Santosh Anand

@majeedaasim, please accept the answer if it works for you; it helps keep us motivated. Good luck!

written 8 months ago by mittu1602
mittu1602 wrote:

grep -v "^>" test.fasta | awk 'BEGIN { ORS=""; print ">My_New_Sequence_name\n" } { print } END { print "\n" }' > new.fasta

test.fasta
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG

new.fasta
>My_New_Sequence_name
AAAATTGGGGGCCCTTTTAAATGGGG
yhoogstrate wrote:

grep -v '^>' in.fa > out.fa

if in.fa =

>chr1
ttttccccaaaagggg
>chr2
ACTGACTGnnnnACTG
>chr3.1
ACTGACTGaaaac
>chr3.2
ACTGACTGaaaacc
>chr3.3
ACTGACTGaaaaccc
>chr4
ACTGnnnn
>chr5
nnACTG

then out.fa becomes:

ttttccccaaaagggg
ACTGACTGnnnnACTG
ACTGACTGaaaac
ACTGACTGaaaacc
ACTGACTGaaaaccc
ACTGnnnn
nnACTG
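
Note that the output above still has one line per original record and no header. A minimal sketch of one way to also join the lines and prepend a single header, assuming a placeholder header name `merged` and a shortened copy of the example input (neither comes from the answer itself):

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Recreate a small input file (a subset of the example above).
printf '>chr1\nttttccccaaaagggg\n>chr2\nACTGACTGnnnnACTG\n>chr4\nACTGnnnn\n' > in.fa

# Write one header, then every sequence line with line breaks removed
# (tr also strips carriage returns from Windows-style files),
# then a final newline so the file ends cleanly.
{
  printf '>merged\n'
  grep -v '^>' in.fa | tr -d '\r\n'
  printf '\n'
} > out.fa

cat out.fa
```

With this input, out.fa should contain the `>merged` header followed by a single sequence line.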
jrj.healey wrote:
cat multifasta.fa | sed -e '1!{/^>.*/d;}' | sed  ':a;N;$!ba;s/\n//2g' > output.fa

E.g:

$ cat ~/test/seqs.fasta
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNN
>tpg|Pyricularia_pennisetigena|AB818016
NNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>tpg|Inocybe_sororia|EU525947
NNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN


$ cat ~/test/seqs.fasta | sed -e '1!{/^>.*/d;}' | sed  ':a;N;$!ba;s/\n//2g'
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNNNNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAANNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN

(retains just the header of the first seq in the multifasta)

Bonus:

If you also want to hard-wrap the FASTA at 80 characters (or any other width), the command becomes:

cat $1 | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g' | sed '1!s/.\{80\}/&\n/g'
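
The sed commands above rely on GNU sed behaviour (the combined `2g` flag, and `\n` in the replacement text), so they may not work with BSD/macOS sed. A rough sketch of an equivalent using awk and fold, with a placeholder input file and a wrap width of 10 so the effect is visible on short sequences (use 80 for real data):

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Small placeholder input.
printf '>seq1\nAAAATTGGG\n>seq2\nGGCCCTTTT\n>seq3\nAAATGGGG\n' > seqs.fasta

# Keep the first header, drop later headers, join the sequence lines,
# then wrap only the sequence (not the header) with fold.
awk 'NR == 1 { print; next }      # first line: the header to keep
     /^>/    { next }             # later headers: skip
             { printf "%s", $0 }  # sequence lines: print without a newline
     END     { print "" }         # terminate the joined sequence
    ' seqs.fasta |
  {
    IFS= read -r header           # pass the header through untouched
    printf '%s\n' "$header"
    fold -w 10                    # wrap the remaining sequence line
  } > output.fa

cat output.fa
```

Reading the header separately keeps a long header line from being wrapped along with the sequence.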
Charles Plessy wrote:

Using the union command from the EMBOSS package:

$ cat test.fasta 
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG

$ union -filter test.fasta
>seq1
AAAATTGGGGGCCCTTTTAAATGGGG