Question: How to merge multi-FASTA sequences into a single sequence with only one header?
0
2.2 years ago by
majeedaasim40
United States
majeedaasim40 wrote:

I have a multi-FASTA sequence file. I want to merge all the sequences together to create a single-sequence file. I mean that the ">ID" header lines should be removed to create one super-sequence. This would take a long time to do manually.

How can this be done on Linux?

Thanks

merge fasta • 4.4k views

If I may ask, for what need?

written 2.2 years ago by Santosh Anand (5.0k)

@majeedaasim please use the "accept answer" option if one of the answers works for you. It will help keep us motivated. Good luck!

written 2.2 years ago by mittu1602 (180)
5
2.2 years ago by
Charles Plessy2.7k
Japan
Charles Plessy2.7k wrote:

Using the union command from the EMBOSS package:

$ cat test.fasta 
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG

$ union -filter test.fasta
>seq1
AAAATTGGGGGCCCTTTTAAATGGGG
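
Whichever tool does the merge, a quick sanity check is to confirm that the residue count is unchanged and that only one header is left. A minimal sketch using standard tools; merged.fasta is a hypothetical file name, filled here with the union output shown above so the example runs without EMBOSS installed:

```shell
# Recreate the example input and the merged output shown above.
cat > test.fasta <<'EOF'
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG
EOF

cat > merged.fasta <<'EOF'
>seq1
AAAATTGGGGGCCCTTTTAAATGGGG
EOF

# Count residues: drop header lines, drop newlines, count what is left.
# The final tr strips the padding some wc implementations add.
before=$(grep -v '^>' test.fasta   | tr -d '\n' | wc -c | tr -d '[:space:]')
after=$(grep -v '^>' merged.fasta  | tr -d '\n' | wc -c | tr -d '[:space:]')

# Count header lines in the merged file.
headers=$(grep -c '^>' merged.fasta)

echo "residues before: $before, after: $after, headers in merged file: $headers"
```

If the two residue counts differ, or more than one header remains, something went wrong during the merge.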
1
2.2 years ago by
mittu1602180
India
mittu1602180 wrote:

grep -v "^>" test.fasta | awk 'BEGIN { ORS=""; print ">My_New_Sequence_name\n" } { print }' > new.fasta

test.fasta
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG

new.fasta
>My_New_Sequence_name
AAAATTGGGGGCCCTTTTAAATGGGG

Hi,

Would this work on macOS (Mac OS X)?

written 7 days ago by cssulliv (10)

Not necessarily. macOS ships BSD versions of grep, sed and awk rather than the GNU ones. Consequently, the syntax often isn't 100% transferable. It may work, but that's not something you can rely on. You can, however, download and install the GNU versions via Homebrew or MacPorts.

written 7 days ago by Joe (16k)
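
If portability is a worry, the same merge can be written as a single awk program that sticks to POSIX features, so it should behave the same with the BSD and GNU tools. This is only a sketch of that idea, not a tested replacement on every platform; the header name is the placeholder from the answer above:

```shell
# Recreate the example input from the answer above.
cat > test.fasta <<'EOF'
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG
EOF

# One awk process, no grep needed:
#   BEGIN  prints the new header once
#   !/^>/  appends every non-header line without a newline
#   END    writes one final newline so the file ends cleanly
awk 'BEGIN { print ">My_New_Sequence_name" }
     !/^>/ { printf "%s", $0 }
     END   { printf "\n" }' test.fasta > new.fasta

cat new.fasta
```

Unlike the ORS="" version, this one ends the sequence line with a newline, which some downstream tools expect.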
0
2.2 years ago by
yhoogstrate60
Netherlands
yhoogstrate60 wrote:

grep -v '^>' in.fa > out.fa

If in.fa is:

>chr1
ttttccccaaaagggg
>chr2
ACTGACTGnnnnACTG
>chr3.1
ACTGACTGaaaac
>chr3.2
ACTGACTGaaaacc
>chr3.3
ACTGACTGaaaaccc
>chr4
ACTGnnnn
>chr5
nnACTG

then out.fa becomes:

ttttccccaaaagggg
ACTGACTGnnnnACTG
ACTGACTGaaaac
ACTGACTGaaaacc
ACTGACTGaaaaccc
ACTGnnnn
nnACTG
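
Note that the grep output above still has one line per original record and no header at all. To get a single record, the lines can be joined with tr and a header added by hand. A minimal sketch, assuming a header called ">merged" (a made-up name) and a shortened version of the input above:

```shell
# Shortened version of the example input above.
cat > in.fa <<'EOF'
>chr1
ttttccccaaaagggg
>chr2
ACTGACTGnnnnACTG
>chr4
ACTGnnnn
EOF

{
  printf '>merged\n'                 # hypothetical header for the merged record
  grep -v '^>' in.fa | tr -d '\n'    # drop the headers, then join the remaining lines
  printf '\n'                        # terminate the sequence line
} > out.fa

cat out.fa
```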
0
2.2 years ago by
Joe16k
United Kingdom
Joe16k wrote:
cat multifasta.fa | sed -e '1!{/^>.*/d;}' | sed  ':a;N;$!ba;s/\n//2g' > output.fa

E.g:

$ cat ~/test/seqs.fasta
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNN
>tpg|Pyricularia_pennisetigena|AB818016
NNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>tpg|Inocybe_sororia|EU525947
NNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN


$ cat ~/test/seqs.fasta | sed -e '1!{/^>.*/d;}' | sed  ':a;N;$!ba;s/\n//2g'
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNNNNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAANNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN

(retains just the header of the first seq in the multifasta)

Bonus:

If you also want to hard line-wrap the FASTA to 80 characters (or any other width), the command becomes:

cat "$1" | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g' | sed '1!s/.\{80\}/&\n/g'
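
The sed commands above lean on GNU sed behaviour (for example "\n" in the replacement text), so they may not run unchanged with BSD sed. A sketch of the same idea using head, grep, tr and fold instead; seqs.fasta and wrapped.fasta are example file names, and the width is set to 10 only so the wrapping is visible on a short input:

```shell
# Small example input.
cat > seqs.fasta <<'EOF'
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG
EOF

width=10   # use 80 for conventional FASTA line wrapping

{
  head -n 1 seqs.fasta                                       # keep only the first header
  grep -v '^>' seqs.fasta | tr -d '\n' | fold -w "$width"    # join all sequence lines, then re-wrap
  printf '\n'                                                # the joined stream has no final newline, so add one
} > wrapped.fasta

cat wrapped.fasta
```

This assumes the first line of the input is a header, as in the examples in this thread.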