How to merge a multi-FASTA file into a single sequence with only one header?
                    
I have a multi-FASTA sequence file. I want to merge all the sequences together to create a single-sequence file.
I mean that the ">ID" header lines should be removed so that the sequences form one super-sequence. Doing this manually would take a lot of time.
How can it be done in Linux?
Thanks
                    
            fasta
            merge
                    
Using the union command from the EMBOSS package:
$ cat test.fasta 
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG
$ union -filter test.fasta
>seq1
AAAATTGGGGGCCCTTTTAAATGGGG
 
grep -v "^>" test.fasta | awk 'BEGIN { ORS=""; print ">My_New_Sequence_name\n" } { print } END { print "\n" }' > new.fasta
Input (test.fasta):
>seq1
AAAATTGGG
>seq2
GGCCCTTTT
>seq3
AAATGGGG
Output (new.fasta):
>My_New_Sequence_name
AAAATTGGGGGCCCTTTTAAATGGGG
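
For what it's worth, the same idea can probably be done in a single awk pass, without the grep step. This is only a rough sketch: merge_fasta is a made-up helper name (not an existing tool), and it assumes Unix line endings and that every header line starts with ">".

```shell
# Hypothetical helper (not a standard tool): merge_fasta NEW_HEADER INPUT_FASTA
merge_fasta() {
    awk -v name="$1" '
        BEGIN { print ">" name }       # write the single new header first
        /^>/  { next }                 # skip every original header line
              { printf "%s", $0 }      # append sequence text without a newline
        END   { printf "\n" }          # finish the record with one newline
    ' "$2"
}

# Demo on the three-record example from this answer, using a temporary directory.
demo_dir=$(mktemp -d)
printf '>seq1\nAAAATTGGG\n>seq2\nGGCCCTTTT\n>seq3\nAAATGGGG\n' > "$demo_dir/test.fasta"
merge_fasta My_New_Sequence_name "$demo_dir/test.fasta" > "$demo_dir/new.fasta"
cat "$demo_dir/new.fasta"
```

The END block is there so the output file finishes with a newline, which some downstream tools expect.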
 
cat multifasta.fa | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g' > output.fa
E.g.:
$ cat ~/test/seqs.fasta
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNN
>tpg|Pyricularia_pennisetigena|AB818016
NNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>tpg|Inocybe_sororia|EU525947
NNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN
$ cat ~/test/seqs.fasta | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g'
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGCNNNNNNNNNNNNGCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAANNNAACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGCNNN
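(This keeps only the header of the first sequence in the multi-FASTA. Note that the s/\n//2g form, which combines a number with the g flag, is a GNU sed extension.)
Bonus:
If you also want to hard-wrap the sequence at 80 characters (or any other width), the command becomes:
cat "$1" | sed -e '1!{/^>.*/d;}' | sed ':a;N;$!ba;s/\n//2g' | sed '1!s/.\{80\}/&\n/g'
(Here $1 stands for the input file, e.g. the first argument of a wrapper script.)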
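
If GNU sed is not available, the wrapping step can likely be done with fold instead. The sketch below is an alternative to the sed pipeline, not the same command: merge_and_wrap is a made-up helper name, the default width of 80 is just carried over from the text above, and Unix line endings are assumed.

```shell
# Hypothetical helper (not a standard tool): merge_and_wrap INPUT_FASTA [WIDTH]
merge_and_wrap() {
    width=${2:-80}
    # Print only the first header line, unchanged.
    awk '/^>/ { print; exit }' "$1"
    # Drop all headers, join the sequence lines, then re-wrap with fold.
    grep -v '^>' "$1" | tr -d '\n' | fold -w "$width"
    # The joined input has no final newline, so fold's last line lacks one; add it.
    echo
}

# Demo with a width of 10 so the wrapping is visible on a tiny input.
wrap_dir=$(mktemp -d)
printf '>seq1\nAAAATTGGG\n>seq2\nGGCCCTTTT\n>seq3\nAAATGGGG\n' > "$wrap_dir/test.fasta"
merge_and_wrap "$wrap_dir/test.fasta" 10 > "$wrap_dir/wrapped.fasta"
cat "$wrap_dir/wrapped.fasta"
```

On this input the 26 merged bases should come out as two lines of 10 and one line of 6 under the ">seq1" header.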
 
                        grep -v '^>' in.fa > out.fa
If in.fa is:
>chr1
ttttccccaaaagggg
>chr2
ACTGACTGnnnnACTG
>chr3.1
ACTGACTGaaaac
>chr3.2
ACTGACTGaaaacc
>chr3.3
ACTGACTGaaaaccc
>chr4
ACTGnnnn
>chr5
nnACTG
then out.fa becomes (headers removed; the sequences stay on separate lines and no header is kept):
ttttccccaaaagggg
ACTGACTGnnnnACTG
ACTGACTGaaaac
ACTGACTGaaaacc
ACTGACTGaaaaccc
ACTGnnnn
nnACTG
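
Since grep -v on its own leaves the line breaks in place and writes no header, one possible follow-up is to join the lines with tr and then check that no bases were lost. This is just a sketch under assumptions: count_residues is a made-up helper name, and ">merged" is an arbitrary header chosen for the example.

```shell
# Hypothetical helper: count sequence characters, ignoring headers and line breaks.
count_residues() {
    grep -v '^>' "$1" | tr -d '\n\r' | wc -c | tr -d '[:space:]'
}

# Recreate the in.fa example from this answer in a temporary directory.
check_dir=$(mktemp -d)
printf '>chr1\nttttccccaaaagggg\n>chr2\nACTGACTGnnnnACTG\n>chr3.1\nACTGACTGaaaac\n>chr3.2\nACTGACTGaaaacc\n>chr3.3\nACTGACTGaaaaccc\n>chr4\nACTGnnnn\n>chr5\nnnACTG\n' > "$check_dir/in.fa"

# Write one header (the name "merged" is arbitrary), then all sequence lines joined.
{ echo '>merged'; grep -v '^>' "$check_dir/in.fa" | tr -d '\n'; echo; } > "$check_dir/out.fa"

# Compare the number of bases before and after merging.
before=$(count_residues "$check_dir/in.fa")
after=$(count_residues "$check_dir/out.fa")
if [ "$before" -eq "$after" ]; then
    echo "OK: $after residues in both files"
else
    echo "MISMATCH: $before vs $after" >&2
fi
```

A count check like this is cheap and catches the most common slip, which is accidentally deleting or duplicating a record while merging.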
 
If I may ask, what do you need this for?
@majeedaasim, please accept an answer if it works for you. It helps keep us motivated. Good luck!