How to concatenate gene sequence?
4
0
Entering edit mode
8.4 years ago
buthercup_ch ▴ 30

Hello,

With the aim of doing some phylogenetics based on Multi Locus Sequence Alignment, I need to generate concatenated sequences of several genes.

Does anyone how can I do it quickly by using any unix based command line? If yes, could you please help me and shae with me the program?

Thank you in advance

Example:

If I have 4 fasta files containing the sequence of 4 genes:

genea.fas

>geneA
ATGCCGTAATGCTAGCTAG

geneb.fas

>geneB
GTCGGCCTACGATCGGCCCTTTACG

genec.fas

>geneC
ACGCAGCCTGCGAGCTCA

gened.fas

>geneD
CATCCACTACCTTACGATCG

I need to obtain a final file containing 1 only concatenated sequence

concat.fas

>concat
ATGCCGTAATGCTAGCTAGATGCCGTAATGCTAGCTAGACGCAGCCTGCGAGCTCACATCCACTACCTTACGATCG
genome sequence • 6.0k views
ADD COMMENT
0
Entering edit mode
8.4 years ago
nterhoeven ▴ 120

This should work for you:

cat * | perl -ne 'BEGIN{print ">concat\n"}next if $_=~/^>/; chomp; print $_;' > concat.fa

This line concatenates all files in your directory and passes the output to perl. Then perl prints a header line (>concat) and every line (which does not start with >) without the newline.

Please try it on a test dataset and let me know, if it works for you ;-)

ADD COMMENT
0
Entering edit mode
8.4 years ago
iraun 6.2k

awk approach to combine sequences:

awk '/^>/ {if (c++ == 0) {print;}next}/^$/ {next}{printf "%s", $0}END {print ""}' input.fa > output.fa

The fasta header of the output sequence will be the the header of the first sequence in your input file.

ADD COMMENT
0
Entering edit mode
8.4 years ago
Anima Mundi ★ 2.9k

A Python solution:

concat = ''

j = 0
while j < 4: # That is, the number of files you have.
    for line in open('gene' + chr(j + ord('a')) + '.fas'):
        if '>' not in line:
            concat += line
    j += 1

print '>concat\n' + concat.replace('\n', '')
ADD COMMENT
0
Entering edit mode
8.4 years ago
cat *fas | grep -v '>' | tr '\n' '' > out.fas
ADD COMMENT

Login before adding your answer.

Traffic: 2854 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6