Merge FASTA Files According to the Beginning of the Name
9.6 years ago
PoGibas 4.9k

I have thousands of .fa files in a directory. Names are:

ch14.fa_20452_206652_15-84.fa
ch14.fa_20452_206652_786-14.fa
ch14.fa_20452_206652_77-85.fa
ch14.fa_20452_206652_81-78.fa
ch2.fa_16903_17204-41-44.fa
ch2.fa_16903_17204-2-46.fa
ch2.fa_16903_17204-61-47.fa
ch2.fa_16903_17204-73-52.fa

I want to merge the files that share the same beginning (ch14.fa_20452_206652 or ch2.fa_16903_17204) into one FASTA file per prefix.

I tried to do it manually with:

cat *ch14.fa_20452_206652* > ch14.fa_20452_206652.fa

But with thousands of files this is impractical and it is driving me crazy. I hope someone can help me.

9.6 years ago

Try this:

for f in prefix*.fa ; do
  cat "$f" >> out.fa
done
9.6 years ago

I am assuming the question is how to concatenate all files that share a prefix in one go, instead of typing a cat command for each prefix one by one.

Try this extremely dirty Python script:

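import os, sys

# Prefix = everything before the first '-' in each file name (this script itself is skipped).
prefixes = {x.split('-')[0] for x in os.listdir('.') if x != os.path.basename(sys.argv[0])}

for prefix in prefixes:
    # Concatenate every file starting with this prefix into <prefix>.fa
    cmd = 'cat ' + prefix + "* > " + prefix + ".fa"
    os.system(cmd)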

Put it in the directory with the FASTA files and run it. I would test it first on a smaller set of files. The script defines the prefix as everything before the first '-' character.
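In the file names from the question, the variable part is separated from the prefix by '_' in the ch14 files and by '-' in the ch2 files, so splitting on '-' alone would not group the ch14 files. Below is a minimal Python 3 sketch of the same idea that accepts either separator. It assumes the shared prefix always has the shape ch<N>.fa_<number>_<number>. The names prefix_of and merge_by_prefix, the regular expression, and the merged output directory are illustrative assumptions, not part of the script above.

```python
import os
import re
import shutil
from collections import defaultdict

# Assumed prefix shape: ch<N>.fa_<digits>_<digits>, followed by '_' or '-'.
PREFIX_RE = re.compile(r'^(ch\d+\.fa_\d+_\d+)[_-]')


def prefix_of(name):
    """Return the shared prefix of a file name, or None if it does not match."""
    m = PREFIX_RE.match(name)
    return m.group(1) if m else None


def merge_by_prefix(src_dir='.', out_dir='merged'):
    """Concatenate files sharing a prefix into out_dir/<prefix>.fa."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        prefix = prefix_of(name)
        if prefix and os.path.isfile(path):
            groups[prefix].append(path)

    os.makedirs(out_dir, exist_ok=True)
    for prefix, paths in groups.items():
        target = os.path.join(out_dir, prefix + '.fa')
        # Binary mode copies the bytes unchanged, whatever the line endings.
        with open(target, 'wb') as out:
            for path in paths:
                with open(path, 'rb') as src:
                    shutil.copyfileobj(src, out)
    return dict(groups)
```

Calling merge_by_prefix() from the directory that holds the FASTA files would write one file per prefix into merged/ and return a mapping from each prefix to the input files it collected. Writing into a separate directory keeps the merged files from being picked up as inputs on a second run.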

9.6 years ago
Malcolm.Cook ★ 1.3k

The following uses a Perl one-liner to distribute its standard input into output files whose names are derived systematically from the names of the input files.

# let's put the results in a new subdir.
mkdir merged
perl -wspe 'if ($. == 1){($o=$ARGV); eval(q{$o=~}.${new}); open(STDOUT,q{>>},$o)}; close ARGV if eof' -- -new='s/(ch\d+\.fa_\d+_\d+)[_-].*/merged\/$1.fa/' *.fa

This should put your results in the merged subdirectory, in files named, in this case:

merged/ch14.fa_20452_206652.fa
merged/ch2.fa_16903_17204.fa
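One way to sanity-check a merge like this is to compare the number of FASTA records (header lines starting with '>') in a merged file with the total across the inputs that went into it. A minimal Python sketch follows; the helper name count_records is hypothetical and is not part of the one-liner above.

```python
def count_records(path):
    """Count FASTA records in a file by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith('>'))
```

For example, count_records('merged/ch14.fa_20452_206652.fa') would be expected to equal the sum of count_records over the matching input files. A lower count can indicate that an input file lacked a trailing newline, so that the next header was appended to the end of a sequence line.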