Creating UBA protein data file in shell
4.1 years ago
anabaena ▴ 10

Hey all, I am trying to create a concatenated .faa file for analysis from the UBA genome data set. I downloaded the tar and unpacked it, which created a folder 'bacteria' with sub-folders labeled UBA1330, etc., and those sub-folders contain the .faa files. I need to concatenate all the protein files into one file and add the genome ID (e.g. UBA1330) to each >faa_id header so that I can trace a hit back to the correct genome. I am new to shell and have written the following script:

for GENOME in 'ls bacteria/';
do
sed "s|.*_|>${GENOME}_|" bacteria/${GENOME}/${GENOME}.faa | cat >> bacteria_proteins.faa; 
done

I receive the following error:

sed: can't read bacteria/ls: No such file or directory

sed: can't read bacteria/ls: No such file or directory

sed: can't read bacteria.faa: No such file or directory

For some reason it isn't executing the 'ls bacteria/' command correctly and is using 'bacteria' as ${GENOME}, yet when I run:

$ls bacteria/

I get the correct output:

UBAXXXX UBAXXXXXX UBAXXXXXX etc.

I'm new to the terminal and would love some input on what I am doing wrong. Thanks!

shell metagenomes UBA protein .faa • 653 views
4.1 years ago

There is a quoting error in your script. This part:

for GENOME in 'ls bacteria/';

needs to be:

for GENOME in `ls bacteria/`;

That is, use backticks rather than normal single quotes.

Alternatively, you could use the following:

for GENOME in $(ls bacteria/);

That is, if you are in bash (or another POSIX-compliant shell).

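What you originally wrote is a literal string, 'ls bacteria/', so the loop runs once with ${GENOME} set to that text, and the shell then splits it on the space when it builds the sed file path. What you need is for the command to be executed and its output used in your loop.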
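To see the difference in isolation, here is a minimal sketch. The temporary directory and the genome names UBA0001 and UBA0002 are made up for the demo and are not part of the real data set.

```shell
# Hypothetical demo layout: two empty genome folders in a temp directory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/bacteria/UBA0001" "$demo_dir/bacteria/UBA0002"
cd "$demo_dir" || exit 1

# Single quotes: the loop sees one literal string and never runs ls.
quoted_count=0
for GENOME in 'ls bacteria/'; do
    echo "single quotes: [$GENOME]"
    quoted_count=$((quoted_count + 1))
done

# Command substitution: ls runs and each folder name becomes one iteration.
substituted_count=0
for GENOME in $(ls bacteria/); do
    echo "substitution:  [$GENOME]"
    substituted_count=$((substituted_count + 1))
done

echo "iterations: quoted=$quoted_count substituted=$substituted_count"
```

The first loop runs once with the text "ls bacteria/" as its value; the second runs once per folder.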


Awesome that worked, thanks!
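For anyone landing here later, this is a sketch of the whole corrected loop run against toy data. The temp directory, the genome names and the two-line .faa files are invented for illustration; the sed expression is the one from the question. Writing the output once after `done` is a small change from the original `| cat >>`, so re-running the loop overwrites the file instead of appending duplicates.

```shell
# Hypothetical stand-in for the unpacked archive: two tiny genomes.
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1
for id in UBA0001 UBA0002; do
    mkdir -p "bacteria/$id"
    printf '>contig1_1\nMKV\n' > "bacteria/$id/$id.faa"
done

# Corrected loop: $(...) runs ls, and the quoted path keeps it as one argument.
for GENOME in $(ls bacteria/); do
    # Replace everything up to the last underscore with ">GENOME_".
    sed "s|.*_|>${GENOME}_|" "bacteria/${GENOME}/${GENOME}.faa"
done > bacteria_proteins.faa

# Quick sanity check: one header per input protein.
header_count=$(grep -c '^>' bacteria_proteins.faa)
echo "headers written: $header_count"
```

Note that `.*_` is greedy, so on real headers it removes everything before the last underscore on the line; check a few output headers before relying on them.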
