Question

Help with preparing sequences for structure prediction

8.1 years ago

Ali HEBRA • 0

Hi everybody, I got stuck on a problem preparing my sequences for an I-TASSER run. I have tried again and again with different languages (Bash, PHP, Visual Basic) and still haven't solved it: whichever approach I choose, I end up with empty files in the folders. Suppose we have an input file like:

>tr|Y8CHY3   MQEYISNRQEEMLKLIETLVNIDSGSGNKAGVDRIGSLLKREYEKIGFNIDVVH
>tr|G9MJA5   MNLDHYIEELKTLVNVDCGTRTVAGVETVAGIIETLWQREGWHTERVNLGDKV

Each sequence should then go into a folder named after its sequence identifier, for example results/Y8CHY3, and inside that folder a file seq.txt containing:

>Y8CHY3
MQEYISNRQEEMLKLIETLVNIDSGSGNKAGVDRIGSLLKREYEKIGFNIDVVH

Please help me, any ideas that can help will be appreciated...

Sample bash script that doesn't work:

while read line do cd results; mkdir $line; 
echo ">Sequence">$line/seq.fasta;
#echo ">$line">$line/seq.fasta; grep '$line' sequences.txt | awk {'print $2'}>>$line/seq.fasta; 
#cd ..; done < seq_names.txt

Sample php script that doesn't work:

while (list ($id,$iden,$seqs) = mysql_fetch_row ($showcat))
{
    echo $iden . " processing...";
    mkdir($id, 0755);
    $seqfile = $id . "/" . "seq.txt";
    $myfile = fopen($seqfile, "w") or die("Unable to open file!");
    $txt = "> " . $iden . "\n"; fwrite($seqfile, $txt);
    $txt = $seqs . "\n";
    fwrite($seqfile, $txt);
    fclose($myfile);
}

Please help me. I have spent five days on this with no answer, and my only hope now is that someone will be kind enough to take a look and solve it. Thanks in advance.

php itasser structure prediction bash batch • 1.9k views

score 1 · Answer 1 · 2016-03-21

8.1 years ago

Ram 43k

It was a tiny bit challenging, but I found that awk works best for this:

cat seq_file | tr -s " " | awk -F " " '{ seq_name=substr($1,5,length($1)); 
system("mkdir -p results/"seq_name"; echo \">"seq_name"\">results/"seq_name"/seq.txt; 
echo "$2">>results/"seq_name"/seq.txt")}'

It does the following:

runs line by line through seq_file, splitting each line into columns on whitespace
assigns seq_name from the sequence header, skipping the first 4 characters
creates a folder named after seq_name within a results folder and writes the seq_name header and the sequence into a seq.txt within that folder

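I've tested it, but you may have to tweak it based on oddities in the rest of your input file. Remove all newlines from the command before running it: the line breaks shown here fall inside awk string constants, and awk does not allow a string constant to span lines.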
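For comparison, the same three steps can be sketched as a plain shell loop, closer to the attempt in the question. This is a minimal sketch, not the tested awk command: the function name split_sequences is hypothetical, and it assumes every line is a ">tr|ID" header followed by whitespace and the sequence, as in the sample input.

```shell
# Sketch: split "header<whitespace>sequence" lines into <outdir>/<ID>/seq.txt.
# split_sequences is a hypothetical name; the input layout is assumed to
# match the two sample lines in the question.
split_sequences() (
    infile=$1
    outdir=${2:-results}
    # read splits on runs of whitespace, so no separate squeeze step is needed;
    # the || test keeps a final line that lacks a trailing newline
    while read -r header seq || [ -n "$header" ]; do
        [ -z "$header" ] && continue      # skip blank lines
        id=${header#????}                 # drop the first 4 characters (">tr|")
        mkdir -p "$outdir/$id"
        printf '>%s\n%s\n' "$id" "$seq" > "$outdir/$id/seq.txt"
    done < "$infile"
)

# Usage: split_sequences seq_file results
```

Because the function body runs in a subshell, its variables do not leak into the calling shell.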

EDIT-1: Could've used bioawk to parse the sequence elements. Bioawk expects newline separation between name and sequence, so a tr " " "\n" or an equivalent sed after the squeeze operation would be in order.
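The squeeze-then-split step mentioned in EDIT-1 might look like the sketch below. The function name to_fasta is hypothetical, and it assumes there are no spaces other than the run between header and sequence (a trailing space would produce an empty line).

```shell
# Sketch: turn "header<spaces>sequence" lines into two-line FASTA records.
# to_fasta is a hypothetical name used only for this illustration.
to_fasta() {
    # squeeze runs of spaces to one, then turn that space into a newline
    tr -s ' ' < "$1" | tr ' ' '\n'
}

# Usage: to_fasta seq_file > sequences.fasta
# The result could then be piped to bioawk, e.g. bioawk -c fastx '{ print $name, $seq }'
```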

Thanks, Ram, for the helpful and quick answer. I've just tried this piece of code and got this error: awk: line 2: runaway string constant "/seq.txt; ...

What should I do to make it work properly?

8.1 years ago by Ali HEBRA

Make sure you did not omit a double quote by accident. This should work on any machine with GNU binaries for the programs involved.
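If the nested quoting keeps causing trouble, one way to sidestep it is to let awk write the files itself and use system() only for mkdir. This is a sketch of a variant, not the command tested above: the function name split_with_awk is hypothetical, and it assumes the same ">tr|ID sequence" line layout. Every line break here falls outside a string constant, so it can be pasted as is.

```shell
# Sketch: awk variant with no echo inside system(), so far fewer nested quotes.
# split_with_awk is a hypothetical name; output goes to ./results/<ID>/seq.txt.
split_with_awk() {
    awk '{
        id  = substr($1, 5)                  # skip the first 4 characters
        dir = "results/" id
        system("mkdir -p \"" dir "\"")       # the only shell call: create the folder
        out = dir "/seq.txt"
        printf ">%s\n%s\n", id, $2 > out     # awk writes header and sequence itself
        close(out)                           # release the file before the next record
    }' "$1"
}

# Usage: split_with_awk seq_file
```

The default field separator already splits on runs of whitespace, so the tr -s step is not needed in this form.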

8.1 years ago by Ram