Question: Help with preparing sequences for structure prediction
Ali HEBRA (Switzerland, Basel) wrote, 3.7 years ago:

Hi everybody, I am stuck preparing my sequences for an I-TASSER run. I have tried again and again in different languages (Bash, PHP, Visual Basic) and still have not solved the problem: whichever approach I choose, I end up with empty files in the folders. Suppose that we have an input file like:


Each sequence should then go into a folder named after its sequence identifier, for example results/Y8CHY3, and in it the file seq.txt containing:


Please help me; any ideas will be appreciated.

Sample bash script that doesn't work:

while read line do cd results; mkdir $line; 
echo ">Sequence">$line/seq.fasta;
#echo ">$line">$line/seq.fasta; grep '$line' sequences.txt | awk {'print $2'}>>$line/seq.fasta; 
#cd ..; done < seq_names.txt

Sample php script that doesn't work:

while (list ($id,$iden,$seqs) = mysql_fetch_row ($showcat))
    echo $iden . " processing...";
    mkdir($id, 0755);
    $seqfile = $id . "/" . "seq.txt";
    $myfile = fopen($seqfile, "w") or die("Unable to open file!");
    $txt = "> " . $iden . "\n"; fwrite($seqfile, $txt);
    $txt = $seqs . "\n";
    fwrite($seqfile, $txt);

Please help me. I have spent five days on this with no answer, and my only hope now is that you will kindly take a look and solve it. Thanks in advance.

RamRS (Houston, TX) wrote, 3.7 years ago:

It was a tiny bit challenging, but I found that awk works best for this:

cat seq_file | tr -s " " | awk -F " " '{ seq_name=substr($1,5,length($1)); 
system("mkdir -p results/"seq_name"; echo \">"seq_name"\">results/"seq_name"/seq.txt; 
echo "$2">>results/"seq_name"/seq.txt")}'

It does the following:

  1. runs line by line through seq_file, splitting each line into columns on white space
  2. takes the seq_name from the sequence header, skipping its first 4 characters
  3. creates a folder named after the seq_name inside a results folder, then writes the seq_name and the sequence into a seq.txt within that folder.

I've tested it, but you may have to tweak it based on oddities in the rest of your input file. Remove all new lines from the command before running it.
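
The three steps listed above can also be sketched as a plain shell loop, without awk. This is only an illustration under assumptions: the sample input did not survive in the question, so the two-column layout (header, white space, sequence), the 4-character `>sp|` prefix, and the second identifier `A1B2C3` are all hypothetical stand-ins.

```shell
# Sketch only: the input layout and the ">sp|" prefix are assumptions,
# not taken from the original post.
set -eu

# Hypothetical sample input: "<header> <sequence>" on each line.
printf '%s\n' '>sp|Y8CHY3   MKTAYIAKQR' '>sp|A1B2C3 GSHMLEDPVA' > seq_file

# Step 1: read each line, split on white space into header and sequence.
while read -r header seq; do
    [ -n "$header" ] || continue                      # skip blank lines
    # Step 2: drop the first 4 characters of the header.
    seq_name=$(printf '%s' "$header" | cut -c5-)
    # Step 3: create results/<seq_name> and write seq.txt inside it.
    mkdir -p "results/$seq_name"
    printf '>%s\n%s\n' "$seq_name" "$seq" > "results/$seq_name/seq.txt"
done < seq_file
```

Quoting every expansion keeps the loop safe if an identifier ever contains unusual characters, and `mkdir -p` makes the script safe to rerun.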

EDIT-1: Could've used bioawk to parse the sequence elements. Bioawk expects newline separation between name and sequence, so a tr " " "\n" or an equivalent sed after the squeeze operation would be in order.
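
The squeeze-then-split conversion mentioned in EDIT-1 might look like the sketch below. It assumes a hypothetical input with one record per line and no spaces inside the header; the sample records and the output name `seqs.fasta` are made up for illustration, and the bioawk step itself is not shown.

```shell
# Hypothetical sample input: "<header> <sequence>" on each line.
printf '%s\n' '>sp|Y8CHY3   MKTAYIAKQR' '>sp|A1B2C3 GSHMLEDPVA' > seq_file

# Squeeze runs of spaces to a single space, then turn each remaining space
# into a newline, so header and sequence end up on separate lines.
tr -s ' ' < seq_file | tr ' ' '\n' > seqs.fasta
```

If a header contains spaces of its own, this splits it across lines too, so a more targeted sed or awk substitution would be needed in that case.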


Thanks Ram for the helpful and immediate answer. I've just tried this piece of code and encountered this error: awk: line 2: runaway string constant "/seq.txt; ...

What should I do to make this work properly?

written 3.7 years ago by Ali HEBRA

Make sure you did not omit a double quote by accident. This should work on any machine with GNU binaries for the programs involved.

written 3.7 years ago by RamRS
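
A "runaway string constant" message generally means awk reached the end of a line while still inside a double-quoted string, which is consistent with the line breaks inside the system("...") argument as the command is displayed in the answer. One way to sidestep it, sketched here under the same hypothetical input layout (header, white space, sequence, 4-character `>sp|` prefix), is to keep every string on one line and let awk write the file itself:

```shell
# Hypothetical sample input: "<header> <sequence>" on each line.
printf '%s\n' '>sp|Y8CHY3   MKTAYIAKQR' '>sp|A1B2C3 GSHMLEDPVA' > seq_file

# Default field splitting already handles runs of white space.
awk 'NF >= 2 {
    seq_name = substr($1, 5)             # header minus the first 4 characters
    dir = "results/" seq_name
    system("mkdir -p \"" dir "\"")       # each string opens and closes on one line
    out = dir "/seq.txt"
    print ">" seq_name > out             # first print creates or truncates the file
    print $2 > out                       # later prints go to the same open file
    close(out)                           # avoid running out of open files
}' seq_file
```

Because no string constant spans a line break, the program can stay formatted across several lines and does not need to be collapsed before running.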