awk error when extracting primer sequences from a primer3 output file
8.1 years ago
sp

I would like to build a primer file in FASTA format by grepping a few fields (the template sequence, the sequence ID, and the left and right primer sequences) out of primer3 output. Goutham Atla kindly gave me this command, which I have been using successfully:

grep -E "PRIMER_RIGHT_0_SEQUENCE|PRIMER_LEFT_0_SEQUENCE|SEQUENCE_ID" test.fasta | paste - - - | awk '{ gsub("\047|,","",$0); print ">"$6"-left\n"$2"\n" ">"$6"-right\n"$4}'

This time I would like to design three candidate primer pairs for each target, so I modified his command like this:

grep -E "PRIMER_RIGHT_\d_SEQUENCE|PRIMER_LEFT_\d_SEQUENCE|SEQUENCE_ID|SEQUENCE_TEMPLATE" x.out | paste - - - - - - - - | awk '{ gsub("\047|,","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0"\n"$2"\n" ">"$14"-L1"\n"$4"\n" ">"$14"-L2"\n"$6"\n" ">"$14"-R0"\n"$8"\n" ">"$14"-R1"\n"$10"\n" ">"$14"-R2"\n"$12}' > xgrep_3primers.out

Unsurprisingly, it fails with awk errors, and I have no clue why. Please help me out.

Here is an example of the primer3 output (abridged):

{'PRIMER_INTERNAL_NUM_RETURNED': 0L,

...

'PRIMER_LEFT_0_SEQUENCE': 'ATGGCAAATACACAGAGGAAGC',

...

'PRIMER_LEFT_1_SEQUENCE': 'GCAAATACACAGAGGAAGCCTT',

...

'PRIMER_LEFT_2_SEQUENCE': 'TGATGGCAAATACACAGAGGAAG',

...

'PRIMER_RIGHT_0_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',

...

'PRIMER_RIGHT_1_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',

...

'PRIMER_RIGHT_2_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',

...

'SEQUENCE_ID': 'chr1:114713809-114714010',

...

'SEQUENCE_TEMPLATE': 'TAATATCCGCAAATGACTTGCTATTATTGATGGCAAATACACAGAGGAAGCCTTCGCCTGTCCTCATGTATTGGTCTCTCATGGCACTGTACTCTTCTTGTCCAGCTGTATCCAGTATGTCCAACAAACAGGTTTCACCATCTATAACCACTTGTTTTCTGTAAGAATCCTGGGGGTGTggagggtaagggggcagggagg'}

None
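To see why the awk programs in this thread refer to $2, $4, ..., $16, it can help to run the grep | paste step on a small sample and number the fields. The sketch below writes a hypothetical sample.out containing the relevant lines from the example above (template shortened for brevity), uses [0-9] in place of \d, and assumes the keys appear in the same order as in the example.

```shell
#!/bin/sh
# Hypothetical sample.out: the relevant lines of the example primer3
# output, with the template shortened for brevity.
cat > sample.out <<'EOF'
{'PRIMER_INTERNAL_NUM_RETURNED': 0L,
 'PRIMER_LEFT_0_SEQUENCE': 'ATGGCAAATACACAGAGGAAGC',
 'PRIMER_LEFT_1_SEQUENCE': 'GCAAATACACAGAGGAAGCCTT',
 'PRIMER_LEFT_2_SEQUENCE': 'TGATGGCAAATACACAGAGGAAG',
 'PRIMER_RIGHT_0_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
 'PRIMER_RIGHT_1_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
 'PRIMER_RIGHT_2_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
 'SEQUENCE_ID': 'chr1:114713809-114714010',
 'SEQUENCE_TEMPLATE': 'TAATATCCGCAAATGACTTG'}
EOF

# grep keeps the 8 wanted lines, paste joins them into one tab-separated
# line, and awk strips single quotes and commas, then prints "$N value"
# for every field.
grep -E "PRIMER_(LEFT|RIGHT)_[0-9]_SEQUENCE|SEQUENCE_ID|SEQUENCE_TEMPLATE" sample.out |
  paste - - - - - - - - |
  awk '{ gsub("\047|,", ""); for (i = 1; i <= NF; i++) print "$" i, $i }'
```

With this layout, $2, $4 and $6 are the left primers, $8, $10 and $12 the right primers, $14 the sequence ID and $16 the template, which still carries the closing } of the dictionary.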

The desired output format is:

>chr1:114713809-114714010
TAATATCCGCAAATGACTTGCTATTATTGATGGCAAATACACAGAGGAAGCCTTCGCCTGTCCTCATGTATTGGTCTCTCATGGCACTGTACTCTTCTTGTCCAGCTGTATCCAGTATGTCCAACAAACAGGTTTCACCATCTATAACCACTTGTTTTCTGTAAGAATCCTGGGGGTGTggagggtaagggggcagggagg
>chr1:114713809-114714010-L0
ATGGCAAATACACAGAGGAAGC
>chr1:114713809-114714010-L1
GCAAATACACAGAGGAAGCCTT
>chr1:114713809-114714010-L2
TGATGGCAAATACACAGAGGAAG
>chr1:114713809-114714010-R0
AGATGGTGAAACCTGTTTGTTG
>chr1:114713809-114714010-R1
AGATGGTGAAACCTGTTTGTTG
>chr1:114713809-114714010-R2
AGATGGTGAAACCTGTTTGTTG

Tags: software, error, sequence

First, why are you using \d instead of 0 in the grep pattern?
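On the \d point: \d is Perl-style shorthand that POSIX extended regular expressions do not define, so grep -E does not treat it as "any digit". The portable way to cover LEFT_0, LEFT_1 and LEFT_2 (and the right primers) with one pattern is a bracket expression such as [0-9]. A minimal check on a hypothetical digits.txt:

```shell
#!/bin/sh
# Hypothetical input: six primer lines in the primer3 output format.
printf '%s\n' \
  "'PRIMER_LEFT_0_SEQUENCE': 'ATGGCAAATACACAGAGGAAGC'," \
  "'PRIMER_LEFT_1_SEQUENCE': 'GCAAATACACAGAGGAAGCCTT'," \
  "'PRIMER_LEFT_2_SEQUENCE': 'TGATGGCAAATACACAGAGGAAG'," \
  "'PRIMER_RIGHT_0_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG'," \
  "'PRIMER_RIGHT_1_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG'," \
  "'PRIMER_RIGHT_2_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG'," > digits.txt

# [0-9] is a POSIX bracket expression, so grep -E understands it.
grep -cE 'PRIMER_LEFT_[0-9]_SEQUENCE' digits.txt           # prints 3
grep -cE 'PRIMER_(LEFT|RIGHT)_[0-9]_SEQUENCE' digits.txt   # prints 6
```

GNU grep's -P (Perl-compatible) mode does accept \d, but -P is not available in every grep, so [0-9] is the safer choice.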


I want a regular expression that matches all the primers (LEFT_0, LEFT_1, LEFT_2, etc.). I also tried listing each number explicitly and got the same error:

sp@sp-ThinkPad-X220:~/multiplex_primer_design/2016_03_18_DC_Hot$ grep -E "PRIMER_LEFT_0_SEQUENCE|PRIMER_LEFT_1_SEQUENCE|PRIMER_LEFT_2_SEQUENCE|PRIMER_RIGHT_0_SEQUENCE|PRIMER_RIGHT_1_SEQUENCE|PRIMER_RIGHT_2_SEQUENCE|SEQUENCE_ID|SEQUENCE_TEMPLATE" x.out | paste - - - - - - - - | awk '{ gsub("\047|,","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0"\n"$2"\n" ">"$14"-L1"\n"$4"\n" ">"$14"-L2"\n"$6"\n" ">"$14"-R0"\n"$8"\n" ">"$14"-R1"\n"$10"\n" ">"$14"-R2"\n"$12}' > xgrep_3primers.out
awk: cmd. line:1: { gsub("\047|,","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0"\n"$2"\n" ">"$14"-L1"\n"$4"\n" ">"$14"-L2"\n"$6"\n" ">"$14"-R0"\n"$8"\n" ">"$14"-R1"\n"$10"\n" ">"$14"-R2"\n"$12}
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: { gsub("\047|,","",$0); print ">"$14"\n"$16"\n" ">"$14"-L0"\n"$2"\n" ">"$14"-L1"\n"$4"\n" ">"$14"-L2"\n"$6"\n" ">"$14"-R0"\n"$8"\n" ">"$14"-R1"\n"$10"\n" ">"$14"-R2"\n"$12}
awk: cmd. line:1: ^ syntax error
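The "backslash not last character on line" message points at the \n sequences placed between string literals, as in ">"$14"-L0"\n"$2. Outside double quotes, awk sees a bare backslash followed by n, which is not valid awk syntax; only inside a string is \n a newline escape. A minimal reproduction on a hypothetical one-line input, followed by the corrected form:

```shell
#!/bin/sh
# Broken: \n sits after the closing quote of "-L0", outside any string.
if echo 'chr1 ATGC' | awk '{ print ">"$1"-L0"\n$2 }' 2>/dev/null; then
  echo "unexpected: awk accepted the program"
else
  echo "awk rejected the program"
fi

# Fixed: \n is inside the string literal "-L0\n".
echo 'chr1 ATGC' | awk '{ print ">"$1"-L0\n"$2 }'
```

The fixed line prints >chr1-L0 followed by ATGC on the next line.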

8.1 years ago
gangireddy

Found it:

$14"-R2"\n"$

The \n here sits outside the double quotes (a quote is missing), and the same mistake occurs in many places.

Change it to

$14"-R2""\n"

and it should work.

Suggestion: the following is cleaner:

$14"-R2\n"

There is no need to put two quoted strings side by side; you can place both in a single quoted string.
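Both fixes are valid because awk concatenates adjacent expressions, including two string literals written side by side, so "-R2""\n" and "-R2\n" produce exactly the same text; the single literal is simply easier to read. A quick check on hypothetical input:

```shell
#!/bin/sh
# Two adjacent literals: "-R2" followed by "\n", concatenated by awk.
a=$(echo 'chr1 AGATGG' | awk '{ print ">"$1"-R2""\n"$2 }')
# One literal: "-R2\n".
b=$(echo 'chr1 AGATGG' | awk '{ print ">"$1"-R2\n"$2 }')

printf '%s\n' "$a"
[ "$a" = "$b" ] && echo "same output"
```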

The command that worked:

grep -E "PRIMER_LEFT_0_SEQUENCE|PRIMER_LEFT_1_SEQUENCE|PRIMER_LEFT_2_SEQUENCE|PRIMER_RIGHT_0_SEQUENCE|PRIMER_RIGHT_1_SEQUENCE|PRIMER_RIGHT_2_SEQUENCE|SEQUENCE_ID|SEQUENCE_TEMPLATE" x.out | paste - - - - - - - - | awk '{ gsub("\047|,","",$0); print ">"$14"\n"substr($16,1,length($16)-1)"\n>"$14"-L0\n"$2"\n>"$14"-L1\n"$4"\n>"$14"-L2\n"$6"\n>"$14"-R0\n"$8"\n>"$14"-R1\n"$10"\n>"$14"-R2\n"$12}' > xgrep_3primers.out

The substr() call removes the trailing } left over from the closing brace of the primer3 dictionary.
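For reference, here is a minimal end-to-end sketch of the same approach, written as a hypothetical shell function primer3_to_fasta and run on a hypothetical sample.out built from the example in the question (template shortened). It assumes each record has exactly three primer pairs and the keys in the order shown, so that $14 is the ID and $16 the template. It uses [0-9] instead of \d, and starts substr() at 1 because awk string positions begin at 1.

```shell
#!/bin/sh
# primer3_to_fasta (hypothetical helper name): convert primer3 records with
# exactly three primer pairs each into FASTA. Field layout after grep|paste:
# $2 $4 $6 = LEFT_0..2, $8 $10 $12 = RIGHT_0..2, $14 = ID, $16 = template.
primer3_to_fasta() {
  grep -E "PRIMER_(LEFT|RIGHT)_[0-9]_SEQUENCE|SEQUENCE_ID|SEQUENCE_TEMPLATE" "$1" |
    paste - - - - - - - - |
    awk '{
      gsub("\047|,", "")                       # drop single quotes and commas
      id = $14
      tmpl = substr($16, 1, length($16) - 1)   # drop the trailing brace
      print ">" id "\n" tmpl
      print ">" id "-L0\n" $2
      print ">" id "-L1\n" $4
      print ">" id "-L2\n" $6
      print ">" id "-R0\n" $8
      print ">" id "-R1\n" $10
      print ">" id "-R2\n" $12
    }'
}

# Hypothetical sample.out built from the example (template shortened).
cat > sample.out <<'EOF'
{'PRIMER_INTERNAL_NUM_RETURNED': 0L,
 'PRIMER_LEFT_0_SEQUENCE': 'ATGGCAAATACACAGAGGAAGC',
 'PRIMER_LEFT_1_SEQUENCE': 'GCAAATACACAGAGGAAGCCTT',
 'PRIMER_LEFT_2_SEQUENCE': 'TGATGGCAAATACACAGAGGAAG',
 'PRIMER_RIGHT_0_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
 'PRIMER_RIGHT_1_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
 'PRIMER_RIGHT_2_SEQUENCE': 'AGATGGTGAAACCTGTTTGTTG',
 'SEQUENCE_ID': 'chr1:114713809-114714010',
 'SEQUENCE_TEMPLATE': 'TAATATCCGCAAATGACTTG'}
EOF

primer3_to_fasta sample.out > xgrep_3primers.out
cat xgrep_3primers.out
```

One design caveat: paste groups every eight matching lines blindly, so if primer3 returns fewer than three pairs for some target (see PRIMER_PAIR_NUM_RETURNED), that record and every record after it will be misaligned. It is worth checking the counts before relying on fixed field numbers.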

That's right! It works now. Thanks!
