Question

Pipe output from find to input of awk

6.7 years ago

fiona.newberry

I have 20 fastq files (paired-end reads) and want to add a unique number onto the end of the sequence identifier in each fastq file.

So I want this from genome 1:

simulated.2618103/1

To look like this:

simulated.2618103/1.1

I have an awk command that will do the above:

awk '{ if (NR%4==1) gsub("$",".1",$1); print }' in.fq > renamed_in.fq
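To see what the awk program does, here is a minimal sketch that runs it on two made-up FASTQ records. The helper name add_suffix and the reads are illustrative assumptions, not part of the original post. The header test is written as NR%4==1, which selects lines 1, 5, 9, ... (the header of each 4-line record); matching on a leading @ instead would be risky because quality strings can also begin with @, as the second record's quality line does here.

```shell
#!/usr/bin/env bash
# add_suffix is a hypothetical helper name: it reads FASTQ on stdin and
# appends ".<id>" to the first field of every header line (lines 1, 5, 9, ...)
add_suffix() {
  awk -v id="$1" '{ if (NR % 4 == 1) gsub("$", "." id, $1); print }'
}

# Two made-up records, just for illustration; the second quality line starts with @
printf '%s\n' \
  '@simulated.2618103/1' 'ACGT' '+' 'IIII' \
  '@simulated.2618104/1' 'TTGA' '+' '@III' |
  add_suffix 1
```

Both headers come out with .1 appended (@simulated.2618103/1.1 and @simulated.2618104/1.1), while the sequence, + and quality lines, including @III, are printed unchanged. Calling add_suffix 2 would append .2 instead.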

I want a way to find all the genome 1-10 files and execute the awk command so that each fastq file gets the unique identifier.

So genome 1 should have .1 at the end of its sequence identifier, genome 2 should have .2 at the end of its sequence identifier, etc.

I have tried this:

find . -name "sub_NC_001539*" -exec awk ' { if (NR%4==1) gsub("$", ".1", $1); print } '

The problem isn't the awk command. I just don't know how to get find to pipe to awk correctly and keep the output as paired-end reads.

Thanks

Just a modification to Pierre's answer, since you also need a unique ID in each fastq file:

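var=1
# every file found gets its own number (1, 2, 3, ...), in sorted order
find . -type f -name "sub_NC_001539*" | sort | while IFS= read -r F
do
  # append ".<var>" to the first field of each header line (lines 1, 5, 9, ...)
  awk -v id="${var}" '{ if (NR%4==1) gsub("$", "." id, $1); print }' "${F}" > "$(dirname "${F}")/new_$(basename "${F}")"
  ((var+=1))
done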
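One thing to watch with this loop: var is incremented once per file, so the two mates of a paired-end sample (R1 and R2) end up with different numbers. Below is a minimal sketch that gives both mates the same number, assuming that after sorting the two files of each genome sit next to each other (for example ..._R1.fq followed by ..._R2.fq). The function name number_pairs is hypothetical.

```shell
#!/usr/bin/env bash
# number_pairs is a hypothetical helper. ASSUMPTION: after sorting, the two
# mates of each genome are adjacent (e.g. X_R1.fq then X_R2.fq), so files
# 1-2 get ".1", files 3-4 get ".2", and so on.
number_pairs() {
  n=0
  find "$1" -type f -name "sub_NC_001539*" | sort | while IFS= read -r F
  do
    id=$(( n / 2 + 1 ))   # integer division: both mates of a pair share one id
    awk -v id="$id" '{ if (NR % 4 == 1) gsub("$", "." id, $1); print }' "$F" \
      > "$(dirname "$F")/new_$(basename "$F")"
    n=$(( n + 1 ))
  done
}

number_pairs .
```

Numbers follow lexical sort order, so a file named genome10 sorts before genome2; with GNU coreutils, sort -V orders embedded numbers naturally. If the file names encode the genome number, extracting it from the name would be more robust than relying on position.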

6.7 years ago by GouthamAtla

Thank you. Do you mind explaining the code? I am new to coding and don't quite understand it.

6.7 years ago by fiona.newberry

score 2 · Accepted Answer · 2017-08-08

6.7 years ago

Pierre Lindenbaum

Use a loop over the files that find returns:

find . -type f -name "sub_NC_001539*" | while IFS= read -r F
do
  # append ".1" to the first field of each header line (lines 1, 5, 9, ...)
  awk '{ if (NR%4==1) gsub("$", ".1", $1); print }' "${F}" > "$(dirname "${F}")/new_$(basename "${F}")"
done
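Since the question tried find -exec, here is a sketch of that form as an alternative to piping into while read. find hands the matching paths to a small sh -c script, which loops over them and writes new_<name> next to each input, so every file (R1 and R2 alike) keeps its own separate output. The wrapper name suffix_all is hypothetical, and like the loop above it appends the same .1 to every file.

```shell
#!/usr/bin/env bash
# suffix_all is a hypothetical wrapper: it searches directory $1 and, for every
# matching FASTQ file, writes new_<name> next to it with ".1" appended to each header.
suffix_all() {
  find "$1" -type f -name "sub_NC_001539*" -exec sh -c '
    for f in "$@"; do
      awk '\''{ if (NR % 4 == 1) gsub("$", ".1", $1); print }'\'' "$f" \
        > "$(dirname "$f")/new_$(basename "$f")"
    done
  ' sh {} +
}

suffix_all .
```

Each '\'' sequence closes the outer single-quoted sh -c script, inserts a literal quote, and reopens it, so the awk program reaches the inner shell wrapped in single quotes. Ending -exec with {} + passes many paths per sh invocation instead of starting one shell per file; the original attempt failed partly because -exec had no {} placeholder and no terminating \; or +.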