Awk command for fastq read length and number of reads?
6.1 years ago
oars ▴ 200

Hello,

I am trying to do something similar to this old thread (https://www.biostars.org/p/72433/), where I want to determine both the read length and how many reads are in my fastq file. Here is my code:

gunzip -c SRR1060507_1.fastq.gz|awk 'NR%4==2{printlength($0)}'|uniq -c

But I keep getting the following error:

awk: cmd. line:1: (FILENAME=- FNR=2) fatal: function `printlength' not defined

I'm not sure what I've done incorrectly. I also tried Frederic's code from the old thread, and although I got that code to run, it's not exactly the output I'm seeking. I should be getting something like 2420797 100

Any help would be super appreciated!

awk fastq • 6.8k views
gunzip -c SRR1060507_1.fastq.gz | awk 'NR%4==2{print length($0)}'

-for length

gunzip -c SRR1060507_1.fastq.gz | awk 'END {print NR/4}'

-for num. of sequences

6.1 years ago
cschu181 ★ 2.8k

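Try adding a space between print and length. Without it, awk reads printlength($0) as a call to a function named printlength, which is not defined:

print length($0)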
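
For the "2420797 100" style of output mentioned in the question, here is a minimal sketch of one way to get the read count and read length in a single pass. It assumes standard 4-line FASTQ records (no wrapped sequence lines). The function name fastq_length_summary and the tiny example file are made up for illustration and are not from the thread. Note that uniq -c only collapses adjacent identical lines, so this sketch counts inside awk instead of relying on uniq.

```shell
# Hypothetical helper (name is an assumption, not from the thread):
# prints one line per distinct read length as "<number of reads> <length>".
# Assumes 4-line FASTQ records, so the sequence is line 2 of every 4.
fastq_length_summary() {
    gunzip -c "$1" |
        awk 'NR % 4 == 2 { count[length($0)]++ }
             END { for (len in count) print count[len], len }' |
        sort -k2,2n   # order the summary by read length
}

# Tiny made-up example: two reads of length 8 and one read of length 4.
tmpdir=$(mktemp -d)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTTGGGG\n+\nIIIIIIII\n' |
    gzip -c > "$tmpdir/example.fastq.gz"

summary=$(fastq_length_summary "$tmpdir/example.fastq.gz")
echo "$summary"
rm -rf "$tmpdir"
```

On a file where every read has the same length, this should print a single line with the number of reads followed by that length. If several lines come back, the file contains reads of different lengths (for example after trimming).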

Thanks for the suggestion. This worked! I'm very new to both bioinformatics and bash so I feel a bit silly but also very thankful for your help!
