What is the format of a fastq.gz file? And how often does a read appear in the file, i.e. how many lines does each read take up? I am trying to find the read lengths in a fastq.gz file and then calculate the mean read length. Here is what I do:
zcat dataset015.fastq.gz | head -50 | awk '{ s += 1; sum += length($0); printf length($0) " , " } END { avg = sum / s }'
The file comes from a Nanopore dataset, and the command prints:
163 , 29 , 1 , 29 , 163 , 43 , 1 , 43 , 171 , 1034 , 1 , 1034 , 163 , 70 , 1 , 70 , 162 , 295 , 1 , 295 , 163 , 1270 , 1 , 1270 , 163 , 61 , 1 , 61 , 162 , 18 , 1 , 18 , 171 , 973 , 1 , 973 , 170 , 489 , 1 , 489 , 169 , 2203 , 1 , 2203 , 170 , 741 , 1 , 741 , 171 , 9799
However, I know some of these numbers are not read lengths but the lengths of other information lines. How can I select only the read lengths and skip the other fields?
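For reference, a FASTQ file stores each read as four lines: a header starting with @, the sequence, a separator line starting with + (which may optionally repeat the header), and a quality string with one character per base, so it has the same length as the sequence. A .fastq.gz file is that same text compressed with gzip, which is why zcat is needed. This matches the output above, where the numbers repeat in groups of four: roughly 160 to 170 (the long Nanopore header), the read length, 1 (the lone +), and the read length again (the quality string). The read lengths are therefore on lines 2, 6, 10, and so on, i.e. where NR % 4 == 2 in awk. Note also that the original END block computes avg but never prints it. The sketch below assumes the common four-line layout with unwrapped sequence lines (older multi-line FASTQ would break it); mean_read_length is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# mean_read_length (hypothetical helper): reads *uncompressed* FASTQ on stdin.
# Assumption: every record is exactly 4 lines (header, sequence, +, quality),
# so the sequence is line 2 of each record, i.e. NR % 4 == 2.
mean_read_length() {
  awk 'NR % 4 == 2 { n++; sum += length($0) }
       END {
         if (n > 0) {
           printf "reads: %d  mean length: %.2f\n", n, sum / n
         } else {
           print "no reads found"
         }
       }'
}

# Toy input with two invented reads of length 4 and 6 (not from dataset015).
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' | mean_read_length
```

On the real data this would be run as zcat dataset015.fastq.gz | mean_read_length, without head -50, since head -50 only covers the first 50 lines of the file rather than every read.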