seqtk subsample fastq file question
9 weeks ago
Jjbox ▴ 40

Hi biostars,

I have a long-read FASTQ file with 57,523,865 reads. I am trying to subsample it with seqtk, but the output file contains zero reads. Can someone help with this issue?

wc -l ALL1807_RW0588_051220_LiveGuppy.fastq
230095460 ALL1807_RW0588_051220_LiveGuppy.fastq
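As a quick sanity check on the numbers above, the read count can be derived from the line count, assuming the file uses the standard four lines per FASTQ record (an assumption; a multi-line FASTQ would not satisfy it). A minimal sketch:

```shell
# Line count reported by `wc -l` in the post.
lines=230095460

# With 4 lines per record, reads = lines / 4. A non-zero remainder
# would hint at a truncated file or records that are not 4 lines long.
reads=$(( lines / 4 ))
remainder=$(( lines % 4 ))

echo "reads=$reads remainder=$remainder"
```

The same two lines of arithmetic can be run on the `wc -l` of the subsampled output to confirm how many reads it actually holds.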


Here is the seqtk command line I used.

./seqtk sample ~/ALL1807_RW0588_051220_LiveGuppy.fastq 19473944 > ALL1807_RW0588_051220_LiveGuppy_sub.fastq


When I calculate the mean read length, the result has a decimal point. Does that make sense? I have never seen a mean read length with a decimal point. Could this be the reason seqtk sample is not working?

awk '{if(NR%4==2) {count++; bases += length} } END{print bases/count}' ALL1807_RW0588_051220_LiveGuppy.fastq
869.649
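On the decimal point: the mean is total bases divided by number of reads, so it is a whole number only when the read count happens to divide the base total evenly. A small sketch with a made-up two-read FASTQ (the reads `r1` and `r2` are hypothetical, not taken from the file above) shows the same kind of result:

```shell
# Hypothetical toy FASTQ: two reads, of length 4 and 5.
tmp=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIIIII\n' > "$tmp"

# Same idea as the awk one-liner in the post: sum the lengths of the
# sequence lines (every 4th line, offset 2) and divide by the read count.
mean=$(awk 'NR % 4 == 2 { count++; bases += length($0) }
            END { if (count) print bases / count }' "$tmp")

echo "$mean"
rm -f "$tmp"
```

Nine bases over two reads gives a fractional mean, so a value such as 869.649 is expected for reads of varying length and does not by itself point to a problem with the file.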


Solved! Thanks

9 weeks ago
michael • 0

You can also check out https://github.com/mbhall88/rasusa for subsampling reads. It was originally designed with long reads in mind. It also lets you subsample to a target number of bases or a target coverage, if that is closer to what you need than a fixed number of reads.
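To relate a coverage target to a number of bases or reads, the arithmetic is roughly: target bases = coverage × genome size. A rough sketch follows; the coverage and genome size are hypothetical placeholders, and only the mean read length comes from the question. It does not call rasusa itself, so check `rasusa --help` for the actual options.

```shell
coverage=30          # hypothetical target depth
genome_size=5000000  # hypothetical genome size in bases
mean_len=869.649     # mean read length reported in the question

# Total bases needed to reach the target depth.
target_bases=$(( coverage * genome_size ))

# Approximate number of reads of average length that add up to that total.
approx_reads=$(awk -v b="$target_bases" -v m="$mean_len" \
    'BEGIN { printf "%d\n", b / m }')

echo "target_bases=$target_bases approx_reads=$approx_reads"
```

With long reads of very uneven length, the read count is only an estimate, which is one reason to subsample by bases or coverage rather than by number of reads.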