seqtk subsample fastq file question
15 months ago
shinyjj ▴ 50

Hi biostars,

I have a long-read FASTQ file containing 57,523,865 reads. I am trying to subsample it with seqtk, but the output contains zero reads. Can someone help with this issue?

wc -l ALL1807_RW0588_051220_LiveGuppy.fastq
230095460 ALL1807_RW0588_051220_LiveGuppy.fastq

Here is the seqtk command line I used.

./seqtk sample ~/ALL1807_RW0588_051220_LiveGuppy.fastq 19473944 > ALL1807_RW0588_051220_LiveGuppy_sub.fastq

When I compute the mean read length, the result is not a whole number. Does a fractional mean read length make sense? I have never seen one before. Could this be the reason the seqtk subsampling is not working?

awk '{if(NR%4==2) {count++; bases += length} } END{print bases/count}' ALL1807_RW0588_051220_LiveGuppy.fastq
869.649
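
A note on the two numbers above: the mean is total bases divided by number of reads, so a fractional value such as 869.649 is expected whenever the total is not an exact multiple of the read count. Likewise, 230,095,460 lines divided by 4 lines per record gives the 57,523,865 reads quoted in the question. The sketch below reproduces both calculations on a tiny made-up FASTQ. The three reads and the variable names are hypothetical, and it assumes every record spans exactly four lines.

```shell
# Build a tiny 3-read FASTQ for illustration (hypothetical data, not the poster's file).
tmp_fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIIIII\n@r3\nACGTACGT\n+\nIIIIIIII\n' > "$tmp_fq"

# Assuming exactly 4 lines per record, reads = lines / 4.
n_lines=$(wc -l < "$tmp_fq")
n_reads=$((n_lines / 4))

# Same logic as the awk one-liner in the question:
# sum the lengths of the sequence lines and divide by how many there are.
mean_len=$(awk 'NR % 4 == 2 {count++; bases += length($0)} END {print bases / count}' "$tmp_fq")

echo "reads: $n_reads"
echo "mean read length: $mean_len"

rm -f "$tmp_fq"
```

Here the reads are 4, 5, and 8 bases long, so the mean is 17/3, roughly 5.67. Integer read lengths give a non-integer mean, which is not a sign of a malformed file.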

Solved! Thanks

15 months ago
michael ▴ 10

You can also check out https://github.com/mbhall88/rasusa for subsampling reads. It was originally designed with long reads in mind. It will also allow you to subsample to a number of bases or coverage if that is more what you're after rather than number of reads.
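
To illustrate what subsampling to a fixed number of reads means, and how to check that the output is not empty, here is a minimal sketch using only standard tools. It is not how seqtk or rasusa work internally. It assumes GNU `shuf` is available, that every record spans exactly four lines, and that no header contains a tab. The input data, the file variables, and `n_keep` are hypothetical.

```shell
# Hypothetical 4-read FASTQ for illustration only.
in_fq=$(mktemp)
out_fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTA\n+\nIIIII\n@r3\nACG\n+\nIII\n@r4\nACGTAC\n+\nIIIIII\n' > "$in_fq"

n_keep=2  # number of reads to keep (assumed value)

# Join each 4-line record into one tab-separated line, pick n_keep lines
# at random, then split each record back into its 4 lines.
paste - - - - < "$in_fq" | shuf -n "$n_keep" | tr '\t' '\n' > "$out_fq"

# Sanity checks on the result: record count and number of '@' header lines.
in_reads=$(( $(wc -l < "$in_fq") / 4 ))
out_reads=$(( $(wc -l < "$out_fq") / 4 ))
out_headers=$(awk 'NR % 4 == 1 && /^@/ {n++} END {print n + 0}' "$out_fq")

echo "kept $out_reads of $in_reads reads ($out_headers header lines)"

rm -f "$in_fq" "$out_fq"
```

Running the same line-count check on the real subsampled file is a quick way to confirm whether a subsampling command actually wrote any reads.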
