How to subsample multiple fastq files using the seqtk command
10 weeks ago

I have multiple fastq files like this:

TK273759_R1.fastq
TK273760_R1.fastq
TK273761_R1.fastq
TK273762_R1.fastq
TK273763_R1.fastq

I used this command to subsample 500000 reads from those files:

seqtk sample -s100 not_bear_dna/{i}_R1.fastq 500000 > subsample/{i}_R1.fastq

I get this error:

[E::stk_sample] failed to open the input file/stream.

I am not sure what I am doing wrong here. It works for a single fastq file but not for all of them. Here is the command I use to subsample from a single fastq file:

seqtk sample -s100 not_bear_dna/TK273759_R1.fastq 500000 > subsample/TK273759_R1.fastq

I appreciate your kind help!

You have not shown us your full command. It looks like the seqtk command is part of a loop.

This is the full command I used:

seqtk sample -s100 not_bear_dna/{i}_R1.fastq 500000 > subsample/{i}_R1.fastq

If it's outside a loop, i does not have the value you think it does: the shell passes {i} to seqtk literally, so seqtk tries to open a file named not_bear_dna/{i}_R1.fastq. Please do not run commands blindly; understand what each part does.
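
To illustrate the point, here is a minimal sketch comparing the literal text from the question with a variable expanded inside a loop. The variable names literal_path and expanded_path are made up for this demonstration, and only two of the sample IDs from the question are used.

```shell
# Outside a loop, {i} is plain text: there is no $ and no loop variable,
# so the shell hands the braces to the program unchanged.
literal_path="not_bear_dna/{i}_R1.fastq"
echo "$literal_path"

# Inside a loop, ${i} is replaced by the current value of the variable i.
for i in TK273759 TK273760; do
    expanded_path="not_bear_dna/${i}_R1.fastq"
    echo "$expanded_path"
done
```

The first echo prints the path with the braces still in it, which is the file name seqtk was asked to open when it reported the error.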

Got it now. Thank you.

10 weeks ago
GenoMax 141k

Are you doing something like:

for i in *_R1.fastq; do seqtk sample "${i}" 500000 > "subsample/${i}"; done
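
For the directory layout in the question (input in not_bear_dna/, output in subsample/, seed -s100), the same idea could be wrapped up roughly as follows. This is only a sketch: the function name subsample_fastqs and its argument order are hypothetical, and it assumes seqtk is on the PATH.

```shell
# Hypothetical helper (not part of seqtk): subsample every *_R1.fastq in a directory.
# Usage: subsample_fastqs <input_dir> <output_dir> <n_reads> [seed]
subsample_fastqs() {
    local in_dir="$1"
    local out_dir="$2"
    local n_reads="$3"
    local seed="${4:-100}"   # default seed matches the -s100 used in the question
    local fq

    mkdir -p "$out_dir"
    for fq in "$in_dir"/*_R1.fastq; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$fq" ] || continue
        # Write each subsample under the same file name in the output directory.
        seqtk sample -s"$seed" "$fq" "$n_reads" > "$out_dir/$(basename "$fq")"
    done
}
```

It would be called as `subsample_fastqs not_bear_dna subsample 500000`. Keeping the seed fixed should matter mainly if matching R2 files are subsampled later, since seqtk relies on the same seed to keep read pairs together.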

It worked. Thank you so much!

I've moved GenoMax's comment to an answer. Please accept it to provide closure to the post.
