Error in Seqtk output
9.2 years ago
Prasad ★ 1.6k

Hi,

I have two FASTQ files (R1 and R2). The problem is that R1 has, for example, 5 sequences and R2 has 6. I want only the reads in R1 and their mates from the R2 file, so I used seqtk:

seqtk sample -s100 test_R1.fastq 5 >seq1

seqtk sample -s100 test_R2.fastq 5 >seq2

But I am not getting matching paired-end reads from the two files. Is there a ready-made tool that does this?

seqtk • 3.0k views

What do you mean by 'exact pairend reads'? Paired-end reads from Illumina should ideally not overlap at all.

9.2 years ago

It's somewhat difficult to tell exactly what your situation is. If "test_R1.fastq" is out of sync with "test_R2.fastq", then sync them before proceeding (BBTools has a convenient function for that as I recall).
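One way to see whether the two files are actually out of sync is to compare read names record by record. The following is only a minimal sketch, not part of BBTools or seqtk: the function names are made up for illustration, and it assumes plain 4-line FASTQ records whose mates share a name apart from an optional /1 or /2 suffix or a comment after whitespace.

```python
from itertools import zip_longest

def read_id(header):
    """Return the read name from a FASTQ header, without a trailing /1 or /2."""
    parts = header.strip()[1:].split()  # drop the leading '@' and any comment
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def read_ids(lines):
    """Yield one read ID per 4-line FASTQ record (assumes unwrapped records)."""
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0 and line.strip():  # header line of each record
            yield read_id(line)

def first_out_of_sync(r1_lines, r2_lines):
    """Return (record_index, r1_id, r2_id) at the first disagreement, or None."""
    pairs = zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return index, id1, id2
    return None

# Hypothetical usage with the file names from the question:
# with open("test_R1.fastq") as r1, open("test_R2.fastq") as r2:
#     print(first_out_of_sync(r1, r2))
```

A result of None means every record lines up; a tuple with None in it means one file ran out of reads before the other.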

In any case, you can subsample pairs of fastq files with BBTools as well. That's described here: Select sequences from fastq.gz file

9.2 years ago
SES 8.6k

You cannot sample two files and expect the reads to be paired if they have a different number of sequences. As Devon said, you need to pair the reads first. Here is a lightweight solution using Pairfq:

curl -sL git.io/pairfq_lite | perl - makepairs -f test_R1.fastq -r test_R2.fastq -fp test_R1_p.fastq -rp test_R2_p.fastq -fs test_R1_s.fastq -rs test_R2_s.fastq
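For readers curious what "making pairs" amounts to, here is a rough Python sketch of the idea. It is not Pairfq's implementation: the function names are hypothetical, it assumes 4-line FASTQ records with mates named identically apart from a /1 or /2 suffix or a trailing comment, and it holds all of R2 in memory, so it only suits small files.

```python
def read_id(header):
    """Return the read name from a FASTQ header, without a trailing /1 or /2."""
    parts = header.strip()[1:].split()  # drop the leading '@' and any comment
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def parse_fastq(lines):
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield read_id(record[0]), "\n".join(record) + "\n"
            record = []

def make_pairs(r1_lines, r2_lines):
    """Split reads into paired R1, paired R2, R1 singletons and R2 singletons."""
    r2_records = dict(parse_fastq(r2_lines))  # whole R2 file kept in memory
    paired_r1, paired_r2, single_r1 = [], [], []
    for rid, rec in parse_fastq(r1_lines):
        if rid in r2_records:
            paired_r1.append(rec)
            paired_r2.append(r2_records.pop(rid))  # mate found, keep R1 order
        else:
            single_r1.append(rec)
    single_r2 = list(r2_records.values())  # R2 reads that never found a mate
    return paired_r1, paired_r2, single_r1, single_r2
```

The four returned lists correspond to the four output files in the command above (-fp, -rp, -fs, -rs); the paired lists always have equal length and matching order.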

Now you can sample the paired files and get what you expect.

seqtk sample -s100 test_R1_p.fastq 5 > test_R1_p_5.fastq
seqtk sample -s100 test_R2_p.fastq 5 > test_R2_p_5.fastq
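The reason the same seed only helps once the files are paired can be shown with a toy example. This is an illustration of the principle, not seqtk's actual sampling algorithm, and sample_positions is a made-up helper: a seeded random generator picks the same record positions only when it is asked the same question, that is, the same number of records and the same sample size.

```python
import random

def sample_positions(n_records, k, seed):
    """Return the sorted record positions a seeded sampler would keep."""
    rng = random.Random(seed)  # same seed gives the same sequence of random numbers
    return sorted(rng.sample(range(n_records), k))

# Two files with the same number of records: identical positions are chosen,
# so record i in one output is the mate of record i in the other.
same = sample_positions(6, 5, 100) == sample_positions(6, 5, 100)

# Different record counts (5 vs 6, as in the question): the chosen positions
# are not guaranteed to agree, and position i may not even hold the same read.
from_five = sample_positions(5, 5, 100)
from_six = sample_positions(6, 5, 100)
```

With equal-sized, synchronized inputs the chosen positions line up and the outputs stay paired, which is why the pairing step has to come first.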

There is inline documentation for the above command, so running ./pairfq_lite will show you the usage, and there is more information about that specific command on the wiki online. If you have a lot of sequences and little memory, I would recommend installing the program from the link above and using the indexing method (if you use this approach).
