Question

Reduce the number of PE reads by half

7.7 years ago

BioGeek ▴ 170

I would like to split my paired-end (PE) reads in half and keep both halves in separate files. Is there a quick way to do this?

Assembly NGS PE Reads • 1.5k views

ADD COMMENT • link updated 7.7 years ago by igor 13k • written 7.7 years ago by BioGeek ▴ 170

Is this a random sampling of 50% of the reads, or a 'down-the-middle' split?

ADD REPLY • link 7.7 years ago by st.ph.n ★ 2.7k

If all you want is the first 50% of the reads in the file, without random sampling, you can: (1) count the number of reads in the FASTQ with echo $(( $(wc -l < your.fastq) / 4 )); (2) divide the number of reads by 2; (3) multiply this number by 4 to get the number of lines you need; (4) use head -n <lines> your.fastq to get the first 50% of the sequences; and (5) use tail -n +<lines+1> your.fastq to get the remaining 50%. Do the same for both the R1 and R2 files so the pairs stay in sync.
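The steps above can be sketched in Python. This is only a minimal illustration, assuming uncompressed FASTQ files with exactly four lines per record; the function names `count_reads` and `split_down_the_middle` are made up for this example.

```python
import itertools


def count_reads(path):
    """Count records in an uncompressed FASTQ file (4 lines per record)."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def split_down_the_middle(path, first_out, second_out):
    """Write the first half of the reads to first_out, the rest to second_out.

    Returns the number of reads written to first_out.
    """
    # Reads -> half the reads -> number of lines for that half.
    n_lines_first = (count_reads(path) // 2) * 4
    with open(path) as src, open(first_out, "w") as a, open(second_out, "w") as b:
        # Equivalent of head -n: take exactly n_lines_first lines.
        a.writelines(itertools.islice(src, n_lines_first))
        # Equivalent of tail -n +(n+1): everything that is left.
        b.writelines(src)
    return n_lines_first // 4
```

Calling it once on the R1 file and once on the R2 file keeps the pairs together, provided both files list the same reads in the same order. With an odd number of reads, the second file gets the extra one.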

ADD REPLY • link 7.7 years ago by st.ph.n ★ 2.7k

You can use the seqtk sample command. For paired-end data, run it on both files with the same seed (-s) so the pairs stay in sync.

ADD REPLY • link 7.7 years ago by Prasad ★ 1.6k

If you know Python, you can use HTSeq for subsampling, but to get the other half you would have to follow @genomax2's suggestion of finding, by header, the reads that didn't end up in your subsampled files. Here's the example on seqanswers.
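If the goal is simply two random halves, one alternative is to assign each pair to one of two output sets in a single streaming pass, so no second "find the complement" step is needed. The sketch below uses only the standard library rather than HTSeq. It assumes uncompressed, 4-line-per-record FASTQ files with R1 and R2 in the same order; `read_fastq` and `random_split_pairs` are hypothetical names.

```python
import random


def read_fastq(handle):
    """Yield one FASTQ record at a time as a list of 4 lines."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # empty string means end of file
            return
        yield record


def random_split_pairs(r1_in, r2_in, half_a, half_b, seed=42):
    """Randomly split read pairs into two halves.

    half_a and half_b are (r1_out, r2_out) path tuples.
    Returns (pairs written to half_a, pairs written to half_b).
    """
    # First pass: count the pairs so exactly half can be chosen.
    with open(r1_in) as handle:
        n_pairs = sum(1 for _ in handle) // 4
    # Seeded RNG so the split is reproducible.
    chosen = set(random.Random(seed).sample(range(n_pairs), n_pairs // 2))
    # Second pass: route each pair to one half or the other.
    with open(r1_in) as r1, open(r2_in) as r2, \
            open(half_a[0], "w") as a1, open(half_a[1], "w") as a2, \
            open(half_b[0], "w") as b1, open(half_b[1], "w") as b2:
        for i, (rec1, rec2) in enumerate(zip(read_fastq(r1), read_fastq(r2))):
            out1, out2 = (a1, a2) if i in chosen else (b1, b2)
            out1.writelines(rec1)
            out2.writelines(rec2)
    return len(chosen), n_pairs - len(chosen)
```

Because both mates of a pair are routed by the same index, the two halves stay properly paired. The set of chosen indices is held in memory, which is usually fine but worth keeping in mind for very large runs.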

ADD REPLY • link 7.7 years ago by st.ph.n ★ 2.7k

7.7 years ago

igor 13k

A few options: Selecting Random Pairs From Fastq?

ADD COMMENT • link 7.7 years ago by igor 13k

score 3 · Accepted Answer · 2016-08-16

7.7 years ago

GenoMax 141k

reformat.sh from BBMap.

reformat.sh in1=read1.fq.gz in2=read2.fq.gz out1=new1.fq.gz out2=new2.fq.gz samplerate=0.5

ADD COMMENT • link 7.7 years ago by GenoMax 141k

Thanks for your reply. I guess it extracts the reads randomly. Now, how do I extract the remaining 50%?

ADD REPLY • link 7.7 years ago by BioGeek ▴ 170

I think you will need to grab the IDs of the reads that got selected in the first round and then use filterbyname.sh from BBMap to get the rest in separate files.
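For anyone who wants to see the idea spelled out, here is a rough Python sketch of the same "collect the selected IDs, then keep everything else" logic. It is not how filterbyname.sh is implemented, just an illustration. It assumes uncompressed, 4-line-per-record FASTQ files (decompress the .fq.gz files first, or adapt it to use gzip.open), and `read_id` and `write_unselected` are hypothetical names.

```python
def read_id(header_line):
    """Return the read ID from a FASTQ header line.

    Drops the leading '@', anything after the first whitespace,
    and a trailing /1 or /2 mate suffix if present.
    """
    name = header_line[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def write_unselected(original, sampled, remainder_out):
    """Write reads from original that are absent from sampled.

    Returns the number of reads written to remainder_out.
    """
    # Every 4th line, starting at line 0, is a header.
    with open(sampled) as handle:
        selected = {read_id(line) for i, line in enumerate(handle) if i % 4 == 0}
    kept = 0
    with open(original) as src, open(remainder_out, "w") as out:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if read_id(record[0]) not in selected:
                out.writelines(record)
                kept += 1
    return kept
```

Running it once for R1 (original read1 vs. the sampled new1) and once for R2 gives the remaining half as a matching pair of files, since the mate suffix is ignored when comparing IDs.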

ADD REPLY • link 7.7 years ago by GenoMax 141k