Question: Take A Subset Of A Fastq Paired-End Sample
0
dfernan (640), United States, wrote 6.1 years ago:

Hi,

I have two compressed paired-end FASTQ files from a HiSeq RNA-Seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.

The files are very large, so I want to take just a few thousand or a few million reads from each of them (keeping the pairs matched) and use the smaller files for testing and debugging.

The results should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.

I'd be happy to hear some suggestions on how to do this or hear about tools available, thanks!

modified 6.1 years ago by Rahul Sharma (560) • written 6.1 years ago by dfernan (640)
1

Duplicate of: Selecting random pairs from fastq?

written 6.1 years ago by Pierre Lindenbaum (118k)

Thanks, Pierre. I didn't realize someone else had already asked about this!

written 6.1 years ago by dfernan (640)
3
Rahul Sharma (560), Germany, wrote 6.1 years ago:

Hi,

Assuming that the reads are in the same order in both files, I would do it like this:

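# A FASTQ record is 4 lines, so 4,000,000 lines = 1,000,000 reads; head stops reading once they are printed
$ zcat pair.1.fastq.gz | head -n 4000000 > pair_1_millions.fastq
$ zcat pair.2.fastq.gz | head -n 4000000 > pair_2_millions.fastq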

Thanks, Rahul

written 6.1 years ago by Rahul Sharma (560)
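
The answer above writes uncompressed files, while the question asks for pair.test.1.fastq.gz and pair.test.2.fastq.gz. A minimal sketch of the same first-N-lines approach with gzipped output follows. take_first_reads is a hypothetical helper name, and the sketch assumes four-line FASTQ records and the same read order in both input files.

```shell
# Hypothetical helper: keep the first N reads of a gzipped FASTQ file.
# Usage: take_first_reads INPUT.fastq.gz OUTPUT.fastq.gz N_READS
# Assumes every record is exactly 4 lines (no wrapped sequence lines).
take_first_reads() {
    # gzip -dc is the portable spelling of zcat.
    # head exits after N_READS * 4 lines, so the decompressor may be
    # stopped early by SIGPIPE; that is expected here (it would only
    # surface as an error under `set -o pipefail`).
    gzip -dc "$1" | head -n "$(( $3 * 4 ))" | gzip > "$2"
}
```

Usage, with the file names from the question and 1,000,000 reads per file:

$ take_first_reads pair.1.fastq.gz pair.test.1.fastq.gz 1000000
$ take_first_reads pair.2.fastq.gz pair.test.2.fastq.gz 1000000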

Hi, thanks a lot. However, I am not sure whether the reads are in the same order in both files, and I'd like to make sure that I am pairing them correctly...

written 6.1 years ago by dfernan (640)
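
One way to check the ordering concern raised above is to compare the read IDs of the two files position by position. The following is only a sketch: read_ids and same_read_order are hypothetical helper names, and it assumes four-line FASTQ records whose mates share an ID once a trailing /1 or /2 and anything after the first whitespace are removed.

```shell
# Hypothetical helper: print the read IDs of a gzipped FASTQ, one per line.
read_ids() {
    # NR % 4 == 1 selects the header line of each 4-line record.
    # $1 drops any description after the first whitespace, and the sub()
    # call strips a trailing /1 or /2 so that mates compare equal.
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Hypothetical helper: exit status 0 only if both files list the same
# read IDs in the same order (and therefore the same number of reads).
same_read_order() {
    ids1=$(mktemp) && ids2=$(mktemp) || return 2
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    cmp -s "$ids1" "$ids2"      # silent byte-by-byte comparison
    status=$?
    rm -f "$ids1" "$ids2"
    return "$status"
}
```

Usage with the file names from the question:

$ same_read_order pair.1.fastq.gz pair.2.fastq.gz && echo "pairs are in the same order"

If the check fails, taking the first N lines of each file would not give matched pairs, and the files would need to be re-paired first.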