Question: Take A Subset Of A Fastq Paired-End Sample
6.8 years ago by
dfernan660
United States
dfernan660 wrote:

Hi,

I have two compressed paired-end FASTQ files from a HiSeq RNA-Seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.

The files are very large, so I want to take just a few thousand or a few million reads from each of them (keeping the pairs matched) and use the resulting files for testing and debugging.

The result should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.

I'd be happy to hear suggestions on how to do this, or about any tools that are available. Thanks!

modified 6.8 years ago by Rahul Sharma600 • written 6.8 years ago by dfernan660

Duplicate of: Selecting random pairs from fastq?

written 6.8 years ago by Pierre Lindenbaum125k

Thanks Pierre, I didn't realize someone else had already asked about it!

written 6.8 years ago by dfernan660
6.8 years ago by
Rahul Sharma600
Germany
Rahul Sharma600 wrote:

Hi,

Assuming the reads are in the same order in both files, I would do it like this (a FASTQ record is 4 lines, so 4,000,000 lines gives 1 million reads):

$ zcat pair.1.fastq.gz | sed -n 1,4000000p > pair_1_millions.fastq
$ zcat pair.2.fastq.gz | sed -n 1,4000000p > pair_2_millions.fastq

Thanks, Rahul

written 6.8 years ago by Rahul Sharma600
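
The answer above writes uncompressed output, while the question asks for pair.test.1.fastq.gz and pair.test.2.fastq.gz. A minimal bash sketch of the same idea, under the same assumption that both files hold the reads in the same order, could look like this. The function name subset_pair and its argument order are hypothetical, chosen only for illustration. It swaps sed for head, which stops reading once it has printed the requested lines, so the rest of each file is never decompressed.

```shell
# Hypothetical helper: keep the first N reads of each mate file.
# Usage: subset_pair <n_reads> <in_1.fastq.gz> <in_2.fastq.gz> <out_1.fastq.gz> <out_2.fastq.gz>
# Assumes both inputs list the reads in the same order.
subset_pair() {
    local n_reads=$1 in1=$2 in2=$3 out1=$4 out2=$5
    local n_lines=$((n_reads * 4))   # one FASTQ record = 4 lines

    # head exits after n_lines, so decompression stops early.
    gzip -cd "$in1" | head -n "$n_lines" | gzip > "$out1"
    gzip -cd "$in2" | head -n "$n_lines" | gzip > "$out2"
}
```

For one million read pairs, the call would be:

$ subset_pair 1000000 pair.1.fastq.gz pair.2.fastq.gz pair.test.1.fastq.gz pair.test.2.fastq.gz

Note that this takes the first reads of the run rather than a random sample, which is usually fine for testing a pipeline but not for estimating statistics.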

Hi, thanks a lot. However, I am not sure whether the reads are in the same order in both files, and I'd like to make sure that I am pairing them correctly.

written 6.8 years ago by dfernan660
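
One way to address the ordering concern raised in this comment is to compare the read IDs of the two files line by line before subsetting. The bash sketch below is only an illustration: the function name check_pairing is hypothetical, and it assumes the mates share an ID that differs at most by a trailing /1 or /2, or by a comment after the first space in the header line. Other header conventions would need a different normalization step.

```shell
# Hypothetical helper: succeed only if both files list the same read IDs
# in the same order. Requires bash (process substitution).
# Usage: check_pairing <in_1.fastq.gz> <in_2.fastq.gz>
check_pairing() {
    # Header lines are lines 1, 5, 9, ... of each file.
    paste <(gzip -cd "$1" | awk 'NR % 4 == 1') \
          <(gzip -cd "$2" | awk 'NR % 4 == 1') |
    awk -F '\t' '
        {
            a = $1; b = $2
            # Assumption: drop any comment after the first space,
            # then a trailing /1 or /2 mate suffix.
            sub(/ .*/, "", a); sub(/ .*/, "", b)
            sub(/\/[12]$/, "", a); sub(/\/[12]$/, "", b)
            if (a != b) {
                print "mismatch at read " NR ": " $1 " vs " $2
                bad = 1
                exit
            }
        }
        END { exit bad + 0 }'
}
```

If the check passes, taking the same number of leading lines from each file keeps the pairs intact:

$ check_pairing pair.1.fastq.gz pair.2.fastq.gz && echo "pairs are in the same order"

A file with fewer reads than its mate is also reported as a mismatch, because paste pads the shorter input with empty fields.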