Take A Subset Of A Fastq Paired-End Sample
9.4 years ago
dfernan ▴ 710

Hi,

I have two compressed paired-end FASTQ files from a HiSeq RNA-seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.

The files are very large, so I want to take just a few thousand or a few million reads from each of them (keeping the read pairs matched) and use those smaller files for testing and debugging.

The results should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.

I'd be happy to hear some suggestions on how to do this or hear about tools available, thanks!

fastq illumina paired-end reads rnaseq rna-seq • 12k views

Thanks Pierre, I didn't realize someone else had already asked about this!

9.4 years ago
Rahul Sharma ▴ 650

Hi,

Assuming the reads are in the same order in both files, I would do it like this:

$ zcat pair.1.fastq.gz | sed -n 1,4000000p > pair_1_millions.fastq
$ zcat pair.2.fastq.gz | sed -n 1,4000000p > pair_2_millions.fastq
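
Each FASTQ record takes four lines, so the 4,000,000 lines kept above correspond to 1,000,000 reads per file. As a minimal sketch of the same idea that also writes gzipped output (the question asks for pair.test.1.fastq.gz and pair.test.2.fastq.gz), something like the following could work. The function name subset_fastq is hypothetical, and it still assumes both files list the mates in the same order. It uses head so that reading stops once enough lines have been seen, and gzip -dc in place of zcat, which should behave the same on most systems.

```shell
# Hypothetical helper (not part of any tool): keep the first N reads of a
# gzipped FASTQ file and write the result gzipped.
# Assumes 4 lines per record, so N reads = 4 * N lines.
subset_fastq() {
    in="$1"        # input file, e.g. pair.1.fastq.gz
    n_reads="$2"   # number of reads to keep
    out="$3"       # output file, e.g. pair.test.1.fastq.gz
    gzip -dc "$in" | head -n "$((n_reads * 4))" | gzip > "$out"
}

# Usage with the file names from the question (1,000,000 reads per mate):
# subset_fastq pair.1.fastq.gz 1000000 pair.test.1.fastq.gz
# subset_fastq pair.2.fastq.gz 1000000 pair.test.2.fastq.gz
```

The same read count has to be used for both files, otherwise the two outputs will no longer contain matching pairs.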

Thanks, Rahul

Hi, thanks a lot. However, I am not sure whether the reads are in the same order, and I'd like to make sure that I am pairing them correctly...
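
One way to check the pairing before subsetting is to compare the read IDs of the two files record by record. The sketch below is only an illustration: read_ids and pairs_in_sync are hypothetical helper names, and it assumes four-line FASTQ records whose IDs differ between mates only by a trailing /1 or /2, or by a comment after the first space.

```shell
# Hypothetical helper: print the read ID of every record (lines 1, 5, 9, ...),
# dropping anything after the first space or tab and a trailing /1 or /2.
read_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/[ \t].*$/, ""); sub(/\/[12]$/, ""); print }'
}

# Hypothetical helper: exit status 0 only if both files list the same
# read IDs in the same order, 1 otherwise.
pairs_in_sync() {
    ids1=$(mktemp) && ids2=$(mktemp) || return 2
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    if cmp -s "$ids1" "$ids2"; then status=0; else status=1; fi
    rm -f "$ids1" "$ids2"
    return "$status"
}

# Usage with the file names from the question:
# if pairs_in_sync pair.1.fastq.gz pair.2.fastq.gz; then
#     echo "mates are in the same order"
# else
#     echo "mates are NOT in the same order"
# fi
```

If the check succeeds, taking the same number of leading lines from each file keeps every read together with its mate. If it fails, the files would need to be re-paired by read ID before a simple line-based subset is safe.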
