Take A Subset Of A Fastq Paired-End Sample
9.4 years ago
dfernan ▴ 710

Hi,

I have two compressed paired-end FASTQ files from a HiSeq RNA-Seq experiment, i.e., pair.1.fastq.gz and pair.2.fastq.gz.

The files are very large, so I want to take just a few thousand or a few million reads from each of them (keeping the respective pairs together) and use those smaller files for testing and debugging.

The result should be two paired-end files, i.e., pair.test.1.fastq.gz and pair.test.2.fastq.gz.

I'd be happy to hear some suggestions on how to do this or hear about tools available, thanks!

fastq illumina paired-end reads rnaseq rna-seq • 12k views

9.4 years ago
Rahul Sharma ▴ 650

Hi,

Assuming that the reads are in the same order in both files, I would do it like this:

$ zcat pair.1.fastq.gz | sed -n 1,4000000p > pair_1_millions.fastq
$ zcat pair.2.fastq.gz | sed -n 1,4000000p > pair_2_millions.fastq


Thanks, Rahul
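
A variation on the commands above, as a rough sketch rather than a tested pipeline: a FASTQ record normally spans four lines, so 4,000,000 lines correspond to 1 million reads. The function name subset_fastq is made up for illustration. It assumes strict 4-line records (no wrapped sequence lines), writes gzip-compressed output as the question asks, and uses head so that decompression stops once enough lines have been read.

```shell
# Hypothetical helper: keep the first N reads of a gzipped FASTQ file.
# Assumption: every record is exactly 4 lines (header, sequence, '+', qualities).
subset_fastq() {
    in_file=$1
    out_file=$2
    n_reads=$3
    # head exits after n_reads * 4 lines, so the rest of the input is not decompressed
    gzip -dc "$in_file" | head -n "$((n_reads * 4))" | gzip > "$out_file"
}

# Example usage with the file names from the question:
# subset_fastq pair.1.fastq.gz pair.test.1.fastq.gz 1000000
# subset_fastq pair.2.fastq.gz pair.test.2.fastq.gz 1000000
```

Taking the same number of reads from the top of both files only keeps mates together if the two files list the reads in the same order.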


Hi, thanks a lot. However, I am not sure whether the reads are in the same order in both files, and I'd like to make sure that I am pairing them correctly.
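
One way to check the ordering concern is to compare the read IDs of the two files record by record. The sketch below is only an illustration: the function names read_ids and same_read_order are made up, and it assumes 4-line records and that mate headers differ only by a trailing /1 or /2, or by whatever follows the first space in the header line. Other naming schemes would need a different normalisation step.

```shell
# Hypothetical helper: print one normalised read ID per record.
# Assumption: 4-line records; the ID is the first whitespace-delimited
# field of the header line, with any trailing /1 or /2 removed.
read_ids() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Hypothetical helper: exit status 0 only if both files list the same
# read IDs in the same order.
same_read_order() {
    ids_a=$(mktemp) && ids_b=$(mktemp) || return 2
    read_ids "$1" > "$ids_a"
    read_ids "$2" > "$ids_b"
    cmp -s "$ids_a" "$ids_b"   # silent byte-by-byte comparison of the ID lists
    status=$?
    rm -f "$ids_a" "$ids_b"
    return "$status"
}

# Example usage with the file names from the question:
# same_read_order pair.1.fastq.gz pair.2.fastq.gz && echo "same order"
```

If the check succeeds, taking the same number of lines from the top of each file keeps the pairs intact. If it fails, the files would need to be re-synchronised by read ID before subsetting.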