To extract the first 250000 reads from xyz.fastq.gz (assuming the file contains more than 250000 reads):
seqkit head -n 250000 xyz.fastq.gz > output.fq
seqkit has to be installed separately. To count the records in the output:
seqkit seq -n output.fq | wc -l
Output should be 250000.
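If seqkit is not at hand, the same check can be approximated with standard tools by dividing the line count by four. This is only a sketch: it assumes standard four-line FASTQ records (no wrapped sequence lines), and the function name count_fastq_reads is made up for illustration.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
# $1: FASTQ file, plain or gzip-compressed (detected by the .gz suffix)
count_fastq_reads() {
    case "$1" in
        *.gz) gunzip -c "$1" ;;   # decompress to stdout
        *)    cat "$1" ;;         # plain text, pass through
    esac | awk 'END { print int(NR / 4) }'
}

# Example: count_fastq_reads output.fq
```

For the file produced above, this should report the same number as the seqkit command.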
Using reformat.sh from the BBMap suite:
reformat.sh in=original.fq.gz out=sampled.fq.gz additional_parameter_below
reads=-1              Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1          Skip (discard) this many INPUT reads before processing the rest.
samplerate=1          Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1         Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0   (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0   (sbt) Exact number of OUTPUT bases desired.
Hi, how can I extract the first 1 million lines, or let's say the first 250,000 reads (small RNA), from an xyz.fastq.gz file and export them as a new file?
gunzip -c in.fq.gz | head -n 1000000 | gzip > out.fq.gz
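Each FASTQ record normally spans four lines, so 1,000,000 lines correspond to 250,000 reads. The command above can be generalized to any read count along these lines. This is a sketch, assuming unwrapped four-line records; the function name first_n_reads and its argument order are illustrative only.

```shell
# first_n_reads N in.fq.gz out.fq.gz
# Write the first N reads (4 * N lines) of a gzipped FASTQ to a new gzipped file.
first_n_reads() {
    n_reads="$1"
    gunzip -c "$2" | head -n "$(( n_reads * 4 ))" | gzip > "$3"
}

# Example: first_n_reads 250000 xyz.fastq.gz first250k.fq.gz
```

If the input holds fewer than N reads, the output simply contains all of them.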