extract subset of sequence
6.7 years ago
Björn ▴ 110

Hi, how can I extract the first 1 million lines or, let's say, the first 250,000 reads (small RNA) from an xyz.fastq.gz file and export them as a new file?

RNA-Seq fastq • 4.5k views

How to randomly extract reads from a FASTQ file?

or

zcat file.fastq.gz | head -n $((4 * 250000)) > new.fastq   # 4 lines per read; replace 250000 with the number of reads you want
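As a minimal sketch of the one-liner above, the snippet below builds a tiny demo FASTQ and takes the first N reads from it by asking head for 4 × N lines. The file names (demo.fastq.gz, new.fastq) and the value of N are made up for illustration, and `gzip -dc` is used as an equivalent of `zcat` for gzip files. It assumes the standard layout of exactly 4 lines per read.

```shell
# Build a tiny demo FASTQ with 5 reads (4 lines each); names are illustrative only.
for i in 1 2 3 4 5; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | gzip > demo.fastq.gz

N=3   # number of reads you want

# Each read occupies 4 lines, so the first N reads are the first 4*N lines.
gzip -dc demo.fastq.gz | head -n "$((4 * N))" > new.fastq

# Show how many lines ended up in the subset.
wc -l < new.fastq
```

For the original question, N would be 250000, which gives the 1 million lines mentioned there.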
6.7 years ago

To extract the first 250000 reads from xyz.fastq.gz (assuming that the file has more than 250000 reads):

seqkit head -n 250000 xyz.fastq.gz > output.fq

Download seqkit from here. To count records in the output:

seqkit seq -n output.fq | wc -l

Output should be 250000.
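If seqkit is not at hand, the same check can be sketched with plain awk by dividing the line count by 4. This assumes a standard 4-line-per-read FASTQ; the file name count_demo.fq and the variable n_reads are hypothetical and only serve the demo.

```shell
# Demo input: 3 reads written by hand (hypothetical file name).
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > count_demo.fq

# Number of reads = number of lines / 4 for a 4-line FASTQ.
n_reads=$(awk 'END { print NR / 4 }' count_demo.fq)
echo "$n_reads"
```

For a gzipped file, the same awk command can read from `gzip -dc file.fq.gz` through a pipe instead of taking a file name.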


Just use seqkit stats xx.fq.gz for counting.

It also supports writing gzipped files with -o out.fq.gz


thanks @shenwei356

6.7 years ago
GenoMax 141k

Using reformat.sh from the BBMap suite:

reformat.sh in=original.fq.gz out=sampled.fq.gz additional_parameter_below

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1            Skip (discard) this many INPUT reads before processing the rest.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.
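As a rough sketch of how the parameters listed above might be combined, the two shell functions below wrap reformat.sh for the "first N reads" case and for a seeded random sample. The function names (first_reads, sample_reads) are hypothetical helpers, not part of BBMap; only the in=, out=, reads=, samplereadstarget= and sampleseed= parameters come from the post. The functions are only defined here, so nothing runs until you call them with BBMap installed.

```shell
# Hypothetical helper: keep only the first N input reads.
# usage: first_reads original.fq.gz first250k.fq.gz 250000
first_reads() {
  reformat.sh in="$1" out="$2" reads="$3"
}

# Hypothetical helper: randomly sample N output reads with a fixed seed,
# so that repeated runs give the same subset.
# usage: sample_reads original.fq.gz random250k.fq.gz 250000 42
sample_reads() {
  reformat.sh in="$1" out="$2" samplereadstarget="$3" sampleseed="$4"
}
```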
6.7 years ago

Hi, how can I extract the first 1 million lines or, let's say, the first 250,000 reads (small RNA) from an xyz.fastq.gz file and export them as a new file?

gunzip -c in.fq.gz | head -n 1000000 | gzip > out.fq.gz

pigz is recommended for faster speed.


Elegant :) One has to know that there are exactly 4 lines for each read.
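Since the head approach relies on the 4-lines-per-read layout, a quick sanity check can be sketched with awk before trusting the line arithmetic. The function name check_fastq and the demo file names are hypothetical. The check is a heuristic: it only confirms that the line count is a multiple of 4, that every first line of a record starts with "@", and that every third line starts with "+".

```shell
# Demo input: a well-formed 2-read FASTQ (hypothetical file name).
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > check_demo.fq

# Exit status 0 if the file looks like a 4-line FASTQ, 1 otherwise.
check_fastq() {
  awk '
    BEGIN { bad = 0 }
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }  # header line of each record
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }  # separator line of each record
    END { if (NR % 4 != 0) bad = 1; exit bad }          # incomplete last record
  ' "$1"
}

if check_fastq check_demo.fq; then
  echo "looks like 4-line FASTQ"
else
  echo "not a 4-line FASTQ"
fi
```

For a gzipped input, decompress to a pipe and pass `-` or `/dev/stdin` as the file name, depending on what your awk supports.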


extract first 1 million lines or lets say first 250,000 reads (small rna)
