Question: Extract a subset of sequences
mail4malla wrote (4 months ago):

Hi, how can I extract the first 1 million lines, or let's say the first 250,000 reads (small RNA), from the xyz.fastq.gz file and export them as a new file?

Tags: rna-seq, fastq (question last modified 4 months ago by Matt Shirley)

See: How to randomly extract reads from a FASTQ file?

or

N=250000  # number of reads you want; each FASTQ record is 4 lines
zcat file.fastq.gz | head -n $((4 * N)) > new.fastq
(reply by noeD, 4 months ago)
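The linked thread is about random rather than first-N extraction. As a rough illustration only, the sketch below keeps each read with probability RATE using awk's seeded rand(), so the output size is approximate, not exact. It assumes a gzipped FASTQ with exactly 4 lines per record; sample_reads is a hypothetical helper name, and this is not necessarily the method used in that thread.

```shell
#!/usr/bin/env bash
# Sketch: keep each read of a gzipped FASTQ with probability RATE.
# Assumes exactly 4 lines per record. sample_reads is a hypothetical name.
set -euo pipefail

sample_reads() {
  local infile="$1" rate="$2" seed="$3"
  gunzip -c "$infile" | awk -v rate="$rate" -v seed="$seed" '
    BEGIN { srand(seed) }                    # fixed seed: repeatable subset with the same awk
    NR % 4 == 1 { keep = (rand() < rate) }   # decide once per record, at its header line
    keep { print }                           # emit all 4 lines of a kept record
  '
}
```

For example, sample_reads xyz.fastq.gz 0.1 42 > sampled.fq keeps roughly 10% of the reads. Reusing the same seed should reproduce the same subset, as long as the same awk implementation is used.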
cpad0112 wrote (4 months ago):

To extract the first 250,000 reads from xyz.fastq.gz (assuming the file has at least 250,000 reads):

seqkit head -n 250000 xyz.fastq.gz > output.fq

Download seqkit from its project page. To count the records in the output:

seqkit seq -n output.fq | wc -l

The output should be 250000.
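If seqkit is not available, a rough equivalent is to count lines and divide by 4. The sketch below assumes a standard FASTQ in which every record is exactly 4 lines (no wrapped sequences); count_reads is a hypothetical helper name, not part of seqkit.

```shell
#!/usr/bin/env bash
# Sketch: count reads in a plain or gzipped FASTQ without seqkit.
# Assumes 4 lines per record. count_reads is a hypothetical name.
set -euo pipefail

count_reads() {
  local f="$1" lines
  case "$f" in
    *.gz) lines=$(gunzip -c "$f" | wc -l) ;;   # decompress to stdout, then count
    *)    lines=$(wc -l < "$f") ;;
  esac
  # A line count that is not a multiple of 4 means the 4-line assumption is broken.
  if (( lines % 4 != 0 )); then
    echo "warning: $lines lines is not a multiple of 4" >&2
  fi
  echo $(( lines / 4 ))
}
```

Applied to the file above, count_reads output.fq should also print 250000.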


Just use seqkit stats xx.fq.gz for counting.

It also supports writing gzipped output directly with -o out.fq.gz.

(reply by shenwei356, 4 months ago)

Thanks, @shenwei356!

(reply by cpad0112, 4 months ago)
genomax (United States) wrote (4 months ago):

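Use reformat.sh from the BBMap suite, adding one of the parameters listed below:

reformat.sh in=original.fq.gz out=sampled.fq.gz <parameter=value>

For the question's case, reads=250000 should keep only the first 250,000 reads. Relevant parameters: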

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1            Skip (discard) this many INPUT reads before processing the rest.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.
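To make the skipreads= and reads= semantics concrete, here is a coreutils sketch of the same idea for single-end, 4-line FASTQ. It is not BBMap and does not reproduce reformat.sh's other options; take_reads is a hypothetical helper name, and applying the skip before counting the n reads is an assumption, not something checked against reformat.sh.

```shell
#!/usr/bin/env bash
# Sketch: skip SKIP reads, then output the next N reads (4 lines each).
# take_reads is a hypothetical name; it is not part of BBMap.
# pipefail is deliberately not set: once head has printed enough lines,
# gunzip and tail may be killed by SIGPIPE, which is expected here.
set -eu

take_reads() {
  local infile="$1" skip="$2" n="$3"
  # tail -n +K starts at line K; record number skip+1 starts at line 4*skip+1.
  gunzip -c "$infile" | tail -n +$(( 4 * skip + 1 )) | head -n $(( 4 * n ))
}
```

For example, take_reads xyz.fastq.gz 0 250000 > first250k.fq corresponds to reads=250000 with no skipping.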
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (4 months ago):

Quoting the question: "How can I extract the first 1 million lines, or let's say the first 250,000 reads (small RNA), from the xyz.fastq.gz file and export them as a new file?"

gunzip -c in.fq.gz | head -n 1000000 | gzip > out.fq.gz

pigz (parallel gzip) is recommended for faster compression.

(reply by shenwei356, 4 months ago)
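Following the pigz suggestion, one way to use it without making it a hard dependency is to pick pigz when it is on the PATH and fall back to gzip otherwise. This sketch assumes both tools compress stdin to stdout when no file is given; first_reads_gz is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: head-based extraction of the first N reads, compressed with pigz
# if installed, else gzip. first_reads_gz is a hypothetical name.
# pipefail is deliberately not set: gunzip may get SIGPIPE once head
# has printed enough lines, which is harmless here.
set -eu

first_reads_gz() {
  local infile="$1" n_reads="$2" out="$3" compressor=gzip
  if command -v pigz > /dev/null 2>&1; then
    compressor=pigz   # parallel compression; output is ordinary gzip format
  fi
  gunzip -c "$infile" | head -n $(( 4 * n_reads )) | "$compressor" > "$out"
}
```

first_reads_gz xyz.fastq.gz 250000 first250k.fq.gz should give the same reads as the command above, since 250,000 reads at 4 lines each is 1,000,000 lines.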

Elegant :) One has to know that there are exactly 4 lines per read, though.

(reply by grant.hovhannisyan, 4 months ago)

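From the question: "first 1 million lines or let's say first 250,000 reads (small RNA)", i.e. the question itself equates 1,000,000 lines with 250,000 four-line reads.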

(reply by Pierre Lindenbaum, 4 months ago)
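Since the head -n approach relies on exactly 4 lines per read, it can be worth checking that assumption before trusting the result. The awk sketch below only checks layout (an @ header, a + separator, and equal sequence and quality lengths in every record); it does not validate base or quality characters. check_fastq4 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify that a gzipped FASTQ uses exactly 4 lines per record.
# check_fastq4 is a hypothetical name; it checks layout only.
set -euo pipefail

check_fastq4() {
  gunzip -c "$1" | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: expected @header\n", NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }      # remember sequence length for the quality check
    NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: expected + separator\n", NR; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen { printf "line %d: quality length differs from sequence\n", NR; bad = 1; exit }
    END {
      # A truncated last record leaves a line count that is not a multiple of 4.
      if (!bad && NR % 4 != 0) { printf "%d lines is not a multiple of 4\n", NR; bad = 1 }
      exit bad
    }'
}
```

A wrapped (multi-line) FASTQ fails this check at its first record, which is exactly the case where head -n with 4 times N lines would not return N reads.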
Powered by Biostar version 2.3.0