Question: extract subset of sequence
0
11 months ago by
Björn30
Björn30 wrote:

Hi, how can I extract the first 1 million lines, or let's say the first 250,000 reads (small RNA), from an xyz.fastq.gz file and export them as a new file?

rna-seq fastq • 479 views
modified 11 months ago by Matt Shirley8.4k • written 11 months ago by Björn30

How to randomly extract reads from a FASTQ file?

or

zcat file.fastq.gz | head -n $((4 * 250000)) > new.fastq   # 4 lines per read; replace 250000 with the number of reads you want
modified 11 months ago • written 11 months ago by noeD70
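
The one-liner above relies on each FASTQ record occupying exactly four lines, so the first N reads are the first 4 × N lines. A minimal sketch of that arithmetic as a reusable shell function follows. The name `fastq_head` and its argument order are made up for illustration, and it assumes a standard four-line (unwrapped) FASTQ file. It uses `gzip -dc` instead of `zcat`, since `zcat` on some systems expects `.Z` files.

```shell
# Hypothetical helper: write the first N reads of a gzipped FASTQ file.
# Assumes exactly 4 lines per read (no wrapped sequence lines).
fastq_head() {
    in_gz=$1     # input, e.g. xyz.fastq.gz
    n_reads=$2   # number of reads to keep, e.g. 250000
    out_fq=$3    # uncompressed output file
    gzip -dc "$in_gz" | head -n "$((4 * n_reads))" > "$out_fq"
}

# Usage: fastq_head xyz.fastq.gz 250000 new.fastq
```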
2
11 months ago by
cpad01127.7k
India
cpad01127.7k wrote:

To extract the first 250000 reads from xyz.fastq.gz (assuming the file contains at least 250000 reads):

seqkit head -n 250000 xyz.fastq.gz > output.fq

Download seqkit from here. To count records in the output:

seqkit seq -n output.fq | wc -l

Output should be 250000.
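
As a cross-check that does not depend on seqkit, the record count can also be derived from the line count, again assuming four lines per read. This is only a sketch: `count_reads` is a hypothetical helper name, and it decides between plain and gzipped input purely from the file extension.

```shell
# Hypothetical helper: count reads in a plain or gzipped FASTQ file.
# Assumes exactly 4 lines per read.
count_reads() {
    case $1 in
        *.gz) gzip -dc "$1" ;;   # gzipped input
        *)    cat "$1" ;;        # plain-text input
    esac | awk 'END { print NR / 4 }'
}

# Usage: count_reads output.fq
```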

modified 11 months ago • written 11 months ago by cpad01127.7k
1

Just use seqkit stats xx.fq.gz for counting.

seqkit also supports writing gzipped output with -o out.fq.gz.

written 11 months ago by shenwei3564.0k

thanks @shenwei356

written 11 months ago by cpad01127.7k
1
11 months ago by
genomax51k
United States
genomax51k wrote:

Use reformat.sh from the BBMap suite, adding one or more of the parameters listed below:

reformat.sh in=original.fq.gz out=sampled.fq.gz additional_parameter_below

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1            Skip (discard) this many INPUT reads before processing the rest.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.
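
To illustrate what samplerate and sampleseed mean, here is a rough plain-shell analogue. It is not how reformat.sh is implemented; it is only a sketch that keeps each read with a given probability using awk's random number generator, with a fixed seed so that the same awk produces the same subset on every run. The name `sample_reads` is hypothetical, and the code assumes four-line records with no tab characters inside any line. The number of reads kept is only approximately rate × total, not an exact target.

```shell
# Hypothetical helper: keep each read with probability RATE (0..1),
# using SEED so that repeated runs give the same subset.
# Assumes 4 lines per read and no tabs inside any line.
# Steps: join each read onto one tab-separated line, filter with a
# seeded random draw, then split back into 4 lines per read.
sample_reads() {
    in_gz=$1; rate=$2; seed=$3
    gzip -dc "$in_gz" |
        paste - - - - |
        awk -v rate="$rate" -v seed="$seed" 'BEGIN { srand(seed) } rand() < rate' |
        tr '\t' '\n'
}

# Usage: sample_reads original.fq.gz 0.1 42 | gzip > sampled.fq.gz
```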
modified 11 months ago • written 11 months ago by genomax51k
0
11 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum110k wrote:

Hi, how can I extract the first 1 million lines, or let's say the first 250,000 reads (small RNA), from an xyz.fastq.gz file and export them as a new file?

gunzip -c in.fq.gz | head -n 1000000 | gzip > out.fq.gz
written 11 months ago by Pierre Lindenbaum110k

pigz is recommended in place of gzip for speed.

written 11 months ago by shenwei3564.0k
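
A sketch of the same pipeline that uses pigz when it is installed and falls back to gzip otherwise. pigz accepts the same -d and -c options as gzip here, and its main gain is parallel compression on multi-core machines. The function name `first_lines_gz` is hypothetical.

```shell
# Hypothetical helper: first N lines of a gzipped file, recompressed.
# Uses pigz if it is on PATH, otherwise plain gzip.
first_lines_gz() {
    in_gz=$1; n_lines=$2; out_gz=$3
    if command -v pigz >/dev/null 2>&1; then
        zip_cmd=pigz
    else
        zip_cmd=gzip
    fi
    "$zip_cmd" -dc "$in_gz" | head -n "$n_lines" | "$zip_cmd" -c > "$out_gz"
}

# Usage: first_lines_gz in.fq.gz 1000000 out.fq.gz
```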

Elegant :) One just has to know that there are exactly 4 lines for each read.

written 11 months ago by grant.hovhannisyan900

extract first 1 million lines or let's say first 250,000 reads (small RNA)

written 11 months ago by Pierre Lindenbaum110k