Question: extract subset of sequence
0
11 weeks ago by
mail4malla0
mail4malla0 wrote:

Hi, how can I extract the first 1 million lines, or let's say the first 250,000 reads (small RNA), from an xyz.fastq.gz file and export them as a new file?

rna-seq fastq • 272 views
ADD COMMENTlink modified 11 weeks ago by Matt Shirley7.9k • written 11 weeks ago by mail4malla0

How to randomly extract reads from a FASTQ file?

or

N=250000   # number of reads you want; each read is 4 lines
zcat file.fastq.gz | head -n $((4 * N)) > new.fastq
ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by noeD70
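
The one-liner above relies on every FASTQ record occupying exactly four lines. A minimal sketch of the same idea as a reusable shell function follows; the function name first_reads and the file names are hypothetical, and it assumes an unwrapped, four-line-per-record FASTQ file.

```shell
# Hypothetical helper: print the first N reads of a gzipped FASTQ file.
# Assumes every record is exactly 4 lines (no wrapped sequence lines).
#   $1 = gzipped FASTQ file, $2 = number of reads to keep
first_reads() {
    gunzip -c "$1" | head -n "$(( 4 * $2 ))"
}

# Usage: keep the first 250000 reads (1,000,000 lines)
# first_reads file.fastq.gz 250000 > new.fastq
```

Taking the read count rather than the line count as the argument keeps the multiplication by four in one place.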
2
11 weeks ago by
cpad01122.3k
cpad01122.3k wrote:

To extract first 250000 reads from xyz.fastq.gz (assuming that the said file has more than 250000 reads):

seqkit head -n 250000 xyz.fastq.gz > output.fq

Download seqkit from here. To count records in the output:

seqkit seq -n output.fq | wc -l

Output should be 250000.

ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by cpad01122.3k
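
If seqkit is not at hand, the record count can also be checked with standard tools. This is only a sketch, assuming four-line FASTQ records; the function name count_reads is hypothetical.

```shell
# Hypothetical helper: count FASTQ reads arriving on stdin by dividing
# the number of lines by 4. Assumes 4-line records.
count_reads() {
    echo $(( $(wc -l) / 4 ))
}

# Usage:
# count_reads < output.fq
# gunzip -c out.fq.gz | count_reads
```

For the output.fq produced above, this should print the same number as the seqkit command.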
1

Just use seqkit stats xx.fq.gz for counting.

It also supports writing gzipped files with -o out.fq.gz

ADD REPLYlink written 11 weeks ago by shenwei3563.4k

thanks @shenwei356

ADD REPLYlink written 11 weeks ago by cpad01122.3k
1
11 weeks ago by
genomax34k
United States
genomax34k wrote:

Using reformat.sh from the BBMap suite:

reformat.sh in=original.fq.gz out=sampled.fq.gz additional_parameter_below

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1            Skip (discard) this many INPUT reads before processing the rest.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.
ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by genomax34k
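
To illustrate what fraction-based sampling with a fixed seed means (the samplerate and sampleseed options above), here is a plain awk sketch. It is not BBMap's implementation; the function name sample_reads and the seed value are hypothetical, and it assumes single-end, four-line FASTQ records.

```shell
# Hypothetical helper, not BBMap code: keep each read with probability RATE,
# seeding the random generator with SEED so repeated runs pick the same reads.
# Reads FASTQ on stdin, writes FASTQ on stdout; assumes 4-line records.
#   $1 = RATE (fraction between 0 and 1), $2 = SEED (integer)
sample_reads() {
    awk -v rate="$1" -v seed="$2" '
        BEGIN { srand(seed) }
        # Decide once per record, on its header line (lines 1, 5, 9, ...)
        NR % 4 == 1 { keep = (rand() < rate) }
        # Print the line if the current record was selected
        keep
    '
}

# Usage: keep roughly 10% of the reads
# gunzip -c original.fq.gz | sample_reads 0.1 42 | gzip > sampled.fq.gz
```

Because the decision is made per record, the number of reads kept is only approximately RATE times the input; an exact output count is what samplereadstarget is described as providing.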
0
11 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum99k wrote:

Hi, How can I extract first 1 million lines or lets say first 250,000 reads (small rna) from xyz.fastq.gz file and export as a new file?

gunzip -c in.fq.gz | head -n 1000000 | gzip > out.fq.gz
ADD COMMENTlink written 11 weeks ago by Pierre Lindenbaum99k

pigz is recommended for faster speed.

ADD REPLYlink written 11 weeks ago by shenwei3563.4k
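
A sketch of the same pipeline with pigz swapped in when it is installed, falling back to gzip otherwise. The function name head_gz is hypothetical; the pipeline itself is the one from the answer above.

```shell
# Hypothetical helper: write the first N lines of a gzipped file to a new
# gzipped file. Uses pigz when it is on PATH, otherwise falls back to gzip.
#   $1 = input .gz file, $2 = number of lines, $3 = output .gz file
head_gz() {
    if command -v pigz >/dev/null 2>&1; then
        zip=pigz
    else
        zip=gzip
    fi
    "$zip" -dc "$1" | head -n "$2" | "$zip" -c > "$3"
}

# Usage: first 1,000,000 lines (250,000 four-line reads)
# head_gz in.fq.gz 1000000 out.fq.gz
```

The speed gain from pigz is likely to come mostly from the compression step, since it compresses with multiple threads.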

Elegant :) One has to know that there are exactly 4 lines for each read.

ADD REPLYlink written 11 weeks ago by grant.hovhannisyan260

"extract first 1 million lines or lets say first 250,000 reads (small rna)"

ADD REPLYlink written 11 weeks ago by Pierre Lindenbaum99k