Question: extract subset of sequence
mail4malla0 wrote:

Hi, how can I extract the first 1 million lines, or let's say the first 250,000 reads (small RNA), from an xyz.fastq.gz file and export them as a new file?

rna-seq fastq • 218 views
modified 20 days ago by Matt Shirley7.7k • written 21 days ago by mail4malla0

How to randomly extract reads from a FASTQ file?

or

zcat file.fastq.gz | head -n $((4 * 250000)) > new.fastq   # 4 lines per read; 250000 = number of reads you want
modified 21 days ago • written 21 days ago by noeD60
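
To make the "4 lines per read" arithmetic concrete, here is a minimal sketch that runs on a tiny made-up file. The temporary directory, the five demo reads, and the variable names are assumptions for illustration only; gunzip -c is used in place of zcat because it behaves the same on Linux and macOS.

```shell
# Build a tiny 5-read gzipped FASTQ so the sketch runs anywhere (demo data, not real reads).
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | gzip > "$tmp/file.fastq.gz"

n_reads=3                    # reads to keep (250000 in the question)
n_lines=$((4 * n_reads))     # a standard FASTQ record spans 4 lines

# Decompress, keep the first n_lines lines, write an uncompressed subset.
gunzip -c "$tmp/file.fastq.gz" | head -n "$n_lines" > "$tmp/new.fastq"

kept_lines=$(wc -l < "$tmp/new.fastq" | tr -d ' ')
last_header=$(tail -n 4 "$tmp/new.fastq" | head -n 1)
echo "kept $kept_lines lines; last record: $last_header"

rm -rf "$tmp"
```

For the real file, replace the demo path with xyz.fastq.gz and set n_reads=250000.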
cpad01121.7k wrote:

To extract the first 250000 reads from xyz.fastq.gz (assuming the file has more than 250000 reads):

seqkit head -n 250000 xyz.fastq.gz > output.fq

Download seqkit from here. To count records in the output:

seqkit seq -n output.fq | wc -l

Output should be 250000.
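
If seqkit is not at hand, the same check can be approximated with coreutils by dividing the line count by 4. This is a sketch on a three-record demo file (the file contents and variable names are made up), and it assumes a strict 4-line FASTQ layout.

```shell
# Create a 3-record demo FASTQ (illustrative data only).
tmp=$(mktemp -d)
printf '@r1\nAC\n+\nII\n@r2\nGT\n+\nII\n@r3\nTT\n+\nII\n' > "$tmp/output.fq"

# Count lines, then divide by 4 to get the number of records.
lines=$(wc -l < "$tmp/output.fq" | tr -d ' ')
reads=$((lines / 4))
echo "$reads"

rm -rf "$tmp"
```

Counting lines is safer than counting lines that start with "@", because a quality string can also begin with "@".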

modified 21 days ago • written 21 days ago by cpad01121.7k

Just use seqkit stats xx.fq.gz for counting.

It also supports writing a gzipped file with -o out.fq.gz

written 20 days ago by shenwei3563.2k

thanks @shenwei356

written 20 days ago by cpad01121.7k
genomax32k wrote:

Use reformat.sh from the BBMap suite, adding one or more of the parameters listed below:

reformat.sh in=original.fq.gz out=sampled.fq.gz additional_parameter_below

reads=-1                Set to a positive number to only process this many INPUT reads (or pairs), then quit.
skipreads=-1            Skip (discard) this many INPUT reads before processing the rest.
samplerate=1            Randomly output only this fraction of reads; 1 means sampling is disabled.
sampleseed=-1           Set to a positive number to use that prng seed for sampling (allowing deterministic sampling).
samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.
samplebasestarget=0     (sbt) Exact number of OUTPUT bases desired.
modified 21 days ago • written 21 days ago by genomax32k
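
For readers without BBMap installed, the idea behind skipping some input reads and then keeping a fixed number can be sketched with tail and head. This is only a coreutils analogue under the 4-lines-per-read assumption, not how reformat.sh is implemented, and how reformat.sh behaves when reads and skipreads are combined is not verified here. The demo file and the skip/keep variables are hypothetical.

```shell
# Build a 6-read gzipped demo FASTQ (illustrative data only).
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | gzip > "$tmp/original.fq.gz"

skip=2   # reads to discard from the start of the file
keep=3   # reads to write after the skipped ones

# tail -n +K starts output at line K, so skipping 2 reads means starting at line 9.
gunzip -c "$tmp/original.fq.gz" \
  | tail -n +"$((4 * skip + 1))" \
  | head -n "$((4 * keep))" \
  | gzip > "$tmp/sampled.fq.gz"

# Collect the header line of each kept record to show which reads survived.
headers=$(gunzip -c "$tmp/sampled.fq.gz" | awk 'NR % 4 == 1' | paste -sd ' ' -)
echo "$headers"

rm -rf "$tmp"
```

Going by the parameter list above, reads=250000 looks like the option for the first 250,000 reads and samplereadstarget=250000 the one for a random subset of that size; neither was run for this note.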
Pierre Lindenbaum96k wrote:

Hi, how can I extract the first 1 million lines, or let's say the first 250,000 reads (small RNA), from an xyz.fastq.gz file and export them as a new file?

gunzip -c in.fq.gz | head -n 1000000 | gzip > out.fq.gz
written 21 days ago by Pierre Lindenbaum96k

pigz is recommended for speed.

written 20 days ago by shenwei3563.2k
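
A possible way to use pigz only when it is installed, falling back to gzip otherwise. This is a sketch: the gz variable and the demo file are made up, and it relies on pigz accepting the same -d and -c options as gzip.

```shell
# Pick pigz when it is on the PATH, otherwise fall back to gzip.
if command -v pigz >/dev/null 2>&1; then
  gz=pigz
else
  gz=gzip
fi

# Build a 4-read gzipped demo FASTQ (illustrative data only).
tmp=$(mktemp -d)
for i in 1 2 3 4; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | "$gz" -c > "$tmp/in.fq.gz"

# Same pipeline as above: decompress, keep the first 8 lines (2 reads), recompress.
"$gz" -dc "$tmp/in.fq.gz" | head -n 8 | "$gz" -c > "$tmp/out.fq.gz"

out_lines=$("$gz" -dc "$tmp/out.fq.gz" | wc -l | tr -d ' ')
echo "compressor: $gz, output lines: $out_lines"

rm -rf "$tmp"
```

The speed-up is likely to show mainly on the compression step, since that is the part pigz spreads across cores.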

Elegant :) One has to know there are exactly 4 lines for each read.

written 20 days ago by grant.hovhannisyan120
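
Since the head-based approach depends on that 4-line layout, a quick sanity check before trusting it might look like the following. It is a sketch on demo data (file and variable names are made up) and only checks the line count and the leading "@" and "+" markers, not sequence or quality lengths.

```shell
# Build a 2-read gzipped demo FASTQ; the second quality string deliberately starts with "@".
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n@III\n' | gzip > "$tmp/in.fq.gz"

# awk exits 0 only if line 1 of every 4 starts with "@", line 3 starts with "+",
# and the total number of lines is a multiple of 4.
if gunzip -c "$tmp/in.fq.gz" | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
    END { if (NR % 4 != 0) bad = 1; exit bad + 0 }'
then
  four_line=yes
else
  four_line=no
fi
echo "strict 4-line FASTQ: $four_line"

rm -rf "$tmp"
```

If this reports "no" (for example with multi-line sequences), a FASTQ-aware tool such as seqkit or reformat.sh is the safer choice.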

extract the first 1 million lines, or let's say the first 250,000 reads (small RNA)

written 20 days ago by Pierre Lindenbaum96k