Question: extract reads with given size range
2.8 years ago, apelin20 (Canada) wrote:

Hello,

I did a smallRNA-Seq experiment where the sequencing provider was supposed to sequence small RNAs of 15-25 nt. After trimming adapters, I see reads ranging in size from 15 to 35 bp, but also reads that are 50 bp (the full length of the read). Since I sent total RNA, I assume the 50 bp reads are RNA species other than miRNA/smallRNA. I want to extract all reads <40 bp and all reads >49 bp into 2 separate files.

The only way I can think of to do this is to convert to fasta, determine the length of each read using a tool, use R to create my 2 lists of reads based on size, save the lists, and use seqtk to extract the reads from the fastq. This sounds long and silly.

Any alternatives?

Thanks

smallrna-seq rna-seq fastq

Instead, you can directly extract reads of a particular length.

Most importantly, see a previous discussion: Filtering Fastq Sequences Based On Lengths

written 2.8 years ago by venu

You can also use the BBMap package like this:

reformat.sh in=reads.fq out=filtered.fq minlength=15 maxlength=25

However, you can just as easily do that at the same time as adapter trimming if you use BBDuk, which also supports those flags. Both Reformat and BBDuk are many times faster than prinseq or fastx.

written 2.8 years ago by Brian Bushnell (modified 2.8 years ago by genomax)
2.8 years ago, iraun (Norway) wrote:

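Have you tried an awk solution to parse the fastq file? Each command below reads one 4-line record at a time and prints it if the sequence length passes the test: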

awk 'BEGIN {OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) < 40) {print header, seq, qheader, qseq}}' input.fq > lessthan40bp.fq

awk 'BEGIN {OFS = "\n"} {header = $0 ; getline seq ; getline qheader ; getline qseq ; if (length(seq) > 49) {print header, seq, qheader, qseq}}' input.fq > morethan49bp.fq
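
If you would rather read the file once and write both outputs in the same pass, the same logic can be sketched in Python. This is only a minimal sketch, assuming uncompressed FASTQ with strict 4-line records (no wrapped sequences). The function name `split_fastq_by_length` and its parameters are made up for illustration, and the default thresholds mirror the ones in the question.

```python
from collections import Counter
from itertools import islice


def split_fastq_by_length(in_path, short_path, long_path,
                          short_below=40, long_above=49):
    """Split a FASTQ file by read length in a single pass.

    Reads shorter than `short_below` go to `short_path`, reads longer than
    `long_above` go to `long_path`, and everything in between is skipped.
    Assumes 4-line records (hypothetical helper, not part of any library).
    Returns a Counter with the number of reads per bin.
    """
    counts = Counter()
    with open(in_path) as fin, \
            open(short_path, "w") as short_out, \
            open(long_path, "w") as long_out:
        while True:
            # One record = header, sequence, '+' line, quality string.
            record = list(islice(fin, 4))
            if not record:
                break
            if len(record) < 4:
                raise ValueError("truncated FASTQ record at end of file")
            seq_len = len(record[1].rstrip("\r\n"))
            if seq_len < short_below:
                short_out.write("".join(record))
                counts["short"] += 1
            elif seq_len > long_above:
                long_out.write("".join(record))
                counts["long"] += 1
            else:
                counts["skipped"] += 1
    return counts
```

Calling `split_fastq_by_length("input.fq", "lessthan40bp.fq", "morethan49bp.fq")` would produce the same two files as the awk commands above, and the returned counts give a quick sanity check on how many reads landed in each bin.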