I have a fastq file (RNA-seq) and have filtered out the linkers. Now the sequences in the file have different lengths. I want to remove the reads shorter than 21 nucleotides and use the rest of the reads. Do you know any tool to do that?
How did you remove the adapters (linkers)? I hope you used an established tool like Cutadapt. These tools have built-in options to discard reads shorter than a given threshold (in Cutadapt, -m / --minimum-length).
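What these length filters do is simple: a FASTQ record is four lines, and the whole record is kept only if its sequence line (line 2) is long enough. Below is a minimal shell/awk sketch of that idea, not what any of these tools actually run. It assumes uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines), and the function name filter_fastq_min_len is hypothetical.

```shell
# Hypothetical helper: keep only FASTQ records whose sequence is >= MIN_LEN bases.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
filter_fastq_min_len() {
  local min_len="$1"
  # Buffer the first three lines of each record; on the 4th (quality) line,
  # print the whole record only if the sequence line is long enough.
  awk -v min="$min_len" '
    BEGIN { min += 0 }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      if (length(seq) >= min) {
        print header; print seq; print plus; print $0
      }
    }
  '
}

# Usage (hypothetical file names):
#   filter_fastq_min_len 21 < in.fastq > out.fastq
#   gzip -dc in.fastq.gz | filter_fastq_min_len 21 | gzip > out.fastq.gz
```

The dedicated tools are still the better choice in practice, since they also handle gzip input, wrapped records and paired files.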
You can use the fastaparse.pl script available in the miRDeep2 package.
Does that script work with fastq format files? OP is specifically asking about that format.
Filtering Fastq Sequences Based On Lengths
Try with seqkit:
seqkit seq -m 21 in.fastq > out.fastq
Use reformat.sh from the BBMap suite:
reformat.sh in=your_fq.gz out=filt.fq.gz minlength=21
(Note: if you have paired-end data, use in1= in2= and out1= out2= instead.)
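The paired-end case needs care because the two files must stay in sync: most paired-end aligners expect read N in file 1 to be the mate of read N in file 2, so a read and its mate have to be kept or dropped together. Below is a minimal shell/awk sketch of that idea. It assumes uncompressed 4-line FASTQ, headers without tab characters, and (as a policy choice made here, not necessarily what reformat.sh does) drops a pair if either mate is shorter than the threshold. The name filter_pairs_min_len is hypothetical.

```shell
# Hypothetical helper: length-filter paired FASTQ files while keeping mates in sync.
# A pair is kept only if BOTH mates are >= MIN_LEN (an assumed policy).
# Usage: filter_pairs_min_len MIN_LEN in1.fq in2.fq out1.fq out2.fq
filter_pairs_min_len() {
  local min_len="$1" in1="$2" in2="$3" out1="$4" out2="$5" tmp
  tmp=$(mktemp -d) || return 1
  # Collapse each 4-line record onto one tab-separated line.
  paste - - - - < "$in1" > "$tmp/r1.tsv"
  paste - - - - < "$in2" > "$tmp/r2.tsv"
  # Create (or truncate) the outputs so they exist even if no pair passes.
  : > "$out1"
  : > "$out2"
  # Side by side, fields 1-4 are mate 1 and fields 5-8 are mate 2.
  paste "$tmp/r1.tsv" "$tmp/r2.tsv" |
    awk -F'\t' -v min="$min_len" -v o1="$out1" -v o2="$out2" '
      BEGIN { min += 0 }
      length($2) >= min && length($6) >= min {
        print $1 "\n" $2 "\n" $3 "\n" $4 > o1
        print $5 "\n" $6 "\n" $7 "\n" $8 > o2
      }
    '
  rm -rf "$tmp"
}
```

Pasting the two files line by line is what keeps the pairs aligned: each awk input line holds one complete pair, so a single test decides the fate of both mates.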
Another great option is fastp:
fastp --length_required 21 -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz
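Optionally add --detect_adapter_for_pe if adapters are still present, --compression to set the gzip compression level, --thread to set the number of worker threads, and --html to write an HTML report. Note that fastp also applies quality filtering and adapter trimming by default; add -Q (--disable_quality_filtering) and -A (--disable_adapter_trimming) if you only want the length filter.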
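Whichever tool you use, it is cheap to sanity-check the output by counting reads that are still below the threshold; after a successful filter the count should be 0. Below is a minimal sketch, assuming uncompressed 4-line FASTQ. The helper name count_short_reads is hypothetical.

```shell
# Hypothetical helper: count FASTQ records whose sequence is shorter than MIN_LEN.
# Assumes uncompressed 4-line records; prints 0 if no short reads remain.
count_short_reads() {
  local min_len="$1"
  # Line 2 of every record (NR % 4 == 2) is the sequence line.
  awk -v min="$min_len" '
    BEGIN { min += 0 }
    NR % 4 == 2 && length($0) < min { n++ }
    END { print n + 0 }
  '
}

# Usage (hypothetical file names):
#   count_short_reads 21 < out.fastq
#   gzip -dc out.R1.fq.gz | count_short_reads 21
```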