Question: Filtering reads on the basis of percentage of ambiguous (N) characters
Varun Gupta (United States) wrote, 9 months ago:

Hi,

I want to filter my FASTQ reads based on ambiguous characters, that is, N's. If more than 5% of the bases in a read are N, I want to discard that read; if 5% or fewer are N, I want to keep it. It would be really helpful if someone knows of a tool that already does this.

Thanks

rna-seq fastq filter • 303 views
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 9 months ago:

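Using paste + awk. gsub() returns the number of characters it removed, so N below is the count of bases that are not A, C, G or T; a read is kept when that count is at most 5% of its length:

gunzip -c input.fq.gz |\
 paste - - - - |\
awk -F '\t' '{S=$2;L=length(S);N=gsub(/[^ATGCatgc]/,"",S); if(L>0 && N/L <= 0.05) print $0;}' |\
tr "\t" "\n"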
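
The pipeline above treats every character other than A, C, G or T as ambiguous. If only N should count, as in the question, a variant along these lines might work. It is a sketch, not a tested tool: filter_n_reads is a made-up function name, the cutoff is passed as a fraction (0.05 for 5%), and it assumes plain 4-line FASTQ records on stdin with no wrapped sequence lines.

```shell
# Sketch: keep FASTQ records whose fraction of N bases is at most the cutoff.
# filter_n_reads is a hypothetical helper name, not an existing tool.
# Usage: filter_n_reads [max_fraction]   (defaults to 0.05, the 5% in the question)
filter_n_reads() {
    paste - - - - |                      # join each 4-line record into one tab-separated line
    awk -F '\t' -v max="${1:-0.05}" '
        {
            seq = $2                     # work on a copy so $0 is printed unchanged
            len = length(seq)
            n = gsub(/[Nn]/, "", seq)    # gsub returns how many N/n characters it removed
            if (len > 0 && n / len <= max) print
        }
    ' |
    tr '\t' '\n'                         # split the kept records back into 4 lines each
}
```

For example: gunzip -c input.fq.gz | filter_n_reads 0.05 | gzip > filtered.fq.gz. With this cutoff a 100-base read containing 6 N's (6%) is dropped, and one containing 4 N's (4%) is kept.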