Question: Filtering reads by percentage of ambiguous (N) characters
13 months ago, Varun Gupta (United States) wrote:

Hi,

I want to filter my FASTQ reads by their proportion of ambiguous characters, i.e. N's. If N's make up more than 5% of a read's sequence, I want to discard that read; if less than 5%, I want to keep it. It would be really helpful if someone knows of a tool that already does this.

Thanks

Tags: rna-seq, fastq, filter
13 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

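using paste + awk (keeps reads in which at most 5% of the bases are not A/C/G/T; assumes 4-line FASTQ records):

gunzip -c input.fq.gz |\
 paste - - - - |\
awk -F '\t' '{S=$2; L=length(S); N=gsub(/[^ATGCatgc]/,"",S); if(L>0 && N/L <= 0.05) print $0;}' |\
tr "\t" "\n"

gsub returns the number of characters it removed, so N is the count of ambiguous bases in the read.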
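
To check the behaviour on a small input, the paste + awk idea can be wrapped in a shell function that takes the threshold as an argument. This is only a minimal sketch: `filter_n` and `max_frac` are names made up here (not an existing tool), and it assumes plain 4-line FASTQ records with no wrapped sequence lines.

```shell
# Hypothetical helper for illustration: reads 4-line FASTQ on stdin and
# keeps records whose fraction of non-ACGT bases is <= max_frac (default 0.05).
filter_n() {
  max_frac="${1:-0.05}"
  paste - - - - |
    awk -F '\t' -v max="$max_frac" '{
      seq = $2
      len = length(seq)
      n = gsub(/[^ACGTacgt]/, "", seq)   # gsub returns how many characters it removed
      if (len > 0 && n / len <= max) print
    }' |
    tr '\t' '\n'
}

# Toy input: r1 has 1 N in 25 bases (4%), r2 has 2 N's in 20 bases (10%).
printf '@r1\nACGTACGTACGTACGTACGTACGTN\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n@r2\nACGTACGTACGTACGTACNN\n+\nIIIIIIIIIIIIIIIIIIII\n' |
  filter_n 0.05
```

With the 5% threshold only the four lines of record r1 come out. For real data the call would look like `gunzip -c input.fq.gz | filter_n 0.05 | gzip > filtered.fq.gz` (file names are placeholders). For paired-end data the two files would have to be filtered together to stay in sync, which this sketch does not do.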