Question: Problem with N sequences in fastqc file
12 months ago, carina2817 wrote:

Hello,

I am trying to filter a FASTQ file. I ran FastQC to get a quality report, and it reports an overrepresented sequence:

sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 39317 percentage: 0.13862182817994162

The fastq file has 28362777 sequences and the read length is 125.

I used cutadapt to remove it:

gunzip -c SRR9667734_S_sp_2.fastq.gz |  cutadapt -m 20 -e 0.1 -z -a NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN - -o SRR9667734_S_sp_cutadapt_2.fastq.gz

However, the resulting file still contains those overrepresented sequences, and the number of sequences in the FASTQ file dropped to 68122 after running cutadapt.

Overrepresented sequences:

sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 39317 percentage: 57.71556912597986

sequence: ANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 1172 percentage: 1.7204427350929214

sequence: GNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 1014 percentage: 1.488505915856845

sequence: CNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 895 percentage: 1.3138193241537244

sequence: TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN count: 864 percentage: 1.268312733037785

Any idea of what's happening?


This doesn't answer your question, but you can try bbduk.sh from the BBMap suite, which has an option to remove reads containing Ns: maxns=-1 (if non-negative, reads with more Ns than this, after trimming, will be discarded).

written 12 months ago by genomax

For starters, maybe put the -o option before the input. And I'm pretty sure cutadapt can handle gzipped files, so no need to decompress.

written 12 months ago by swbarnes2
12 months ago, yztxwd (Southern Medical University) wrote:

See the documentation about wildcard interpretation in cutadapt: https://cutadapt.readthedocs.io/en/stable/guide.html#wildcards
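
To make the wildcard point concrete, here is a toy model in Python. It is not cutadapt's code, only a sketch of my reading of the documented defaults: an N in the adapter is treated as a wildcard that matches any of A, C, G or T, while an N in the read is not interpreted as a wildcard unless that is explicitly enabled. The function names and the read_wildcards parameter are made up for illustration. Under those assumptions, a 50-N adapter matches the start of every ordinary read, so the whole read is trimmed away and then dropped by -m 20, while the all-N reads are not matched and survive. That would be consistent with the numbers in the question: the all-N count is 39317 both before and after trimming, but the total fell from 28362777 to 68122.

```python
def codes(base, wildcards):
    """Return the set of concrete bases a character can stand for (toy model)."""
    if base in "ACGT":
        return {base}
    if base == "N" and wildcards:
        return set("ACGT")
    return set()  # uninterpreted character: matches nothing


def adapter_matches_prefix(adapter, read, read_wildcards=False):
    """True if the adapter matches the start of the read with zero mismatches.

    Adapter wildcards are always on here (assumed default);
    read wildcards are off unless read_wildcards=True.
    """
    if len(read) < len(adapter):
        return False
    return all(
        codes(a, True) & codes(r, read_wildcards)
        for a, r in zip(adapter, read)
    )


adapter = "N" * 50
ordinary_read = ("ACGT" * 32)[:125]  # any 125 bp read without Ns
all_n_read = "N" * 125

print(adapter_matches_prefix(adapter, ordinary_read))  # True: read would be trimmed to nothing
print(adapter_matches_prefix(adapter, all_n_read))     # False: read is left untouched
```

In other words, an all-N "adapter" does the opposite of what was intended, so adapter trimming is the wrong mechanism for this job.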

The right way to remove Ns from a FASTQ file: https://cutadapt.readthedocs.io/en/stable/guide.html#dealing-with-n-bases
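
The linked section covers cutadapt's own options for this (--max-n to discard reads with too many Ns, --trim-n to trim Ns from read ends), and those are what you should use in practice. Purely to show what such a filter does, here is a minimal Python sketch that drops reads containing more than a given number of Ns. The function names and file paths are hypothetical, it assumes plain four-line FASTQ records, and it only handles an integer threshold.

```python
import gzip


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a text handle.

    Assumes strict four-line FASTQ records with no wrapped sequences.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_n(records, max_n=0):
    """Keep only records whose sequence has at most max_n N characters."""
    for rec in records:
        if rec[1].upper().count("N") <= max_n:
            yield rec


def filter_fastq_gz(in_path, out_path, max_n=0):
    """Filter a gzipped FASTQ file; return the number of reads kept."""
    kept = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for rec in filter_n(read_fastq(fin), max_n):
            fout.write("\n".join(rec) + "\n")
            kept += 1
    return kept
```

For example, filter_fastq_gz("in.fastq.gz", "out.fastq.gz", max_n=0) would keep only reads with no Ns at all (both file names are placeholders). For paired-end data a filter like this has to be applied to both mates together so the files stay in sync, which is one more reason to prefer the dedicated tools.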
