Can FastQC detect short overrepresented sequences (e.g., shorter than 25 bp)?
10.1 years ago
Cacau ▴ 520

I have tried to use FastQC to detect short overrepresented sequences at the 5' end of my reads. I am sure the reads contain adapters, but FastQC reports nothing under Overrepresented Sequences. Can FastQC detect such sequences? If not, what other software could I use for this purpose?

RNA-Seq • 4.0k views

You can use cutadapt to remove those sequences.


Thanks for your help. If I don't know what the adapters are, do you have any suggestions for how to detect them?


A common sequence that is present in almost all of your reads is probably the adapter, provided your reads don't carry barcodes.

Better still, ask the person who prepared the library.
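The idea of looking for a sequence shared by almost all reads can be sketched in a few lines of Python. This is only a rough illustration, not part of any tool mentioned in this thread: the function names `read_sequences` and `common_prefixes`, the prefix length `k=12`, and the assumption of a plain 4-line-per-record FASTQ file are all hypothetical choices.

```python
from collections import Counter
import io


def read_sequences(handle):
    """Yield the sequence line of each record, assuming 4 lines per FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def common_prefixes(handle, k=12, top=5):
    """Return the `top` most frequent k-base read starts and the fraction of reads with each."""
    counts = Counter()
    total = 0
    for seq in read_sequences(handle):
        total += 1
        if len(seq) >= k:
            counts[seq[:k]] += 1  # count only the 5' end of the read
    return [(prefix, n / total) for prefix, n in counts.most_common(top)]


if __name__ == "__main__":
    # Tiny in-memory example; two of the three reads share the same 12-base start.
    demo = io.StringIO(
        "@r1\nGATCGGAAGAGCAAAA\n+\nIIIIIIIIIIIIIIII\n"
        "@r2\nGATCGGAAGAGCTTTT\n+\nIIIIIIIIIIIIIIII\n"
        "@r3\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
    )
    for prefix, fraction in common_prefixes(demo):
        print(prefix, round(fraction, 2))
```

For a real file, the handle could come from `open(path)` or, for compressed data, `gzip.open(path, "rt")`. A prefix shared by nearly every read is a candidate adapter, but barcodes or a biased library can produce the same signal, so the result is a hint rather than proof.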

10.1 years ago
rtliu ★ 2.2k
  1. Check that you are using the latest FastQC version (v0.11.2). Does its Adapter Content module report anything?
  2. From the Overrepresented Sequences section of the manual:

    This module lists all of the sequence which make up more than 0.1% of the total. To conserve memory only sequences which appear in the first 200,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.

  3. Try seqtk to subsample your FASTQ file, say to 1 million reads, then run FastQC on the subsample.
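The rule quoted from the manual in item 2 can be illustrated with a short Python sketch. This is not FastQC's source code, only a reading of the quoted description; the function name `overrepresented` is hypothetical, and treating each whole read as one "sequence" is an assumption.

```python
from collections import Counter


def overrepresented(seqs, track_first=200_000, threshold=0.001):
    """Illustrate the quoted rule: report sequences above `threshold` of all reads,
    but only start tracking sequences seen within the first `track_first` reads."""
    counts = Counter()
    total = 0
    for seq in seqs:
        total += 1
        if total <= track_first:
            counts[seq] += 1  # still inside the discovery window
        elif seq in counts:
            counts[seq] += 1  # later reads only update already-tracked sequences
    return {s: n / total for s, n in counts.items() if n / total > threshold}


if __name__ == "__main__":
    reads = ["AAAA"] * 5 + ["CCCC"] * 5
    # With a window of 5 reads, "CCCC" is never tracked even though it is half the file.
    print(overrepresented(reads, track_first=5, threshold=0.1))
```

The example shows the caveat from the manual: a sequence that first appears after the tracking window is missed, however common it is later in the file.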


It still didn't detect the adapters. I created a test file containing 1000 sequences, each beginning with the adapter sequence GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG. FastQC did not report any overrepresented sequences. When I added 10 additional Cs to the pseudo-adapter sequence, the adapter was detected.
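A test file of this kind could be generated with something like the Python sketch below. The post does not say what followed the adapter in each read, so the random insert, its length (`insert_len=40`), the constant quality string, and the function name `make_test_fastq` are all assumptions made for illustration.

```python
import random

ADAPTER = "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG"  # pseudo-adapter sequence from the post


def make_test_fastq(n_reads=1000, insert_len=40, seed=0):
    """Return FASTQ text in which every read starts with ADAPTER followed by random bases."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    lines = []
    for i in range(n_reads):
        insert = "".join(rng.choice("ACGT") for _ in range(insert_len))
        seq = ADAPTER + insert
        # Four lines per record: header, sequence, separator, dummy quality string.
        lines += [f"@read{i}", seq, "+", "I" * len(seq)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("test.fq", "w") as out:
        out.write(make_test_fastq())
```

Running FastQC on the resulting `test.fq` would reproduce the kind of experiment described above under these assumptions.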


As far as I know, Illumina adapters are most likely to be present at the 3' end, i.e., towards the end of each read.

Try Kraken to infer adapters from the FASTQ data. To infer the 3' adapter sequence: minion search-adapter -i data.fq.gz


Thanks for your help! It helped me a lot.
