How To Use Cutadapt Effectively
8.7 years ago
roll

Hi, I am trying to use cutadapt. This is the first time I am using the software.

I used FastQC and identified some primer sequences. I set up the parameters and ran cutadapt to remove these primers before mapping. When I re-ran FastQC on the cutadapt output, it reported a different set of primers.

I used -e 0.01. Does being strict with the error rate affect this?

Am I doing something wrong? Or does this mean I need to rerun cutadapt until FastQC no longer finds any primer/adapter?

8.7 years ago

FastQC is a generic tool that was not designed to find and report all enriched sequence patterns. It uses internal heuristics to decide what to report; for details, see:

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/10%20Overrepresented%20Sequences.html

After you trim part of your sequences, these heuristics can produce a different output.
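
To illustrate why the report can change after trimming, here is a simplified model of the overrepresented-sequences check. It assumes only the rule from the help page that a sequence is reported when it makes up more than 0.1% of all reads; the real module applies further heuristics, so treat this as a sketch. `trim_prefix`, `PRIMER_1`, `PRIMER_2` and the toy reads are hypothetical.

```python
from collections import Counter
from itertools import product

def overrepresented(reads, threshold=0.001):
    """Return {sequence: fraction} for exact read sequences above `threshold` (0.1%).

    Simplified model of FastQC's Overrepresented Sequences check; the real
    module uses additional heuristics, so this is only an approximation.
    """
    counts = Counter(reads)
    total = len(reads)
    return {seq: n / total for seq, n in counts.items() if n / total > threshold}

def trim_prefix(read, primer):
    """Hypothetical helper: remove `primer` if the read starts with it exactly."""
    return read[len(primer):] if read.startswith(primer) else read

# Hypothetical toy library of 1000 reads: 996 unique 6-mers, three reads of
# PRIMER_1 followed by PRIMER_2, and one read consisting of PRIMER_2 alone.
PRIMER_1 = "ACACACACAC"
PRIMER_2 = "GTGTGTGTGT"
unique_reads = ["".join(p) for p in product("ACGT", repeat=6)][:996]
reads = unique_reads + [PRIMER_1 + PRIMER_2] * 3 + [PRIMER_2]

before = overrepresented(reads)
trimmed = [trim_prefix(r, PRIMER_1) for r in reads]
after = overrepresented(trimmed)
print(before)  # {'ACACACACACGTGTGTGTGT': 0.003}
print(after)   # {'GTGTGTGTGT': 0.004}
```

Before trimming, the lone `PRIMER_2` read sits exactly at 0.1% and is not reported; once `PRIMER_1` is removed, the former dimer reads collapse onto it and a "new" sequence crosses the threshold.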

As a solution, you can keep trimming adapters/primers until none are left, but be careful not to overdo it: verify that what you are cutting is indeed an artificial construct. Sequences can also be enriched for biological reasons.
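
One way to do that verification is to check whether a reported sequence shares a sufficiently long exact substring with a known adapter or primer. The sketch below uses shared k-mers (k = 10 is an arbitrary choice); the `KNOWN` table is an assumed example containing the starts of the Illumina TruSeq and Nextera adapters and should be replaced with the adapters and primers actually used in your library prep.

```python
def kmer_set(seq, k):
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def matching_constructs(candidate, known, k=10):
    """Names of known adapters/primers that share at least one k-mer with candidate."""
    cand = kmer_set(candidate, k)
    return [name for name, seq in known.items() if cand & kmer_set(seq, k)]

# Assumed reference list; replace with the constructs from your own protocol.
KNOWN = {
    "Illumina TruSeq adapter (start)": "AGATCGGAAGAGC",
    "Nextera adapter (start)": "CTGTCTCTTATACACATCT",
}

print(matching_constructs("TTAGATCGGAAGAGCACAC", KNOWN))   # ['Illumina TruSeq adapter (start)']
print(matching_constructs("GCGCGCGCGCGCGCGCGCGC", KNOWN))  # []
```

A sequence with no hit is not proof of biological origin, but it is a signal to look more closely before trimming it.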

Thanks Istvan. I will do that.

Is there any other tool that reports contaminating sequences? What is the best way to identify them in my data?

What about the error rate? Does it affect adapter trimming? I was very strict with the error rate (0.01), which does not allow any errors. My sequences are around 50 bp. Do you think it is better to use 0.05 or even higher?
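
The arithmetic behind "does not allow any errors" can be sketched as follows, assuming the rule described in the cutadapt documentation that the number of tolerated errors is the error rate times the length of the matching region, rounded down (check the documentation of your cutadapt version, since details may differ). With reads of about 50 bp, `-e 0.01` never tolerates a single mismatch.

```python
def allowed_errors(error_rate, match_length):
    """Errors tolerated in a match: error rate x matched length, rounded down (simplified model)."""
    return int(error_rate * match_length)

# Tolerated errors for matches of 10, 20 and 50 bp at three error rates.
for e in (0.01, 0.05, 0.1):
    print(e, [allowed_errors(e, n) for n in (10, 20, 50)])
# 0.01 [0, 0, 0]
# 0.05 [0, 1, 2]
# 0.1 [1, 2, 5]
```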

Nothing comes to mind as a straightforward alternative. I would ask this as a separate question, for example: "How to detect contamination or sample preparation artifacts in sequencing results".

As for adapter trimming, I would suggest going with the default error rate (0.1) unless you have a good reason not to. Adapters are designed not to match any known sequence and, moreover, to have a fairly large edit distance from known sequences, so a 10% difference will still almost certainly indicate an adapter rather than data of known biological origin.
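
A small sketch of that argument: compute the edit (Levenshtein) distance between an observed sequence and an adapter, and compare it with the error budget at 1% and 10%. `ADAPTER` is an assumed example (the start of the Illumina TruSeq adapter), and the observed and unrelated sequences are hypothetical. This compares whole strings, whereas cutadapt aligns the adapter against the read and also accepts partial matches at the read end.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]

ADAPTER = "AGATCGGAAGAGC"   # assumed example adapter (Illumina TruSeq start)
observed = "AGATCGGTAGAGC"  # same adapter with one sequencing error
unrelated = "C" * 13        # hypothetical unrelated sequence

for e in (0.01, 0.1):
    budget = int(e * len(ADAPTER))  # simplified error budget, rounded down
    print(e,
          edit_distance(observed, ADAPTER) <= budget,
          edit_distance(unrelated, ADAPTER) <= budget)
# 0.01 False False
# 0.1 True False
```

At 10% the adapter carrying a sequencing error is still recognised, while the unrelated sequence remains far outside the budget; at 1% the adapter is missed.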

To answer your first question, there is a tool from the same group called FastQ Screen that can be used to detect contamination. The "best way" probably depends on exactly what you want to identify (and how precisely).
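
The idea behind screening is to check what fraction of reads matches each of several candidate reference sequences. FastQ Screen itself does this by aligning reads to each configured genome with an aligner such as Bowtie2; the sketch below swaps in exact k-mer matching purely to illustrate the concept. The reference names, sequences and reads are hypothetical toy data.

```python
def kmers(seq, k):
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def screen(reads, references, k=8):
    """Fraction of reads sharing at least one k-mer with each reference sequence."""
    result = {}
    for name, ref in references.items():
        ref_kmers = kmers(ref, k)
        hits = sum(1 for read in reads if kmers(read, k) & ref_kmers)
        result[name] = hits / len(reads)
    return result

# Hypothetical toy references and reads.
REFERENCES = {"genome_A": "ACGTACGGTCAGTTCAGGCA", "genome_B": "TTTTTTTTGGGGGGGGCCCC"}
reads = ["ACGTACGGTCAG", "ACGTACGGTCAG", "TTTTGGGGGGGG", "CATCATCATCAT"]
print(screen(reads, REFERENCES))  # {'genome_A': 0.5, 'genome_B': 0.25}
```

Reads that hit no reference (here the last one) are the ones that would warrant further investigation.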

Thanks. I started using it, and now it complains about libraries:

No search libraries were configured at /Downloads/fastq_screen_v0.4.2/fastq_screen line 119.

I assume it refers to the contamination list. I have the contaminant file from FastQC, but it is not clear to me where to specify this list. I use the following line in my conf file:

DATABASE Adapters /Volumes/bowtie2_index/

Can you please advise whether by "library" it means the contamination list, and where I should specify the libraries?

Post this as a separate question; there are too many things to address in a comment.
