Question: How To Use Cutadapt Effectively
roll270 (United Kingdom) wrote 5.6 years ago:

Hi, I am trying to use cutadapt. This is the first time I am using the software.

I used FastQC and identified some primers. I set up the parameters and ran cutadapt to remove these primers before mapping. When I re-ran FastQC on the cutadapt output, it reported a different set of primers.

I used -e 0.01. Does being strict with the error rate affect this?

Am I doing something wrong? Or does this mean I need to rerun cutadapt until FastQC no longer finds any primer/adapter?

Tags: qc, primer
Istvan Albert ♦♦ 80k (University Park, USA) wrote 5.6 years ago:

FastQC is a generic tool that was not designed to find and report all enriched sequence patterns. It uses internal heuristics to decide what to report; for details see:

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/10%20Overrepresented%20Sequences.html

After you cut part of your sequences, the method described there can produce a different output.
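To illustrate why the report can change after trimming, here is a minimal Python sketch of an overrepresentation count. It is a simplification and not FastQC's actual code: the 0.1% threshold and the idea of tracking only sequences first seen among the first 100,000 reads are taken from my reading of the help page linked above, and the primer and read sequences are made-up placeholders.

```python
from collections import Counter


def overrepresented(reads, threshold=0.001, track_first=100_000):
    """Return {sequence: fraction} for sequences exceeding `threshold`.

    Simplified model: a sequence is only counted if it was first seen
    within the first `track_first` reads.
    """
    counts = Counter()
    total = 0
    for seq in reads:
        total += 1
        if total <= track_first or seq in counts:
            counts[seq] += 1
    if total == 0:
        return {}
    return {s: c / total for s, c in counts.items() if c / total > threshold}


# Hypothetical toy data: a primer followed by adapter fragments of varying length.
PRIMER = "CCGGTT"
before = (
    [PRIMER + "AAAA"] * 2
    + [PRIMER + "AAA"] * 2
    + [PRIMER + "AA"] * 2
    + ["ACGTACGT", "TTGACCTA", "GGATCCAA", "CATGCATG"]
)
# Toy "trimming": strip everything after the primer.
after = [r[: len(PRIMER)] if r.startswith(PRIMER) else r for r in before]

# A high threshold is used only because the toy data set is tiny.
print(overrepresented(before, threshold=0.3))
print(overrepresented(after, threshold=0.3))
```

In this toy case the three read variants are each too rare to be reported before trimming, but they collapse into one identical sequence afterwards, so a sequence that was previously hidden now crosses the threshold.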

As for a solution, you can just keep cutting adapters/primers until there are none, but be careful not to overdo it: verify that what you are cutting is indeed an artificial construct. You could have enrichment for biological reasons.
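Instead of rerunning cutadapt once per adapter, several adapters can usually be given in one call. The sketch below only builds the command line; it assumes a cutadapt release that supports `-a` (3' adapter), `-e` (maximum error rate), `-n` (number of adapter removal rounds per read) and `-o` (output file). The helper name, file names and adapter sequences are illustrative placeholders, not the poster's actual primers; primers at the 5' end would need `-g` rather than `-a`.

```python
import shlex


def build_cutadapt_command(adapters, input_fastq, output_fastq,
                           error_rate=0.1, times=1):
    """Build a cutadapt command that trims several 3' adapters in one run.

    Hypothetical helper for illustration; it does not execute anything.
    """
    if not adapters:
        raise ValueError("at least one adapter sequence is required")
    cmd = ["cutadapt", "-e", str(error_rate), "-n", str(times)]
    for seq in adapters:
        cmd += ["-a", seq]  # one -a option per 3' adapter
    cmd += ["-o", output_fastq, input_fastq]
    return cmd


# Placeholder adapter sequences and file names.
cmd = build_cutadapt_command(
    ["AGATCGGAAGAGC", "CTGTCTCTTATACACATCT"],
    "reads.fastq",
    "trimmed.fastq",
    times=2,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Re-running FastQC on the trimmed output is still worthwhile, since new sequences may only become visible after the first round of trimming.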

Thanks Istvan. I will do that.

Is there any other tool that reports the contaminating sequences? What is the best way to identify these in my data?

What about the error rate? Does it affect the adapter trimming? I was very strict with the error rate (0.01), which does not allow any errors. My sequences are around 50 bp. Do you think it is better to use 0.05 or even higher?

(comment written 5.6 years ago by roll270)

Nothing comes to mind as a straightforward alternative; I would ask this as a separate question, for example "How to detect contamination or sample preparation artifacts in sequencing results".

As for adapter trimming, I would suggest going with the default error rate (0.1) unless you have a good reason not to. Adapters are designed not to match any known sequence and, moreover, to have a fairly large edit distance from known sequences. A match with, say, a 10% difference will therefore still most likely indicate an adapter rather than data of biological origin.
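To make the effect of the error rate concrete: as far as I know, cutadapt allows a number of errors equal to the matched adapter length multiplied by the maximum error rate, rounded down. The short Python sketch below applies that rule; the function name and the 20 nt adapter length are assumptions chosen for illustration.

```python
def allowed_errors(matched_length, max_error_rate):
    """Number of errors tolerated for an adapter match of the given length,
    assuming the rule floor(matched_length * max_error_rate)."""
    return int(matched_length * max_error_rate)


# Hypothetical 20 nt adapter, fully contained in the read.
for rate in (0.01, 0.05, 0.1):
    print(rate, allowed_errors(20, rate))
```

Under this rule, a rate of 0.01 tolerates no errors at all unless at least 100 bases of the adapter are matched, so for typical adapter lengths it amounts to exact matching only.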

(comment written 5.6 years ago by Istvan Albert ♦♦ 80k)

To answer your first question, there is a tool from the same group called FastQ Screen that can be used for detecting contamination. The "best way" probably depends on exactly what you want to identify (and how precisely).

(comment written 5.6 years ago by SES8.2k)

Thanks. I started using this and now it complains about libraries:

No search libraries were configured at /Downloads/fastq_screen_v0.4.2/fastq_screen line 119.

I assume it refers to the contamination list. I have the contaminant file from FastQC, but it is not clear to me where to specify this list. I use the following line in my conf file:

DATABASE Adapters /Volumes/bowtie2_index/

Can you please advise whether "library" means the contamination list, and where I should specify the libraries?

(comment written 5.6 years ago by roll270)

Post this as a separate question; there are too many things to address in a comment.

(comment written 5.6 years ago by SES8.2k)