Hi, I am trying to use cutadapt. This is the first time I am using the software.
I used FastQC and identified some primers. I set up the parameters and ran cutadapt to remove these primers before mapping. When I re-ran FastQC on the cutadapt output, it reported a different set of primers.
I used -e 0.01. Does being strict with the error rate affect this?
Am I doing something wrong? Or does this mean I need to rerun cutadapt until FastQC no longer finds any primer/adapter?
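For context on how FastQC comes up with these hits: based on my reading of its documentation, the "Overrepresented sequences" module counts identical reads (truncating reads longer than 75 bp to 50 bp), flags any sequence making up more than 0.1% of the total, and then looks each flagged sequence up in a list of known contaminants. The Python below is a simplified, hypothetical sketch of that idea, not FastQC's code: it uses exact substring matching instead of FastQC's own matching rules, skips FastQC's sampling limits, and the adapter name and sequence in the usage example are made up for illustration.

```python
from collections import Counter


def overrepresented(reads, contaminants, threshold=0.001):
    """Simplified sketch of an overrepresented-sequence check.

    reads: list of read sequences.
    contaminants: dict mapping a contaminant name to its sequence.
    Returns (sequence, fraction_of_reads, possible_source) tuples for every
    sequence whose fraction exceeds `threshold` (0.001 = 0.1%).
    """
    # Long reads are truncated to their first 50 bases before counting.
    counts = Counter(r[:50] if len(r) > 75 else r for r in reads)
    total = len(reads)
    hits = []
    for seq, n in counts.most_common():  # most frequent first
        frac = n / total
        if frac <= threshold:
            break  # all remaining sequences are rarer still
        # Exact substring match in either direction (a simplification).
        source = next(
            (name for name, c in contaminants.items() if c in seq or seq in c),
            "No Hit",
        )
        hits.append((seq, frac, source))
    return hits


if __name__ == "__main__":
    from itertools import product

    # 1024 distinct 5-mers, each seen once, plus 5 copies of an adapter-containing read.
    reads = ["".join(p) for p in product("ACGT", repeat=5)]
    reads += ["ACGTAGATCGGAAGAGC"] * 5
    for seq, frac, source in overrepresented(reads, {"Toy adapter": "AGATCGGAAGAGC"}):
        print(seq, f"{frac:.4%}", source)
```

Only the repeated read crosses the 0.1% threshold here; each unique 5-mer stays below it.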
Thanks Istvan. I will do that.
Is there any other tool that reports contaminating sequences? What is the best way to identify them in my data?
What about the error rate? Does it affect adapter trimming? I was very strict with the error rate (0.01), which effectively allows no errors. My sequences are around 50 bp. Do you think it is better to use 0.05 or even higher?
Nothing comes to mind as a straightforward alternative. I would ask this as a separate question, for example "how to detect contamination or sample preparation artifacts in sequencing results".
As for adapter trimming, I would suggest going with the default (0.1) unless you have a good reason not to. Adapters are designed not to match any known sequence and, moreover, to have a fairly large edit distance from known sequences, so a match with up to 10% differences will still almost certainly indicate an adapter rather than data of known biological origin.
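To make the error-rate point concrete: as I understand the cutadapt documentation, the number of errors allowed in an adapter match is the error rate (-e) multiplied by the length of the matching adapter segment, rounded down. The Python below is a minimal sketch of that rule as I read it, not cutadapt's own code. It uses Fraction so that values like 0.1 × 30 are not thrown off by floating-point rounding.

```python
import math
from fractions import Fraction


def max_errors(error_rate: float, match_length: int) -> int:
    """Allowed errors for a match of `match_length` bases, assuming the rule
    floor(error_rate * match_length)."""
    # Fraction(str(...)) turns 0.1 into exactly 1/10, avoiding float artifacts.
    return math.floor(Fraction(str(error_rate)) * match_length)


if __name__ == "__main__":
    # Adapter matches that fit inside a ~50 bp read.
    lengths = [10, 20, 30, 50]
    print("rate  " + "  ".join(f"{n:>3}bp" for n in lengths))
    for rate in (0.01, 0.05, 0.1):
        print(f"{rate:<5} " + "  ".join(f"{max_errors(rate, n):>5}" for n in lengths))
```

Under this rule, -e 0.01 permits zero errors for any match shorter than 100 bases, so on 50 bp reads it amounts to exact matching. With the 0.1 default, a 20-base match may contain 2 errors and a full 50-base match may contain 5.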
To answer your first question, there is a tool from the same group called FastQ Screen that can be used to detect contamination. The "best way" probably depends on exactly what you want to identify (and how precisely).
Thanks. I started using it, and now it complains about libraries: "No search libraries were configured at /Downloads/fastq_screen_v0.4.2/fastq_screen line 119". I assume this refers to the contamination list. I have the contaminant file from FastQC, but it is not clear to me where to specify this list. I use the following line in my conf file:
DATABASE Adapters /Volumes/bowtie2_index/
Can you please advise whether by "library" it means the contamination list, and where I should specify the libraries?
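One possible reading of the error, offered as my understanding rather than anything confirmed in this thread: FastQ Screen's "libraries" are the DATABASE entries in its config, and each one points at a bowtie/bowtie2 index rather than at a plain text list. Under that assumption, a FastQC contaminant list would first need to become a FASTA file that can then be indexed. The Python below is a minimal sketch of that conversion. It assumes each entry in the list is laid out as "name, one or more tabs, sequence" with # comment lines, which is how I remember FastQC's contaminant_list.txt; check this against your copy. The function name contaminants_to_fasta is hypothetical.

```python
import sys


def contaminants_to_fasta(list_text: str) -> str:
    """Convert 'name<TAB(s)>sequence' lines into FASTA records.

    Blank lines and lines starting with '#' are skipped. Spaces in names are
    replaced with underscores so the FASTA headers stay single tokens.
    """
    records = []
    for line in list_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Names and sequences may be separated by several tabs; drop empty fields.
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        if len(fields) < 2:
            continue  # no tab-separated sequence on this line
        name, seq = fields[0], fields[-1]
        records.append(f">{name.replace(' ', '_')}\n{seq}")
    return "\n".join(records) + "\n" if records else ""


if __name__ == "__main__":
    # Usage: python contaminants_to_fasta.py contaminant_list.txt > contaminants.fa
    with open(sys.argv[1]) as handle:
        sys.stdout.write(contaminants_to_fasta(handle.read()))
```

The resulting FASTA could then be indexed with bowtie2-build (for example `bowtie2-build contaminants.fa /Volumes/bowtie2_index/contaminants`, where "contaminants" is a hypothetical basename). If I read the FastQ Screen example configuration correctly, the third field of the DATABASE line is the index path including that basename, not just the directory.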
Post this as a separate question; there are too many things to address in a comment.