Question: trim_galore and cutadapt for paired-end DNA sequencing
Shicheng Guo wrote, 3.5 years ago:

Hi my colleagues,

Question 1. Trimming the adapters when we know them.

Suppose the adapters for my DNA sequencing are as follows:

P5 adaptor: 5' ACACTCTTTCCCTACACGACGCTCTTCCGATCT
P7 adaptor:  5' P-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG  

We can trim the adapters from the raw fastq files with trim_galore or cutadapt as follows:

cutadapt -a AGATCGGAAGAGCA -g GCTCTTCCGATCT -o sample.trim.fastq sample.raw.fastq 
trim_galore --paired -a GATCGGAAGAGCA -a2 GCTCTTCCGATCT --retain_unpaired  --trim1  S1.read1.fq S1.read2.fq

However, I am a little confused: why can 610,514 reads containing "GATCGGAAGAGCA" be found in read2.fastq?

BTW: GATCGGAAGAGCA is the reverse complementary of GACGCTCTTCCGATCT

Any suggestions?

Question 2. Trimming the adapters when they are unknown.

Is there a brute-force method to remove the reads containing adapters, for example by checking every adapter (Illumina has hundreds of adapters)? I think these adapter sequences should not occur in the human genome, should they? Therefore, if reads contain such an adapter sequence, they should be filtered out.

Lemire commented, 3.5 years ago:

> BTW: GATCGGAAGAGCA is the reverse complementary of GACGCTCTTCCGATCT

No, it's not. That's part of your problem: you're not passing the right sequences as arguments.
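The reverse-complement claim quoted above is easy to check on the command line. The following is a minimal sketch, assuming the standard `rev` and `tr` utilities are available; the helper name `revcomp` is made up for illustration and is not part of cutadapt or trim_galore.

```shell
# Reverse-complement a DNA sequence: reverse the string, then swap A<->T and C<->G.
# (revcomp is a hypothetical helper, uppercase ACGT only.)
revcomp() {
  printf '%s\n' "$1" | rev | tr ACGT TGCA
}

claimed="GATCGGAAGAGCA"
actual=$(revcomp GACGCTCTTCCGATCT)
echo "reverse complement: $actual"   # prints AGATCGGAAGAGCGTC

if [ "$actual" = "$claimed" ]; then
  echo "claim holds"
else
  echo "claim does not hold"
fi
```

The computed reverse complement is AGATCGGAAGAGCGTC, which is not GATCGGAAGAGCA, so the sequences passed to the tools deserve a second look.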
Lemire wrote, 3.5 years ago:

My first comment is that you should upgrade to a recent version of cutadapt, which accepts both read1.fq and read2.fq on the command line, along with the options -a and -A.

Second, I am not convinced that you should expect to see the sequence GCTCTTCCGATCT in your reads. You have to reverse complement it. Take, for example, the TruSeq example in the cutadapt guide:

http://cutadapt.readthedocs.org/en/stable/guide.html#illumina-truseq

There you'll see that the reverse complement of the Universal Adapter is provided with the -A argument. You have to give cutadapt the actual sequences that you expect to see in your reads.

Third, in AGATCGGAAGAGCA, where does the last A come from?
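
Putting the first two points together, a paired-end cutadapt call could look roughly like the sketch below. It rests on several assumptions: that read 1 runs into the P7 adaptor exactly as written in the question, that read 2 runs into the reverse complement of the P5 adaptor, and that the output file names (S1.read1.trim.fq, S1.read2.trim.fq) are just placeholders. The `revcomp` helper is hypothetical, and the script only prints the command when cutadapt or the input files are missing.

```shell
# Hypothetical helper: reverse, then complement (uppercase ACGT only).
revcomp() { printf '%s\n' "$1" | rev | tr ACGT TGCA; }

# Adaptor sequences as given in the question.
p5="ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
p7="GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG"

# Assumption: read 1 contains P7 as written; read 2 contains revcomp(P5).
r1_adapter=$p7
r2_adapter=$(revcomp "$p5")
echo "read 2 adapter: $r2_adapter"

# -a trims the 3' adapter from read 1, -A from read 2;
# -o and -p name the trimmed output files for read 1 and read 2.
if command -v cutadapt >/dev/null 2>&1 && [ -f S1.read1.fq ] && [ -f S1.read2.fq ]; then
  cutadapt -a "$r1_adapter" -A "$r2_adapter" \
    -o S1.read1.trim.fq -p S1.read2.trim.fq \
    S1.read1.fq S1.read2.fq
else
  echo "would run: cutadapt -a $r1_adapter -A $r2_adapter -o S1.read1.trim.fq -p S1.read2.trim.fq S1.read1.fq S1.read2.fq"
fi
```

Under these assumptions both adapters share the stretch GATCGGAAGAGC, so finding that prefix in read 2 as well as read 1 would not be surprising.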

Brian Bushnell wrote, 3.5 years ago:

So... what's the surprise? And why are you giving the tool partial adapter sequences instead of the complete sequences?
