Question: How can I remove adapter sequences from Illumina 2000 paired-end data?
3.0 years ago by
tcf.hcdg60
European Union
tcf.hcdg60 wrote:

I have Illumina 2000 paired-end sequencing data. I checked read quality with FastQC and then removed the adapter sequences (Illumina paired-end adapters) with cutadapt, which also quality-trimmed the reads. The results show that only a few reads contain adapters. I then checked with Trim Galore, which reports that only 0.1% of the reads contain adapter sequences.

I am wondering why only 0.1% of the reads contain adapter sequences.

cutadapt summary:
  === Summary ===

  Total read pairs processed: 30,981,418
  Read 1 with adapter: 3,821 (0.0%)
  Read 2 with adapter: 2,104 (0.0%)
  Pairs that were too short: 434,082 (1.4%)
  Pairs written (passing filters): 30,547,336 (98.6%)

  Total basepairs processed: 15,490,709,000 bp
    Read 1: 7,745,354,500 bp
    Read 2: 7,745,354,500 bp
  Quality-trimmed: 466,256,549 bp (3.0%)
    Read 1: 85,966,692 bp
    Read 2: 380,289,857 bp
  Total written (filtered): 14,923,182,261 bp (96.3%)
    Read 1: 7,561,684,003 bp
    Read 2: 7,361,498,258 bp

Trim Galore summary:
  === Summary ===

  Total reads processed: 30,981,418
  Reads with adapters: 38,498 (0.1%)
  Reads written (passing filters): 30,981,418 (100.0%)

  Total basepairs processed: 7,745,354,500 bp
  Quality-trimmed: 85,966,692 bp (1.1%)
  Total written (filtered): 7,659,104,092 bp (98.9%)

Did I use the wrong adapter sequences, or had the adapters already been removed after sequencing?

ADD COMMENTlink modified 3.0 years ago by Brian Bushnell16k • written 3.0 years ago by tcf.hcdg60

Not every read will contain adapter sequence. In fact, you only see adapters in reads whose inserts are shorter than the number of sequencing cycles carried out.

Your data may be fine as is.
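To make the point above concrete, here is a minimal Python sketch of a toy read-through model. The adapter string is the common Illumina adapter prefix and is used only for illustration, and the function names are hypothetical. The read length is worked out from the cutadapt summary in the question (Read 1 base pairs divided by read pairs).

```python
# Toy model of adapter read-through (illustrative, not a trimming tool).
# Assumption: the sequencer reads the insert first and then runs into
# the adapter; anything past the adapter is ignored in this sketch.
ADAPTER = "AGATCGGAAGAGC"  # common Illumina adapter prefix, for illustration


def sequence_read(insert: str, cycles: int, adapter: str = ADAPTER) -> str:
    """Return the bases reported for one read of `cycles` cycles."""
    template = insert + adapter
    return template[:cycles]


def adapter_bases(insert_len: int, cycles: int) -> int:
    """Number of read bases that come from the adapter, not the insert."""
    return max(0, cycles - insert_len)


# Read length implied by the cutadapt summary quoted in the question.
total_bp_read1 = 7_745_354_500
read_pairs = 30_981_418
read_length = total_bp_read1 // read_pairs

# An insert longer than the read: the read never reaches the adapter.
long_read = sequence_read("A" * 300, read_length)
# An insert shorter than the read: the tail of the read is adapter.
short_read = sequence_read("ACGT" * 10, 50)

print(read_length, adapter_bases(300, read_length), adapter_bases(40, 50))
```

Under this model, only fragments shorter than the read length contribute adapter bases, so a library with few short fragments would be expected to show few adapter hits.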

ADD REPLYlink written 3.0 years ago by genomax71k

Do you have adapters in the overrepresented sequences of the FastQC report?

ADD REPLYlink written 3.0 years ago by Carlo Yague4.6k

I found some overrepresented sequences, but they do not contain the paired-end adapter sequence.

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by tcf.hcdg60

file:///home/tajammul/PhD_data/Radula_moss/clipped/a2_plus_b2_ATTCCT_L001_R2_001.trimmed_fastqc.html#M9

file:///home/tajammul/PhD_data/Radula_moss/clipped/a2_plus_b2_ATTCCT_L001_R1_001.trimmed_fastqc.html#M9

ADD REPLYlink written 3.0 years ago by tcf.hcdg60

Those kinds of links are not going to work, since they point to files on your local machine.

Your best bet is to take a screenshot of what you want to show and upload it to one of the free image hosting sites (you can find them by pressing Ctrl+G in the Biostars message edit window).

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by genomax71k

http://tinypic.com/view.php?pic=2ngucrc&s=9#.V-KX3tHQPCI

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by tcf.hcdg60

So unless you used home-made adapters, your data should be clean. FastQC automatically detects 'classic' adapters among the overrepresented sequences.

ADD REPLYlink written 3.0 years ago by Carlo Yague4.6k

3.0 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

As genomax said, only reads from fragments with an insert size shorter than the read length contain adapter sequence. You can generate an insert size histogram with BBMerge (from the BBMap package) and also determine the actual adapter sequence like this:

bbmerge.sh in1=r1.fastq in2=r2.fastq outa=adapters.fa ihist=ihist.txt

If only 0.1% of the reads have an insert size shorter than the read length, adapter trimming probably worked correctly.
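One way to check that fraction is to sum the histogram counts below the read length. The Python sketch below assumes the ihist file has header lines starting with `#` followed by two whitespace-separated columns (insert size, count); the function name is hypothetical and the example rows are made up purely to show the calculation. Keep in mind that BBMerge can only measure inserts for pairs it was able to merge, so the fraction is relative to merged pairs, not all pairs.

```python
from typing import Iterable


def fraction_shorter_than(lines: Iterable[str], read_length: int) -> float:
    """Fraction of histogram counts with insert size below read_length.

    Assumes '#'-prefixed header lines and 'size<whitespace>count' rows.
    """
    short = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        size, count = int(fields[0]), int(fields[1])
        total += count
        if size < read_length:
            short += count
    return short / total if total else 0.0


# Made-up rows, only to demonstrate the arithmetic.
example = ["#InsertSize\tCount", "100\t1", "200\t3", "300\t6"]
example_fraction = fraction_shorter_than(example, read_length=250)
print(example_fraction)
```

For real data, pass an open file handle for ihist.txt as `lines` and use the actual read length of the run.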

ADD COMMENTlink written 3.0 years ago by Brian Bushnell16k