How can I remove adapter sequences from Illumina 2000 paired-end data?
6.4 years ago
tcf.hcdg ▴ 70

I have Illumina 2000 paired-end sequencing data. I checked read quality with FastQC and then ran cutadapt to quality-trim the reads and remove the adapter sequences (Illumina paired-end adapters). From the results, I found that only a few reads contain adapters. I then checked with Trim Galore, which reports that only 0.1% of the reads contain adapter sequences.

Why do only 0.1% of the reads contain adapter sequences?

cutadapt
=== Summary ===

Pairs that were too short: 434,082 (1.4%)
Pairs written (passing filters): 30,547,336 (98.6%)

Total basepairs processed: 15,490,709,000 bp
Quality-trimmed: 466,256,549 bp (3.0%)
Total written (filtered): 14,923,182,261 bp (96.3%)


The result summary of trim_galore:

Trim galore
=== Summary ===

Reads written (passing filters): 30,981,418 (100.0%)

Total basepairs processed: 7,745,354,500 bp
Quality-trimmed: 85,966,692 bp (1.1%)
Total written (filtered): 7,659,104,092 bp (98.9%)


Not every sequence needs to have an adapter. In fact, you only see adapters in reads whose inserts are shorter than the number of sequencing cycles carried out.
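To make the arithmetic concrete, here is a minimal sketch of that rule. The function name is made up for illustration, and the 250 bp read length is an assumption inferred from the summaries above (7,745,354,500 bp / 30,981,418 reads = 250 bp). It ignores sequencing errors and anything read past the end of the adapter.

```python
def adapter_bases_in_read(insert_size: int, read_length: int) -> int:
    """Bases at the 3' end of a read that come from read-through past the insert.

    Hypothetical helper: a read only runs past the insert when the insert
    is shorter than the number of cycles (the read length).
    """
    return max(0, read_length - insert_size)


# Assumed read length of 250 bp; the insert sizes are arbitrary examples.
for insert in (400, 250, 200):
    print(insert, adapter_bases_in_read(insert, read_length=250))
```

With these numbers, only the 200 bp insert produces a read with non-insert bases at its end (50 of them). The 250 bp and 400 bp inserts give none.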

Your data may be fine as is.

Do you have adapters in the overrepresented sequences of the FastQC report?

I found some overrepresented sequences, but they do not contain the paired-end adapter sequence.

Those kinds of links are not going to work, since they point to a file on your local desktop.

Your best bet is to take a screenshot of what you want to show and upload it to one of the free image hosting sites (you can find them by pressing Ctrl+G in the Biostars message edit window).

6.4 years ago

As Genomax said, only fragments with an insert size shorter than the read length contain adapter sequence. You can generate an insert size histogram with BBMerge (from the BBMap package) and also determine the actual adapter sequence like this:

bbmerge.sh in1=r1.fastq in2=r2.fastq outa=adapters.fa ihist=ihist.txt


If only 0.1% of the reads have an insert size shorter than the read length, adapter trimming probably worked correctly.
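One way to check that fraction from the histogram is sketched below. It assumes `ihist.txt` has header lines starting with `#` followed by whitespace-separated `insert_size count` rows, so confirm that against your own file first. The function name is made up for illustration, and the 250 bp read length is inferred from the summaries in the question. Note that the histogram only covers pairs BBMerge was able to merge, so the result is relative to merged pairs, not all pairs.

```python
from typing import Iterable


def fraction_shorter_than(lines: Iterable[str], read_length: int) -> float:
    """Fraction of histogram counts whose insert size is below read_length.

    Assumes '#'-prefixed header lines and 'insert_size count' data rows.
    """
    short = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/summary lines
        size_str, count_str = line.split()[:2]
        size, count = int(size_str), int(count_str)
        total += count
        if size < read_length:
            short += count
    return short / total if total else 0.0


# Toy histogram (made-up counts, not real data) in the assumed layout.
example = [
    "#InsertSize\tCount",
    "200\t1",
    "250\t9",
    "300\t990",
]
frac = fraction_shorter_than(example, read_length=250)
print(f"{frac:.2%}")
```

For a real run, pass the open file instead of the toy list, for example `with open("ihist.txt") as fh: print(fraction_shorter_than(fh, 250))`.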