Question: Identify adapter sequences for trimming from Illumina paired end fastq files
5 weeks ago, mohammedtoufiq91 wrote:

Hi,

I am working with unaligned Illumina paired-end data. I would like to first identify the adapter sequences present in the data and then trim the reads accordingly. Is there a way to identify the adapter sequences? Please let me know which tools to use.

Thank you, Toufiq

5 weeks ago, genomax (United States) wrote:

Use the BBMap suite (reproduced from here):

If you have paired reads, and enough of the reads have inserts shorter than read length, you can identify adapter sequences with BBMerge, like this (they will be printed to adapters.fa):

bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa

Alternatively, if you suspect the adapters are among the common ones listed in the adapters.fa file included with BBMap (in its resources directory), you can screen the reads against that file:

bbduk.sh in1=r1.fq in2=r2.fq k=23 ref=adapters.fa stats=stats.txt

stats.txt will then list the names of adapter sequences found, and their frequency.
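
Once the adapters are known, the trimming step itself might look like the sketch below. The parameters follow the adapter-trimming example in the BBDuk usage guide; the output names clean1.fq and clean2.fq are placeholders that do not come from this thread, and the script only prints the command unless bbduk.sh and the input files are actually available.

```shell
#!/bin/sh
# Sketch: trim 3' adapters from paired reads with BBDuk.
# adapters.fa is either the file written by bbmerge.sh or the one bundled with BBMap.
# clean1.fq / clean2.fq are placeholder output names (an assumption).
trim_cmd="bbduk.sh in1=r1.fq in2=r2.fq out1=clean1.fq out2=clean2.fq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo"

# Run only when the tool and inputs exist; otherwise show what would be run.
if command -v bbduk.sh >/dev/null 2>&1 && [ -f r1.fq ] && [ -f r2.fq ]; then
    $trim_cmd
else
    echo "Would run: $trim_cmd"
fi
```

Here ktrim=r trims the matching adapter and everything to its right, while tpe and tbo are the paired-end options (trim both reads to the same length, and trim by overlap).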


Thank you. I was able to identify the adapters in R1.fq and R2.fq. Now I would like to know whether these are 5' or 3' adapters, and whether they belong to the forward or reverse read. Is there a way to determine this?

Reply written 5 weeks ago by mohammedtoufiq91
5 weeks ago, benformatics (ETH Zurich) wrote:

fastp is a newer tool that is almost as fast as bbduk and can automatically detect 5' or 3' adapters for both single-end and paired-end data (for paired-end data, detection must be enabled manually).

The adapters are inferred by analyzing the tails of the first ~1M reads.

So if you have more complicated or multiple adapters, this may not be ideal.
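
For a rough manual check of where a known adapter sits in the reads, one could search for the start of the adapter sequence and record its position. The sketch below is deliberately simplistic (exact matching only) and is not how fastp works internally; the two toy reads and the file name toy.fq are made up for illustration, and AGATCGGAAGAGC is the common start of the Illumina TruSeq adapter. With standard Illumina libraries, adapter read-through is expected toward the 3' end, when the insert is shorter than the read.

```shell
#!/bin/sh
# Toy FASTQ with two reads (illustrative data); point awk at a real file in practice.
cat > toy.fq <<'EOF'
@read1
ACGTACGTACGTACGTAGATCGGAAGAGCACAC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2
TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
EOF

adapter="AGATCGGAAGAGC"

# Sequence lines are every 4th line starting at line 2.
# index() gives the 1-based position of the adapter, or 0 if it is absent.
summary=$(awk -v a="$adapter" '
    NR % 4 == 2 {
        n++
        p = index($0, a)
        if (p > 0) {
            hits++
            if (p > length($0) / 2) tail++; else head++
        }
    }
    END { printf "reads=%d hits=%d first_half=%d second_half=%d\n", n, hits, head, tail }
' toy.fq)

echo "$summary"
```

A count dominated by second_half suggests 3' read-through; running the same check on R1 and R2 separately shows which adapter appears in which read.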


Thank you. I ran this program; however, it did not report any specific adapter.

./fastp -i <input1> -I <input2> -o R1.fastq.gz -O R2.fastq.gz --disable_adapter_trimming --detect_adapter_for_pe --html Report_sample.html

The .html file only reports: duplication rate, insert size estimation, before/after filtering read quality, before/after filtering base content, and before/after filtering k-mer counting.

Reply written 5 weeks ago by mohammedtoufiq91

If you use "--disable_adapter_trimming", fastp does not search for adapters at all, so none will be reported.
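
A rerun without that flag might look like the sketch below. It reuses the options from the command above; the input file names are placeholders (an assumption), and the script only prints the command unless fastp and the inputs are present.

```shell
#!/bin/sh
# Sketch: the same fastp call as above, minus --disable_adapter_trimming,
# so that paired-end adapter detection and trimming actually run.
# R1_in / R2_in are placeholder input names.
R1_in="input_R1.fastq.gz"
R2_in="input_R2.fastq.gz"
fastp_cmd="fastp -i $R1_in -I $R2_in -o R1.fastq.gz -O R2.fastq.gz --detect_adapter_for_pe --html Report_sample.html"

# Run only when the tool and inputs exist; otherwise show what would be run.
if command -v fastp >/dev/null 2>&1 && [ -f "$R1_in" ] && [ -f "$R2_in" ]; then
    $fastp_cmd
else
    echo "Would run: $fastp_cmd"
fi
```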

Reply written 21 days ago by benformatics

Thank you. Another question: is it recommended to trim adapters from Illumina paired-end data with 2 x 150 bp reads?

Reply written 21 days ago by mohammedtoufiq91

If adapters are present, they should be trimmed, especially if you are going to do any de novo work with your data.

Reply written 21 days ago by genomax