Question: Identify adapter sequences for trimming from Illumina paired end fastq files
mohammedtoufiq91 wrote (7 months ago):

Hi,

I am working with unaligned Illumina paired-end data. I would like to first identify the adapter sequences present in the data and then trim the reads accordingly. Is there a way to identify the adapter sequences? Please let me know which tools to use.

Thank you, Toufiq

Tags: rna-seq qc fastq adapter trimming
genomax wrote (7 months ago):

Use the BBMap suite (reproduced from here):

If you have paired reads, and enough of the reads have inserts shorter than read length, you can identify adapter sequences with BBMerge, like this (they will be printed to adapters.fa):

bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa
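To make the BBMerge idea concrete: when the insert is shorter than the read length, the two reads overlap completely and each read runs past the end of the insert into adapter sequence. The toy Python sketch below is not BBMerge's actual algorithm (which also has to cope with sequencing errors); it assumes error-free reads of equal length, and the function names `revcomp` and `infer_adapters` are hypothetical. The example tails use AGATCGGA, the first bases of the common Illumina TruSeq adapter.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def infer_adapters(r1: str, r2: str, min_insert: int = 10):
    """Toy overlap-based adapter inference for one read pair.

    Assumes error-free reads. If the insert (fragment) length L is shorter
    than the read length, then R1 = insert + adapter1 and
    R2 = revcomp(insert) + adapter2, so the first L bases of R1 equal the
    reverse complement of the first L bases of R2. Everything after L in
    each read is adapter. Returns (L, r1_tail, r2_tail) or None.
    """
    n = min(len(r1), len(r2))
    # Try the longest candidate insert first, so that short self-overlaps
    # inside repetitive inserts are not mistaken for the true insert length.
    for insert_len in range(n - 1, min_insert - 1, -1):
        if r1[:insert_len] == revcomp(r2[:insert_len]):
            return insert_len, r1[insert_len:], r2[insert_len:]
    return None  # no overlap found: insert is probably >= read length


insert = "ACGTTGCAAGGCTTAC"        # made-up 16 bp fragment
r1 = insert + "AGATCGGA"           # read 1 runs into adapter
r2 = revcomp(insert) + "AGATCGGA"  # read 2 runs into adapter
print(infer_adapters(r1, r2))      # (16, 'AGATCGGA', 'AGATCGGA')
```

A single pair only reveals as many adapter bases as the read extends past the insert, which is why many short-insert pairs are needed; combining the tails from many pairs into a consensus recovers a longer adapter sequence.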

Alternatively, the adapter sequence used may already be in the adapters.fa reference file that ships with BBMap (not the file written by the BBMerge command above). In that case, you can do this:

bbduk.sh in1=r1.fq in2=r2.fq k=23 ref=adapters.fa stats=stats.txt

stats.txt will then list the names of adapter sequences found, and their frequency.
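BBDuk's stats come from k-mer matching: each read is broken into k-mers (k=23 in the command above) and looked up in a table built from the reference adapter sequences and their reverse complements. A simplified Python sketch of that counting, assuming exact k-mer matches and a reference given as a plain dict rather than a FASTA file (the adapter name `TruSeq_Read1` is a hypothetical label, and this is not BBDuk's implementation):

```python
from collections import Counter


def revcomp(seq: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def build_kmer_index(adapters: dict, k: int) -> dict:
    """Map every k-mer of each adapter (both strands) to the adapter's name.

    Adapters shorter than k contribute no k-mers in this simplified version.
    """
    index = {}
    for name, seq in adapters.items():
        for strand in (seq, revcomp(seq)):
            for i in range(len(strand) - k + 1):
                index.setdefault(strand[i:i + k], name)
    return index


def adapter_stats(reads, adapters: dict, k: int = 23) -> Counter:
    """Count reads containing at least one exact k-mer match, per adapter."""
    index = build_kmer_index(adapters, k)
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            name = index.get(read[i:i + k])
            if name is not None:
                counts[name] += 1
                break  # attribute each read to the first adapter hit only
    return counts


adapters = {"TruSeq_Read1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"}
reads = [
    "ACGTTGCAAGGCTTAC" + "AGATCGGAAGAGCACACGTCTGAAC",  # adapter read-through
    "C" * 40,                                            # no adapter
]
print(adapter_stats(reads, adapters))  # Counter({'TruSeq_Read1': 1})
```

Requiring a full 23-mer means adapter fragments shorter than k at the very end of a read are not counted here.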


Thank you. I was able to identify the adapters in R1.fq and R2.fq. Now I would like to know whether these are 5' forward/reverse or 3' forward/reverse adapters. Is there a way to tell?

(mohammedtoufiq91, 7 months ago)
benformatics wrote (7 months ago):

fastp is a newer tool that is almost as fast as BBDuk and can automatically detect 5' or 3' adapters for both single-end and paired-end data (for paired-end data, detection must be enabled manually).

The adapters are evaluated by analyzing the tails of the first ~1M reads, so if you have more complex or multiple adapters, this may not be ideal.
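The idea of inferring an adapter from read tails can be illustrated with a deliberately naive Python sketch: sample the first reads, count the k-mers in the last few dozen bases of each, and look for a k-mer that turns up in nearly every tail. This is not fastp's actual implementation; the function name and the `k`, `tail` and `sample` defaults are arbitrary assumptions, and the reads are synthetic.

```python
from collections import Counter
from itertools import islice


def frequent_tail_kmers(reads, k=10, tail=30, sample=1_000_000, top=5):
    """Return the `top` most common k-mers in the last `tail` bases of the
    first `sample` reads. Each k-mer is counted at most once per read, so a
    count equal to the number of reads means it occurs in every tail."""
    counts = Counter()
    for read in islice(reads, sample):
        end = read[-tail:]
        counts.update({end[i:i + k] for i in range(len(end) - k + 1)})
    return counts.most_common(top)


adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
filler = "ACGT" * 10  # stand-in for genomic sequence
# 40 bp reads whose adapter starts at different positions (inserts of 12-24 bp)
reads = [(filler[:n] + adapter)[:40] for n in range(12, 25)]
for kmer, count in frequent_tail_kmers(reads, top=3):
    print(kmer, count)
```

Here several adapter k-mers (for example AGATCGGAAG) occur in all 13 tails while genomic k-mers do not; a fuller implementation would extend such a seed into the complete adapter sequence. With a mix of different adapters, no single k-mer dominates, which is one reason tail-based detection can struggle in that case.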


Thank you. I ran this program; however, it did not find any specific adapter.

./fastp -i <input1> -I <input2> -o R1.fastq.gz -O R2.fastq.gz --disable_adapter_trimming --detect_adapter_for_pe --html Report_sample.html

The .html report only contains:
- Duplication rate
- Insert size estimation
- Before/after filtering read quality
- Before/after filtering base content
- Before/after kmer counting

(mohammedtoufiq91, 7 months ago)

If you use --disable_adapter_trimming, fastp does not search for adapters at all.

(benformatics, 6 months ago)

Thank you. Another question: is it recommended to trim adapters from Illumina paired-end 2x150 bp data?

(mohammedtoufiq91, 6 months ago)

If they are present, they should be trimmed, especially if you are going to do any de novo work (such as assembly) with your data.

(genomax, 6 months ago)
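As a minimal sketch of what 3' adapter trimming does, the Python function below cuts a read at the first position where the remainder of the read matches the start of the adapter, either as a full copy or as a partial copy running off the 3' end. It uses exact matching only and an arbitrary minimum overlap of 3 bases; the function name is hypothetical, and real trimmers offer mismatch tolerance and, for paired-end data, overlap-based trimming, so treat this as an illustration rather than a replacement for BBDuk or fastp.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from `read` using exact matching.

    Scans left to right for the first position where the rest of the read
    equals the adapter's prefix (a full adapter copy, or a partial copy that
    runs off the end of the read). Partial matches shorter than
    `min_overlap` bases are ignored to limit chance hits.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - start, len(adapter))
        if read[start:start + overlap] == adapter[:overlap]:
            return read[:start]
    return read  # no adapter found


adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
print(trim_3prime_adapter("ACGTTGCAAGGCTTAC" + "AGATCGGA", adapter))  # ACGTTGCAAGGCTTAC
print(trim_3prime_adapter("C" * 20, adapter))                           # unchanged
```

The minimum overlap is a trade-off: lower values remove short adapter remnants at read ends but also clip genuine bases that happen to match the adapter's first few bases.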