Question: Trim adapter of undefined length
julia.mir0 wrote:

Hi all,

We are looking for a way to trim adapters of unknown length from short reads.

The adapter is composed of a FIXED part, which is always at the beginning of the reads, followed by a longer ADAPTER sequence, which is not necessarily present or complete.


An example would be the following:

Adapter sequence: FIXEDADAPTER

Some reads:

>a    
FIXEDADAPTERmysequence
>b    
FIXEDmysequence
>c    
FIXEDADAPmysequence

All of the reads should be trimmed with only "mysequence" remaining.
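
For illustration only, the rule described above (FIXED always present, followed by any leading piece of ADAPTER) can be sketched as a single regular expression with nested optional groups. This is a minimal sketch that assumes exact matches with no sequencing errors; the function names `build_pattern` and `trim_read` are hypothetical and not part of any trimming tool.

```python
import re


def build_pattern(fixed: str, adapter: str) -> re.Pattern:
    """Build ^FIXED(?:A(?:D(?:A...)?)?)? so any leading piece of the adapter matches."""
    nested = ""
    # Wrap the adapter bases from last to first in nested optional groups.
    for base in reversed(adapter):
        nested = f"(?:{re.escape(base)}{nested})?"
    return re.compile("^" + re.escape(fixed) + nested)


def trim_read(read: str, pattern: re.Pattern) -> str:
    """Remove the matched FIXED + partial ADAPTER from the start of the read."""
    return pattern.sub("", read, count=1)


pattern = build_pattern("FIXED", "ADAPTER")
reads = {
    "a": "FIXEDADAPTERmysequence",
    "b": "FIXEDmysequence",
    "c": "FIXEDADAPmysequence",
}
trimmed = {name: trim_read(seq, pattern) for name, seq in reads.items()}
print(trimmed)
```

Because the optional groups are greedy, a read whose real sequence happens to begin with the next adapter base would lose that base as well, so exact matching of this kind can over-trim on real data.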


We have evaluated cutadapt and fastx, but neither seems to include an option that takes this situation into account. Do you have any idea of the best way to approach this?

Any help would be really appreciated.

Júlia

short read trimming adapter • 206 views
modified 3 months ago by genomax78k • written 3 months ago by julia.mir0

Cutadapt can be used by defining the minimal overlap. Also, isn't your example more like this?

>a    
FIXEDADAPTERmysequence
>b    
APTERmysequence
>c    
EDADAPTERmysequence
written 3 months ago by JC9.4k
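
To make the case JC describes concrete (the read starts with the tail end of the adapter, and a minimal overlap decides whether a match counts), here is a small sketch. It is not cutadapt's algorithm: it assumes exact matches only, and the function name `trim_5prime_adapter_suffix` and the default `min_overlap=3` are illustrative choices.

```python
def trim_5prime_adapter_suffix(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove the longest adapter suffix (at least min_overlap bases) found at the read start."""
    shortest = max(min_overlap, 1)  # an overlap of zero bases is meaningless
    # Try the longest possible overlap first, then shorter ones.
    for length in range(len(adapter), shortest - 1, -1):
        if read.startswith(adapter[-length:]):
            return read[length:]
    return read  # no sufficiently long overlap: leave the read untouched


reads = {
    "a": "FIXEDADAPTERmysequence",
    "b": "APTERmysequence",
    "c": "EDADAPTERmysequence",
}
trimmed = {name: trim_5prime_adapter_suffix(seq, "FIXEDADAPTER")
           for name, seq in reads.items()}
print(trimmed)
```

A larger `min_overlap` reduces the chance of removing a few bases that match the adapter end purely by chance, at the cost of leaving very short adapter remnants in place.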

What about PANDAseq?

written 3 months ago by Gabriel R.2.7k

Look into bbduk.sh and this option. A guide is available here.

restrictleft=0      If positive, only look for kmer matches in the 
                    leftmost X bases.
written 3 months ago by genomax78k
genomax78k wrote:

Using bbduk.sh from the BBMap suite:

Ignore the FASTQ file contents below; I picked a random file I had at hand.

$ more test.fq
@cluster_8:UMI_CTTTGA
TATCCTTGCAATACTCTCCGAACGGGAGAGC
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_12:UMI_GGTCAA
GCAGTTTAAGATCATTTTATTGAAGAGCAAG
+
?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A
@cluster_21:UMI_AGAACA
GGCATTGCAAAATTTATTACACCCCCAGATC
+
>=2.660/?:36AD;0<14703640334-//
@cluster_8:UMI_CTTTGA
CCTTGCAATACTCTCCGAACGGGAGAGCATC
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
TGCAATACTCTCCGAACGGGAGAGCATCTTT
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
TATCGTGCAATACTCTCCGAACGGGAGAGC
+
1/04.72,(003,-2-22+00-12./.-.4

$ more adap.fa (this is the adapter we are searching for)
>test
TATCCTTGCAATACT

$ bbduk.sh in=test.fq ref=adap.fa ktrim=l k=9 out=stdout.fq

java -ea -Xmx1400m -Xms1400m -cp bbmap/current/ jgi.BBDukF in=test.fq ref=adap.fa ktrim=l k=9 out=stdout.fq
Executing jgi.BBDukF [in=test.fq, ref=adap.fa, ktrim=l, k=9, out=stdout.fq]
Version 38.26

0.028 seconds.
Initial:
Memory: max=1468m, total=1468m, free=1438m, used=30m

Added 7 kmers; time:    0.028 seconds.
Memory: max=1468m, total=1468m, free=1433m, used=35m

Input is being processed as unpaired
Started output streams: 0.010 seconds.

@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGC
+
-22+00-12./.-.4-
@cluster_12:UMI_GGTCAA
GCAGTTTAAGATCATTTTATTGAAGAGCAAG
+
?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A
@cluster_21:UMI_AGAACA
GGCATTGCAAAATTTATTACACCCCCAGATC
+
>=2.660/?:36AD;0<14703640334-//
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGCATC
+
,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGCATCTTT
+
003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGC
+
2-22+00-12./.-.4
Processing time:        0.005 seconds.

Input:                      6 reads         185 bases.
KTrimmed:                   4 reads (66.67%)    50 bases (27.03%)
Total Removed:              0 reads (0.00%)     50 bases (27.03%)
Result:                     6 reads (100.00%)   135 bases (72.97%)

Time:                           0.045 seconds.
Reads Processed:           6    0.13k reads/sec
Bases Processed:         185    0.00m bases/sec
modified 3 months ago • written 3 months ago by genomax78k
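
As a rough mental model of what the run above does, the following sketch finds the rightmost adapter k-mer in each read and removes it together with everything to its left. It is a simplified illustration, not BBDuk's implementation: it assumes forward-strand, exact k-mer matches only, ignores quality strings, and the function names are hypothetical.

```python
def adapter_kmers(adapter: str, k: int) -> set:
    """Collect every k-mer (substring of length k) of the adapter."""
    return {adapter[i:i + k] for i in range(len(adapter) - k + 1)}


def ktrim_left(read: str, kmers: set, k: int) -> str:
    """Remove the rightmost matching k-mer and everything to its left."""
    # Scan start positions from right to left so the first hit is the rightmost one.
    for start in range(len(read) - k, -1, -1):
        if read[start:start + k] in kmers:
            return read[start + k:]
    return read  # no adapter k-mer found: leave the read untouched


K = 9
kmers = adapter_kmers("TATCCTTGCAATACT", K)

# The six reads from test.fq above, in file order.
reads = [
    "TATCCTTGCAATACTCTCCGAACGGGAGAGC",
    "GCAGTTTAAGATCATTTTATTGAAGAGCAAG",
    "GGCATTGCAAAATTTATTACACCCCCAGATC",
    "CCTTGCAATACTCTCCGAACGGGAGAGCATC",
    "TGCAATACTCTCCGAACGGGAGAGCATCTTT",
    "TATCGTGCAATACTCTCCGAACGGGAGAGC",
]
trimmed = [ktrim_left(read, kmers, K) for read in reads]
for seq in trimmed:
    print(seq)
```

The last read shows why a k-mer approach tolerates a damaged adapter start: the beginning of the adapter does not match exactly, but one intact 9-mer further right is enough to trigger trimming.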

I stand corrected :-)

written 3 months ago by Carambakaracho2.0k

Thank you very much for your answer. bbduk with the ktrim=l option did the trick.

written 3 months ago by julia.mir0
Carambakaracho2.0k wrote:

Classic adapter contamination looks more like what JC describes. I believe your case is not handled out of the box, not even by bbduk, the Swiss Army knife of adapter trimming. However, a pragmatic solution would be an iterative approach (note that this is pseudocode):

for fq in fq_files_to_trim
    trim fixed <fq >fq_wo_fixedpart
    trim adapter <fq_wo_fixedpart >clean.fq
written 3 months ago by Carambakaracho2.0k
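
The two-pass idea in the pseudocode above could look roughly like this in Python. This is a sketch under stated assumptions: records are (header, sequence, quality) tuples, matching is exact, and the function names `trim_fixed` and `trim_adapter_prefix` are hypothetical stand-ins for whatever trimming tool is actually used in each pass.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def trim_fixed(records: Iterable[Record], fixed: str) -> Iterator[Record]:
    """Pass 1: drop the fixed part when a read starts with it."""
    for header, seq, qual in records:
        if seq.startswith(fixed):
            seq, qual = seq[len(fixed):], qual[len(fixed):]
        yield header, seq, qual


def trim_adapter_prefix(records: Iterable[Record], adapter: str) -> Iterator[Record]:
    """Pass 2: drop the longest leading stretch that matches the adapter start."""
    for header, seq, qual in records:
        n = 0
        while n < min(len(seq), len(adapter)) and seq[n] == adapter[n]:
            n += 1
        yield header, seq[n:], qual[n:]


# Toy records built from the reads in the question; the quality strings are made up.
records = [
    ("@a", "FIXEDADAPTERmysequence", "I" * 22),
    ("@b", "FIXEDmysequence", "I" * 15),
    ("@c", "FIXEDADAPmysequence", "I" * 19),
]
clean = list(trim_adapter_prefix(trim_fixed(records, "FIXED"), "ADAPTER"))
for header, seq, qual in clean:
    print(header, seq, qual)
```

One weakness of splitting the work into two passes is that the second pass cannot tell whether the first pass removed anything, so a read without the fixed part that merely starts with adapter-like bases would still be trimmed.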

It should be doable with bbduk.sh. One can do it with ktrim=l. I will have to test it to confirm.

written 3 months ago by genomax78k

I'd be thrilled to know, too. It seems to me that the variable-length adapter between the fixed part and the sequence should pose a major challenge to out-of-the-box adapter trimming strategies, including bbduk's.

modified 3 months ago • written 3 months ago by Carambakaracho2.0k