Question: Trim adapter of undefined length
julia.mir wrote:

Hi all,

We are interested in a way of trimming adapters of unknown length from short reads.

The adapter is composed of a FIXED part, which is always at the beginning of the reads, followed by a long ADAPTER sequence, which is not necessarily present or complete.


An example would be the following:

Adapter sequence: FIXEDADAPTER

Some reads:

>a    
FIXEDADAPTERmysequence
>b    
FIXEDmysequence
>c    
FIXEDADAPmysequence

All of the reads should be trimmed with only "mysequence" remaining.


We have evaluated cutadapt and fastx, but neither seems to include an option that handles this situation. Do you have any idea of the best way to approach this?

Any help would be really appreciated.

Júlia

short read trimming adapter • 133 views
modified 24 days ago by genomax • written 25 days ago by julia.mir

Cutadapt can be used by defining the minimum overlap. Also, isn't your example more like this?

>a    
FIXEDADAPTERmysequence
>b    
APTERmysequence
>c    
EDADAPTERmysequence
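
To make the minimum-overlap idea concrete, here is a minimal Python sketch of the classic case shown above, where a read starts with a (possibly truncated) tail of the adapter. It is not cutadapt's algorithm, which uses error-tolerant alignment; it only does exact matching, and the function name `trim_partial_5prime` and the default `min_overlap=3` are made up for illustration.

```python
def trim_partial_5prime(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove the longest adapter suffix (>= min_overlap) found at the read start."""
    # Try the longest candidate first: the full adapter, then shorter suffixes.
    for overlap in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.startswith(adapter[len(adapter) - overlap:]):
            return read[overlap:]
    return read  # no sufficiently long overlap, leave the read untouched


reads = {
    "a": "FIXEDADAPTERmysequence",
    "b": "APTERmysequence",
    "c": "EDADAPTERmysequence",
}
for name, seq in reads.items():
    print(name, trim_partial_5prime(seq, "FIXEDADAPTER"))
```

A larger `min_overlap` lowers the risk of trimming reads that merely start with a few adapter-like bases by chance, at the cost of missing very short adapter remnants.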
written 25 days ago by JC

what about pandaseq?

written 25 days ago by Gabriel R.

Look into bbduk.sh and this option. A guide is available here.

restrictleft=0      If positive, only look for kmer matches in the 
                    leftmost X bases.
written 25 days ago by genomax
genomax wrote:

This can be done with bbduk.sh from the BBMap suite.

Ignore the fastq file contents below. I chose a random one at hand.

$ more test.fq
@cluster_8:UMI_CTTTGA
TATCCTTGCAATACTCTCCGAACGGGAGAGC
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_12:UMI_GGTCAA
GCAGTTTAAGATCATTTTATTGAAGAGCAAG
+
?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A
@cluster_21:UMI_AGAACA
GGCATTGCAAAATTTATTACACCCCCAGATC
+
>=2.660/?:36AD;0<14703640334-//
@cluster_8:UMI_CTTTGA
CCTTGCAATACTCTCCGAACGGGAGAGCATC
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
TGCAATACTCTCCGAACGGGAGAGCATCTTT
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
TATCGTGCAATACTCTCCGAACGGGAGAGC
+
1/04.72,(003,-2-22+00-12./.-.4

$ more adap.fa (this is the adapter we are searching for)
>test
TATCCTTGCAATACT

$ bbduk.sh in=test.fq ref=adap.fa ktrim=l k=9 out=stdout.fq

java -ea -Xmx1400m -Xms1400m -cp bbmap/current/ jgi.BBDukF in=test.fq ref=adap.fa ktrim=l k=9 out=stdout.fq
Executing jgi.BBDukF [in=test.fq, ref=adap.fa, ktrim=l, k=9, out=stdout.fq]
Version 38.26

0.028 seconds.
Initial:
Memory: max=1468m, total=1468m, free=1438m, used=30m

Added 7 kmers; time:    0.028 seconds.
Memory: max=1468m, total=1468m, free=1433m, used=35m

Input is being processed as unpaired
Started output streams: 0.010 seconds.

@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGC
+
-22+00-12./.-.4-
@cluster_12:UMI_GGTCAA
GCAGTTTAAGATCATTTTATTGAAGAGCAAG
+
?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A
@cluster_21:UMI_AGAACA
GGCATTGCAAAATTTATTACACCCCCAGATC
+
>=2.660/?:36AD;0<14703640334-//
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGCATC
+
,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGCATCTTT
+
003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGC
+
2-22+00-12./.-.4
Processing time:        0.005 seconds.

Input:                      6 reads         185 bases.
KTrimmed:                   4 reads (66.67%)    50 bases (27.03%)
Total Removed:              0 reads (0.00%)     50 bases (27.03%)
Result:                     6 reads (100.00%)   135 bases (72.97%)

Time:                           0.045 seconds.
Reads Processed:           6    0.13k reads/sec
Bases Processed:         185    0.00m bases/sec
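
As a rough illustration of why ktrim=l with k=9 handles these reads, the Python sketch below mimics k-mer based left-trimming: it collects every 9-mer of the adapter, finds the rightmost one in each read, and removes that k-mer together with everything to its left. This is a simplified model, not BBDuk's implementation: it uses exact matches only and ignores reverse complements and any mismatch tolerance. The function names are invented for this example.

```python
def adapter_kmers(adapter: str, k: int) -> set:
    """Return all distinct k-mers of the adapter sequence."""
    return {adapter[i:i + k] for i in range(len(adapter) - k + 1)}


def ktrim_left(seq: str, qual: str, kmers: set, k: int) -> tuple:
    """Trim the rightmost matching k-mer and everything to its left."""
    # Scan from the right so the first hit is the rightmost matching k-mer.
    for start in range(len(seq) - k, -1, -1):
        if seq[start:start + k] in kmers:
            cut = start + k
            return seq[cut:], qual[cut:]
    return seq, qual  # no adapter k-mer found, read is left unchanged


K = 9
kmers = adapter_kmers("TATCCTTGCAATACT", K)  # adapter from adap.fa

# The four cluster_8 reads from test.fq (sequence, quality).
records = [
    ("TATCCTTGCAATACTCTCCGAACGGGAGAGC", "1/04.72,(003,-2-22+00-12./.-.4-"),
    ("CCTTGCAATACTCTCCGAACGGGAGAGCATC", "1/04.72,(003,-2-22+00-12./.-.4-"),
    ("TGCAATACTCTCCGAACGGGAGAGCATCTTT", "1/04.72,(003,-2-22+00-12./.-.4-"),
    ("TATCGTGCAATACTCTCCGAACGGGAGAGC", "1/04.72,(003,-2-22+00-12./.-.4"),
]
for seq, qual in records:
    print(*ktrim_left(seq, qual, kmers, K), sep="\n")
```

A 15-base adapter contains 15 - 9 + 1 = 7 distinct 9-mers, which is consistent with the "Added 7 kmers" line in the log. Because any single matching 9-mer is enough to trigger trimming, the adapter can be truncated, or interrupted as in the last read, and the read is still trimmed as long as one intact 9-mer remains.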
modified 24 days ago • written 24 days ago by genomax

I stand corrected :-)

written 24 days ago by Carambakaracho

Thank you very much for your answer. bbduk with the ktrim=l option did the trick.

written 24 days ago by julia.mir
Carambakaracho wrote:

Classic adapter contamination looks more like what JC describes. Your case is not handled out of the box, I believe, not even by bbduk, the Swiss Army knife of adapter trimming. However, a pragmatic solution would be an iterative approach (note that the following is pseudocode):

for fq in fq_files_to_trim
    trim fixed <fq >fq_wo_fixedpart
    trim adapter <fq_wo_fixedpart >clean.fq
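
The two-step idea in the pseudocode above can be sketched in Python for a single read. This is only an illustration under simple assumptions: exact matching, the fixed part always at the very start of the read, and a hypothetical helper name `trim_fixed_then_adapter`. It is not tied to any particular trimming tool.

```python
def trim_fixed_then_adapter(read: str, fixed: str, adapter: str) -> str:
    """Strip the fixed part, then the longest adapter prefix that follows it."""
    # Step 1: the fixed part is expected at the very start of the read.
    if not read.startswith(fixed):
        return read  # fixed part missing, leave the read untouched
    rest = read[len(fixed):]
    # Step 2: remove as much of the adapter as is present, possibly nothing.
    n = 0
    while n < min(len(rest), len(adapter)) and rest[n] == adapter[n]:
        n += 1
    return rest[n:]


for read in ("FIXEDADAPTERmysequence", "FIXEDmysequence", "FIXEDADAPmysequence"):
    print(trim_fixed_then_adapter(read, "FIXED", "ADAPTER"))
```

One caveat with real DNA: when the adapter is absent or truncated, the first base of the insert matches the next adapter base by chance about one time in four, so this greedy prefix match can remove a base or more too many.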
written 25 days ago by Carambakaracho

It should be doable with bbduk.sh. One can do it with ktrim=l. I will have to test it to confirm.

written 25 days ago by genomax

I'd be thrilled to know, too. As I see it, a variable-length adapter sitting between the fixed part and the sequence should pose a major challenge to out-of-the-box adapter trimming strategies, including bbduk's.

modified 25 days ago • written 25 days ago by Carambakaracho