julia.mir, 3.3 years ago

Hi all,

We are looking for a way to trim adapters of unknown length from short reads.

The adapter is composed of a FIXED part, which will always be at the beginning of the reads, and a long sequence ADAPTER, which will not necessarily be present or complete.

An example would be the following:

>a
>b
FIXEDmysequence
>c


All reads should be trimmed so that only "mysequence" remains.
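A minimal Python sketch of the trimming described above, using the placeholder names FIXED and ADAPTER from the post. It assumes exact matches (no sequencing errors) and that an incomplete ADAPTER is truncated at its 3' end, so whatever follows FIXED is a prefix of ADAPTER. The function name `trim_read` and the `min_overlap` guard are hypothetical, not taken from any tool.

```python
def trim_read(read: str, fixed: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove FIXED, then the longest prefix of ADAPTER that follows it.

    Exact matching only. Adapter prefixes shorter than min_overlap are kept,
    so a few insert bases that happen to match ADAPTER are not removed.
    """
    if read.startswith(fixed):
        read = read[len(fixed):]
    # Try the full adapter first, then progressively shorter prefixes.
    for length in range(len(adapter), min_overlap - 1, -1):
        if read.startswith(adapter[:length]):
            return read[length:]
    return read


for r in ["FIXEDADAPTERmysequence", "FIXEDADAPmysequence", "FIXEDmysequence"]:
    print(trim_read(r, "FIXED", "ADAPTER"))  # mysequence
```

The `min_overlap` guard is a design choice: with `min_overlap=1`, an insert such as "AGGmysequence" that happens to start with A would lose its first base.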

We have evaluated cutadapt and fastx, but neither seems to include an option that handles this situation. What would be the best way to approach this?

Any help would be really appreciated.

Júlia

Cutadapt can be used by setting the minimum overlap (-O/--overlap). Also, isn't your example more like this?

>a
>b
APTERmysequence
>c
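To illustrate the minimum-overlap idea for a read like APTERmysequence: in cutadapt, a regular 5' adapter (given with -g) may occur only partially at the start of the read, with its beginning missing, and -O/--overlap sets how short such a partial match may be (default 3). The Python sketch below mimics only the exact-match case at the very start of the read; cutadapt's error-tolerant alignment and its removal of bases preceding the adapter are not modeled, and `trim_5prime_partial` is a hypothetical name.

```python
def trim_5prime_partial(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove the longest suffix of `adapter` found at the start of `read`.

    A complete adapter is its own longest suffix, so it is removed too.
    Exact matching only; overlaps shorter than min_overlap are ignored.
    """
    for length in range(len(adapter), min_overlap - 1, -1):
        # length > 0 guard: adapter[-0:] would be the whole adapter.
        if length > 0 and read.startswith(adapter[-length:]):
            return read[length:]
    return read


print(trim_5prime_partial("APTERmysequence", "ADAPTER"))  # mysequence
print(trim_5prime_partial("ERmysequence", "ADAPTER"))     # ERmysequence (overlap 2 < 3)
```

Lowering `min_overlap` catches shorter remnants but also removes more insert bases that match the end of the adapter by chance.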

Look into bbduk.sh and its restrictleft option. A guide is available here.

restrictleft=0      If positive, only look for kmer matches in the leftmost X bases.
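A rough Python sketch of what restricting k-mer matches to the left end means, using the adapter and first read from GenoMax's example in this thread. It assumes a k-mer counts only if it lies entirely within the leftmost X bases and checks the forward strand only (bbduk also looks up reverse complements by default); `kmer_hits` is a hypothetical helper, not part of BBMap.

```python
def kmer_hits(read: str, ref: str, k: int, restrict_left: int = 0) -> list[int]:
    """Start positions of read k-mers that also occur in `ref`.

    If restrict_left > 0, only k-mers lying entirely within the leftmost
    restrict_left bases are checked (assumed semantics). Forward strand only.
    """
    ref_kmers = {ref[i:i + k] for i in range(len(ref) - k + 1)}
    limit = len(read) if restrict_left <= 0 else min(restrict_left, len(read))
    return [i for i in range(limit - k + 1) if read[i:i + k] in ref_kmers]


adapter = "TATCCTTGCAATACT"
read = "TATCCTTGCAATACTCTCCGAACGGGAGAGC"
print(kmer_hits(read, adapter, k=9))                    # [0, 1, 2, 3, 4, 5, 6]
print(kmer_hits(read, adapter, k=9, restrict_left=12))  # [0, 1, 2, 3]
```

Restricting the search window to the read start keeps chance matches deeper in the insert from triggering a left trim.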

GenoMax, 3.3 years ago

Using bbduk.sh from the BBMap suite:

Ignore the FASTQ contents below; I just picked a random file I had at hand.

$ more test.fq
@cluster_8:UMI_CTTTGA
TATCCTTGCAATACTCTCCGAACGGGAGAGC
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_12:UMI_GGTCAA
GCAGTTTAAGATCATTTTATTGAAGAGCAAG
+
?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A
@cluster_21:UMI_AGAACA
GGCATTGCAAAATTTATTACACCCCCAGATC
+
>=2.660/?:36AD;0<14703640334-//
@cluster_8:UMI_CTTTGA
CCTTGCAATACTCTCCGAACGGGAGAGCATC
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
TGCAATACTCTCCGAACGGGAGAGCATCTTT
+
1/04.72,(003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
TATCGTGCAATACTCTCCGAACGGGAGAGC
+
1/04.72,(003,-2-22+00-12./.-.4

$ more adap.fa     (this is the adapter we are searching for)
>test
TATCCTTGCAATACT

$ bbduk.sh in=test.fq ref=adap.fa ktrim=l k=9 out=stdout.fq

java -ea -Xmx1400m -Xms1400m -cp bbmap/current/ jgi.BBDukF in=test.fq ref=adap.fa ktrim=l k=9 out=stdout.fq
Executing jgi.BBDukF [in=test.fq, ref=adap.fa, ktrim=l, k=9, out=stdout.fq]
Version 38.26

0.028 seconds.
Initial:
Memory: max=1468m, total=1468m, free=1438m, used=30m

Added 7 kmers; time:    0.028 seconds.
Memory: max=1468m, total=1468m, free=1433m, used=35m

Input is being processed as unpaired
Started output streams: 0.010 seconds.

@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGC
+
-22+00-12./.-.4-
@cluster_12:UMI_GGTCAA
GCAGTTTAAGATCATTTTATTGAAGAGCAAG
+
?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A
@cluster_21:UMI_AGAACA
GGCATTGCAAAATTTATTACACCCCCAGATC
+
>=2.660/?:36AD;0<14703640334-//
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGCATC
+
,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGCATCTTT
+
003,-2-22+00-12./.-.4-
@cluster_8:UMI_CTTTGA
CTCCGAACGGGAGAGC
+
2-22+00-12./.-.4
Processing time:        0.005 seconds.

KTrimmed:                   4 reads (66.67%)    50 bases (27.03%)
Total Removed:              0 reads (0.00%)     50 bases (27.03%)
Result:                     6 reads (100.00%)   135 bases (72.97%)

Time:                           0.045 seconds.
Bases Processed:         185    0.00m bases/sec
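The output above can be reproduced with a rough model of ktrim=l: collect the distinct 9-mers of the 15-base adapter (7 of them, matching "Added 7 kmers" in the log), look for them in each read, and drop everything up to the end of the rightmost hit, trimming the quality string by the same amount. This Python sketch is an approximation, not BBDuk's implementation: it matches the forward strand exactly, whereas bbduk by default also looks up reverse complements and can allow mismatches (hdist). Trimming through the rightmost hit is an assumption, and `ktrim_left` is a hypothetical name.

```python
def ktrim_left(seq: str, qual: str, kmers: set[str], k: int) -> tuple[str, str]:
    """Trim sequence and quality through the rightmost k-mer hit."""
    cut = 0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in kmers:
            cut = i + k  # later (more rightward) hits overwrite earlier ones
    return seq[cut:], qual[cut:]


adapter = "TATCCTTGCAATACT"
k = 9
kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
print(len(kmers))  # 7

reads = [
    ("TATCCTTGCAATACTCTCCGAACGGGAGAGC", "1/04.72,(003,-2-22+00-12./.-.4-"),
    ("GCAGTTTAAGATCATTTTATTGAAGAGCAAG", "?7?AEEC@>=1?A?EEEB9ECB?==:B.A?A"),
    ("GGCATTGCAAAATTTATTACACCCCCAGATC", ">=2.660/?:36AD;0<14703640334-//"),
    ("CCTTGCAATACTCTCCGAACGGGAGAGCATC", "1/04.72,(003,-2-22+00-12./.-.4-"),
    ("TGCAATACTCTCCGAACGGGAGAGCATCTTT", "1/04.72,(003,-2-22+00-12./.-.4-"),
    ("TATCGTGCAATACTCTCCGAACGGGAGAGC", "1/04.72,(003,-2-22+00-12./.-.4"),
]
trimmed = [ktrim_left(s, q, kmers, k) for s, q in reads]
for s, q in trimmed:
    print(s, q)
print(sum(len(s) for s, _ in trimmed))  # 135, as in "Result: 6 reads ... 135 bases"
```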

I stand corrected :-)

Thank you very much for your answer. bbduk with the ktrim=l option did the trick.

Carambakaracho, 3.3 years ago

Classic adapter contamination looks more like what JC describes. Your case is not handled out of the box, I believe not even by bbduk, the Swiss Army knife of adapter trimming. A pragmatic solution would be an iterative approach (beware, what follows is pseudocode):

for fq in "${fq_files_to_trim[@]}"; do
    trim_fixed < "$fq" > "${fq}_wo_fixedpart"   # trim_fixed: placeholder for any tool that removes the FIXED part
done
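The multi-pass idea can be sketched in Python as a chain of passes over a FASTQ stream, where each pass cuts the same number of bases from the sequence and from the quality string so the two stay aligned. This assumes unwrapped 4-line FASTQ records and exact prefix matches; `read_fastq`, `strip_prefix` and `trim_pass` are hypothetical helpers, and a partial-adapter rule could be swapped in for the second pass.

```python
import io
from typing import Callable, Iterable, Iterator, TextIO

Record = tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from an unwrapped 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def strip_prefix(prefix: str) -> Callable[[str], int]:
    """Return a rule giving how many leading bases to cut (exact match)."""
    return lambda seq: len(prefix) if seq.startswith(prefix) else 0


def trim_pass(records: Iterable[Record], cut_len: Callable[[str], int]) -> Iterator[Record]:
    for header, seq, qual in records:
        n = cut_len(seq)
        yield header, seq[n:], qual[n:]  # keep quality aligned with sequence


fq = io.StringIO("@r1\nFIXEDADAPTERmysequence\n+\n" + "I" * 22 + "\n"
                 "@r2\nFIXEDmysequence\n+\n" + "I" * 15 + "\n")
records = trim_pass(trim_pass(read_fastq(fq), strip_prefix("FIXED")), strip_prefix("ADAPTER"))
for header, seq, qual in records:
    print(header, seq, qual)
```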

It should be doable with bbduk.sh using ktrim=l. I will have to test it to confirm.

I'd be keen to know, too. To me, the variable-length adapter between the fixed part and the sequence seems like a major challenge for out-of-the-box adapter trimming strategies, including bbduk's.
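One concrete side of this concern: with exact k-mer matching, an adapter remnant shorter than k contains no complete adapter k-mer, so it is only caught if insert bases happen to complete a match. The Python sketch below reuses the adapter and the trimmed read tail from GenoMax's example, assumes the remnant is the 3' end of the adapter, and checks which remnant lengths give a hit with k=9 (forward strand, no mismatches). bbduk documents a mink setting for shorter k-mers at read tips, which this sketch does not model.

```python
adapter = "TATCCTTGCAATACT"  # adapter from GenoMax's example
insert = "CTCCGAACGGGAGAGC"  # read tail left after trimming in that example
k = 9
kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}


def has_kmer_hit(read: str) -> bool:
    """True if any k-mer of the read is an adapter k-mer."""
    return any(read[i:i + k] in kmers for i in range(len(read) - k + 1))


# Remnant = last n adapter bases sitting directly in front of the insert.
detected = [n for n in range(1, len(adapter) + 1) if has_kmer_hit(adapter[-n:] + insert)]
print(detected)  # [9, 10, 11, 12, 13, 14, 15]
```

Here only remnants of 9 or more bases are found; a smaller k raises sensitivity at the cost of more chance matches in genuine sequence.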