Adapter percentages in the reads
6.4 years ago
Tania ▴ 180

I am still a newbie to bioinformatics. I used Trim Galore (with the path to Cutadapt) to auto-detect the adapters in my reads. Around 60% of the reads have adapters. Is this percentage normal, low, or high? I don't understand this well enough to judge.

Thank you

RNA-Seq Trimgalore cutadapt

It depends on the type of experiment you're doing. If you're not looking at something like small RNAs, then it indicates that the average insert length of the fragmented DNA is shorter than the length of the reads.

6.4 years ago
ATpoint 81k

You cannot really put a single answer on that. You pick up adapters when the read length is greater than the insert size of your DNA fragment. It basically does not matter how high the percentage is; you simply have to trim them away (this is not optional, as adapter sequence will interfere with proper alignment). Cutadapt is a reasonable choice. I personally like Skewer, as it is multithreaded.
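For illustration, a minimal sketch of a paired-end Cutadapt call; the adapter sequence shown is the standard Illumina TruSeq prefix and the file names are placeholders, so adjust both to your own data:

# Trim the standard Illumina adapter prefix from both reads of each pair.
# -a/-A give the 3' adapter for read 1 and read 2; -m 20 discards reads
# shorter than 20 bp after trimming (placeholder input/output names).
cutadapt \
    -a AGATCGGAAGAGC -A AGATCGGAAGAGC \
    -m 20 \
    -o trimmed_R1.fastq.gz -p trimmed_R2.fastq.gz \
    R1_001.fastq.gz R2_001.fastq.gz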

EDIT: This question is of course related to your previous post, from which I assume that you have rather short fragments. What is the assay, small RNA-seq or something similar? If so, the result is totally expected.


It's RNA-seq from human cells. The read length is 150 bp for each read (from the FASTQ files), and the average insert size is ~158 bp as estimated by BBMap. So I think the read pairs overlap by roughly 140 bp, and I also think this is a short insert size.
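As an aside, one way such an insert-size estimate can be obtained with the BBMap suite is from read-pair overlap; a minimal sketch (file names are placeholders):

# Estimate insert sizes from read-pair overlap (no reference needed);
# bbmerge.sh writes a histogram of insert sizes to ihist.txt
bbmerge.sh in1=R1_001.fastq.gz in2=R2_001.fastq.gz ihist=ihist.txt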


Then it is totally expected: 158 is probably the mean or median, but insert size typically follows a roughly Gaussian distribution, so many fragments will be shorter than the read length. Trim the adapters with a tool of your choice, align the reads, and continue with the downstream analysis. Nothing is wrong with your samples as far as I can tell.

6.4 years ago
GenoMax 141k

Around 60% of the reads have adapters.

If that number represents mainly adapter dimers (with no recognizable sequence from your sample), then yes, that number is very high. Having some percentage of adapter bases in otherwise normal reads is acceptable. Can you post the bbduk.sh summary from trimming this dataset?


This is the output when I ran it on the original FASTQ files, before trimming:

Initial:
Memory: max=28502m, free=27313m, used=1189m

Added 0 kmers; time:    0.026 seconds.
Memory: max=28502m, free=26867m, used=1635m

Input is being processed as paired
Started output streams: 0.232 seconds.
Processing time:        80.446 seconds.

Input:                      71125188 reads      10739903388 bases.
Contaminants:               0 reads (0.00%)     0 bases (0.00%)
Total Removed:              0 reads (0.00%)     0 bases (0.00%)
Result:                     71125188 reads (100.00%)    10739903388 bases (100.00%)

Time:               80.731 seconds.
Reads Processed:      71125k    881.02k reads/sec
Bases Processed:      10739m    133.03m bases/sec

This is the output when I ran it on the reads after trimming with Trim Galore/Cutadapt:

BBDuk version 37.62
Initial:
Memory: max=28517m, free=27475m, used=1042m

Added 0 kmers; time:    0.021 seconds.
Memory: max=28517m, free=26731m, used=1786m

Input is being processed as paired
Started output streams: 0.890 seconds.
Processing time:        69.394 seconds.

Input:                      70857900 reads      9549185421 bases.
Contaminants:               0 reads (0.00%)     0 bases (0.00%)
Total Removed:              0 reads (0.00%)     0 bases (0.00%)
Result:                     70857900 reads (100.00%)    9549185421 bases (100.00%)

Time:               70.324 seconds.
Reads Processed:      70857k    1007.60k reads/sec
Bases Processed:       9549m    135.79m bases/sec

Hmm. Can you provide the exact command line (you can remove file/path names) you are using for bbduk.sh? It looks like nothing is being trimmed, even from the original data. So either you do not have adapters in your data, or you are not executing bbduk.sh correctly.


Sorry, it was my fault; I'm still new to this. I corrected it. This is my command:

./bbduk.sh -Xmx1g in1=SPR1_001.fastq.gz in2=SPR2_001.fastq.gz out=out1.fq ref=resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 stats=stat.txt

and this is the output of bbduk on the original reads:

BBDuk version 37.62
maskMiddle was disabled because useShortKmers=true
Initial:
Memory: max=1029m, free=988m, used=41m

Added 216529 kmers; time:   0.146 seconds.
Memory: max=1029m, free=934m, used=95m

Input is being processed as paired
Started output streams: 0.194 seconds.
Processing time:        76.198 seconds.

Input:                      71125188 reads      10739903388 bases.
KTrimmed:                   25750395 reads (36.20%)     924619793 bases (8.61%)
Total Removed:              56998 reads (0.08%)     924619793 bases (8.61%)
Result:                     71068190 reads (99.92%)     9815283595 bases (91.39%)

Time:               76.677 seconds.
Reads Processed:      71125k    927.59k reads/sec
Bases Processed:      10739m    140.07m bases/sec

And this is when I run the same command on the reads that were already trimmed by Trim Galore/Cutadapt:

BBDuk version 37.62
maskMiddle was disabled because useShortKmers=true
Initial:
Memory: max=1029m, free=993m, used=36m

Added 216529 kmers; time:   0.170 seconds.
Memory: max=1029m, free=940m, used=89m

Input is being processed as paired
Started output streams: 0.205 seconds.
Processing time:        65.514 seconds.

Input:                      70857900 reads      9549185421 bases.
KTrimmed:                   187297 reads (0.26%)    5036344 bases (0.05%)
Total Removed:              4958 reads (0.01%)  5036344 bases (0.05%)
Result:                     70852942 reads (99.99%)     9544149077 bases (99.95%)

Time:               65.920 seconds.
Reads Processed:      70857k    1074.91k reads/sec
Bases Processed:       9549m    144.86m bases/sec

Does this mean Trim Galore/Cutadapt did not take off all the adapters, and should I repeat the trimming?


Does this mean Trim Galore/Cutadapt did not take off everything?

That is possible. bbduk.sh is a sensitive scan/trim program. You don't need to use both programs; use either Trim Galore or bbduk. If you are going to use BBMap to align the data, then I like to stay within the same suite, but you are free to choose any program you like.

Since you have paired-end data, you may want to include the tbo and tpe options with bbduk.sh. I also see that you used only one output file (with two inputs), so the resulting trimmed file will be interleaved. Not all tools can use interleaved files, so keep that in mind (otherwise use out1= and out2= if you re-run bbduk.sh); see the sketch below. In any case, you do not have 60% adapters in your data (more like ~9% of the bases), which is the good news.
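A minimal sketch of such a re-run, assuming the same adapters file as before and placeholder output names:

# Hypothetical re-run with separate outputs per mate, plus tbo
# (trim adapters based on where the paired reads overlap) and tpe
# (trim both reads of a pair to the same length after kmer-trimming)
./bbduk.sh -Xmx1g \
    in1=SPR1_001.fastq.gz in2=SPR2_001.fastq.gz \
    out1=trimmed_R1.fq.gz out2=trimmed_R2.fq.gz \
    ref=resources/adapters.fa \
    ktrim=r k=23 mink=11 hdist=1 tbo tpe \
    stats=stat.txt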


Many, many thanks, genomax!
