Hi,
I'm using BBDuk to adapter-trim and quality-filter WGS reads from ancient DNA (samples more than 200 years old).
The reads are 150 bp paired-end (PE).
When I use the recommended parameters for PE reads (which I've used many times before), a large fraction of my data (25-50%) is being trimmed by ktrim=r.
This is my command:
bbduk.sh -Xmx1g ref=$BBMAP_DIR/bbmap-38.79-0/resources/adapters.fa ktrim=r k=23 pigz=f mink=11 hdist=1 qtrim=rl trimq=10 tpe tbo int minlen=30 ziplevel=9 threads=12 in=./D15_#.fastq.gz out=trimmed_reads/trimmed_D15_#.fastq.gz stats=D15.stats ow
And this is the output:
Input: 93969406 reads 14189380306 bases.
QTrimmed: 393623 reads (0.42%) 1688764 bases (0.01%)
KTrimmed: 90265990 reads (96.06%) 6997869744 bases (49.32%)
Trimmed by overlap: 970534 reads (1.03%) 4869518 bases (0.03%)
Total Removed: 347220 reads (0.37%) 7004428026 bases (49.36%)
Result: 93622186 reads (99.63%) 7184952280 bases (50.64%)
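As a rough sanity check on these numbers, here is a small Python sketch that converts the summary into mean lengths per read. The variable names are my own, and the figures are copied by hand from the output above, so treat it as back-of-the-envelope arithmetic rather than anything BBDuk reports itself.

```python
# Figures copied by hand from the BBDuk summary above.
input_reads, input_bases = 93_969_406, 14_189_380_306
ktrimmed_reads, ktrimmed_bases = 90_265_990, 6_997_869_744
result_reads, result_bases = 93_622_186, 7_184_952_280

# Mean read length before and after trimming.
mean_input_len = input_bases / input_reads
mean_result_len = result_bases / result_reads

# Mean number of bases removed by ktrim=r from each read it touched.
mean_ktrim_per_read = ktrimmed_bases / ktrimmed_reads

print(f"mean input length:       {mean_input_len:.1f} bp")
print(f"mean output length:      {mean_result_len:.1f} bp")
print(f"mean k-trimmed per read: {mean_ktrim_per_read:.1f} bp")
```

By this arithmetic the input reads average 151 bp, the surviving reads average roughly 77 bp, and each k-trimmed read loses roughly half its length. That is at least consistent with most fragments being much shorter than the read length.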
I suspect this is due to the fragmented nature of aDNA: the fragments are often shorter than the read length, so the reads run through into the adapter sequence. I'd like a second opinion, though, to make sure I don't need to alter the parameters to retain more "real" sequence.
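One way I could check this myself is to look at the length distribution of the trimmed reads. Below is a minimal Python sketch, assuming a standard four-line-per-record gzipped FASTQ. The function name is hypothetical, and the example path in the comment is only my guess at what the out= pattern expands to for read 1.

```python
import gzip
from collections import Counter


def read_length_histogram(fastq_path):
    """Count read lengths in a gzipped FASTQ file (4 lines per record)."""
    lengths = Counter()
    with gzip.open(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if line_number % 4 == 1:
                lengths[len(line.rstrip("\n"))] += 1
    return lengths


# Example use (path is an assumption based on the out= pattern above):
# hist = read_length_histogram("trimmed_reads/trimmed_D15_1.fastq.gz")
# for length in sorted(hist):
#     print(length, hist[length])
```

If the short-fragment explanation is right, I would expect a broad spread of lengths well below the read length rather than a single peak at full length. Reads from fragments longer than the read length are not adapter-trimmed, so this only shows the distribution of the shorter fragments.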
Many thanks, Ido