Hi there,
I am trimming adapter contamination from my paired-end RNA-seq data with Cutadapt.
Here is part of the report (--discard-trimmed was set).
I am wondering how reliable the trimming is, since the large majority of the removed sequences are shorter than 10 bp, and most of those are only 3 or 4 bp. Is that normal? Could it have caused a relatively large loss of genuine reads, given that it is hard to be sure a 3-4 bp match really comes from the adapter, even though such matches occur much more often than expected? Should one set a minimum length for the removed sequence to avoid this? (I did not find such an option in Cutadapt.)
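To make the "higher than expected" comparison concrete, here is a minimal sketch of how one might estimate the number of chance matches of a given length. It assumes each base matches independently with probability 0.25 and that no mismatches are allowed; this is my own simplification, not necessarily how Cutadapt computes its expected value. The function names and the read counts are hypothetical.

```python
def expected_random_matches(n_reads: int, match_len: int, p_base: float = 0.25) -> float:
    """Expected number of reads whose last `match_len` bases equal the
    first `match_len` bases of the adapter purely by chance.

    Assumption: uniform, independent base composition and exact matching.
    """
    return n_reads * p_base ** match_len


def enrichment(observed: int, n_reads: int, match_len: int) -> float:
    """Ratio of observed matches to the number expected by chance."""
    return observed / expected_random_matches(n_reads, match_len)


if __name__ == "__main__":
    n_reads = 1_000_000  # hypothetical library size, not from my report
    for length in (3, 4, 10):
        print(length, expected_random_matches(n_reads, length))
```

With these assumptions, roughly 1 in 64 reads would show a 3 bp match by chance alone, so a ratio near 1 for short matches would suggest that most of them are not real adapter.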
Another question: in total, nearly 6% of the raw reads were filtered out when --pair-filter=any was set, whereas with --pair-filter=both only 0.1% of the reads were removed. I think --pair-filter=both may be the more convincing choice and should be used, because when sequencing runs past the 3' end, that should happen on both reads. Am I right?
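For clarity, this is how I understand the two settings, written as a toy model rather than Cutadapt's actual implementation. The function `discard_pair` and its boolean arguments are hypothetical; with --discard-trimmed, "trimmed" here means that an adapter match was found in that read.

```python
def discard_pair(r1_trimmed: bool, r2_trimmed: bool, pair_filter: str = "any") -> bool:
    """Toy model of the pair filter: decide whether a read pair is dropped.

    "any"  -> drop the pair if at least one read meets the filter criterion.
    "both" -> drop the pair only if both reads meet it.
    """
    if pair_filter == "any":
        return r1_trimmed or r2_trimmed
    if pair_filter == "both":
        return r1_trimmed and r2_trimmed
    raise ValueError(f"unsupported pair_filter in this sketch: {pair_filter!r}")


if __name__ == "__main__":
    # A pair where only read 1 had an adapter match
    print(discard_pair(True, False, "any"))   # dropped under "any"
    print(discard_pair(True, False, "both"))  # kept under "both"
```

Under this reading, every pair dropped by "both" is also dropped by "any", so the gap between 6% and 0.1% would consist of pairs where only one of the two reads had a match.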