Keep FastQC homopolymers or switch to fastp auto-detection?
11 days ago
Abieskawa • 0

Hi everyone, I’m improving my adapter-trimming pipeline and want to confirm if my current setup makes sense.

Current command:

cutadapt \
  -a AGATCGGAAGAG -a AAAAAAAAAAAA -a GGGGGGGGGGGG \
  -A AGATCGGAAGAG -A AAAAAAAAAAAA -A GGGGGGGGGGGG \
  -j 128 -m 5 -Q 20 -q 20 -o R1.trim.fq.gz -p R2.trim.fq.gz R1.fq.gz R2.fq.gz

I include poly-A and poly-G because FastQC reports them. My reasoning is that even if they are not real adapters, paired-end alignment should still recover any useful sequence, because the 5' ends are left untouched. I used to think fastp could not remove everything FastQC flags, such as AAAAAAAAAAAA, but I now suspect that was because fastp uses a different minimum poly-A length. Besides, fastp cannot parallelize beyond 16 threads.
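One way to check whether the minimum run length explains the difference is to count reads whose 3' end is a homopolymer of at least a given length. The sketch below is only an illustration: `count_tails` is a hypothetical helper, the three reads are made up, and it assumes an uncompressed FASTQ with exactly four lines per record.

```shell
# count_tails FILE BASE MINLEN
# Counts reads whose 3' end is a run of BASE at least MINLEN long.
# Assumes plain 4-line FASTQ records (no wrapped sequence lines).
count_tails() {
    awk -v base="$2" -v n="$3" '
        NR % 4 == 2 {                      # sequence line of each record
            run = 0
            for (i = length($0); i > 0 && substr($0, i, 1) == base; i--) run++
            if (run >= n) hits++
        }
        END { print hits + 0 }
    ' "$1"
}

# Tiny made-up FASTQ, only to exercise the function.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '%s\n' \
    '@r1' 'ACGTACGTAAAAAAAAAAAA' '+' 'IIIIIIIIIIIIIIIIIIII' \
    '@r2' 'ACGTACGTACAAAAAAAAAA' '+' 'IIIIIIIIIIIIIIIIIIII' \
    '@r3' 'ACGTACGTGGGGGGGGGGGG' '+' 'IIIIIIIIIIIIIIIIIIII' > "$demo"

count_tails "$demo" A 12   # prints 1 (only r1 ends in 12 A's)
count_tails "$demo" A 10   # prints 2 (r1 and r2)
count_tails "$demo" G 10   # prints 1 (r3)
```

On real data the same function could be fed from `gzip -dc R1.fq.gz > R1.fq` first; a large gap between the counts at two thresholds would suggest the tools differ in their length cutoff rather than in what they can detect.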

Questions:

  1. Is it safe to include AAAAAAAAAAAA / GGGGGGGGGGGG, or does that risk over-trimming real poly(A) tails?
  2. Should I move to fastp with automatic adapter detection? Is it necessary to change cutadapt to fastp?

Example fastp call for comparison:

fastp -i R1.fq.gz -I R2.fq.gz -o R1.trim.fq.gz -O R2.trim.fq.gz \
      --detect_adapter_for_pe 

Thanks for any advice!

11 days ago
GenoMax 154k

> Is it safe to include AAAAAAAAAAAA / GGGGGGGGGGGG, or does that risk over-trimming real poly(A) tails?

Homopolymers do not come from adapters. Poly-A stretches can come from mRNA tails. Poly-G stretches are likely due to two-color chemistries, where "no signal" is called as G.

If you want to keep the poly-A tails (for whatever reason), then don't trim those. Poly-G, on the other hand, is likely made up of "no signal" calls and can be removed.
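As a rough sketch of what that could look like with either tool: the commands below keep the adapter prefix from the question, drop the poly-A and poly-G "adapters", and rely on each tool's own poly-G handling instead. File names, `-m 5`, `-j 128`, and the cutoff of 20 are carried over from the question and are assumptions about the data, not recommendations. The block only prints the commands.

```shell
# Dry run: the commands are built as strings and printed, not executed.
# Copy the printed line into a terminal once the values fit your data.

# cutadapt: adapter only, no poly-A. --nextseq-trim=20 is 3' quality
# trimming that ignores the quality of G bases, so it takes the place
# of -q 20 / -Q 20 and also removes high-quality "dark cycle" G tails.
cutadapt_cmd="cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG \
  --nextseq-trim=20 -m 5 -j 128 \
  -o R1.trim.fq.gz -p R2.trim.fq.gz R1.fq.gz R2.fq.gz"

# fastp: force poly-G trimming (--trim_poly_g) and leave poly-X trimming
# (--trim_poly_x) off, so poly-A tails are kept.
fastp_cmd="fastp -i R1.fq.gz -I R2.fq.gz -o R1.trim.fq.gz -O R2.trim.fq.gz \
  --detect_adapter_for_pe --trim_poly_g"

echo "$cutadapt_cmd"
echo "$fastp_cmd"
```

If poly-A should be trimmed after all, fastp's `--trim_poly_x` with `--poly_x_min_len` would be the place to set the minimum run length.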

> Should I move to fastp with automatic adapter detection? Is it necessary to change cutadapt to fastp?

That is a personal preference. Either program, with the right command-line options, should be able to deal with adapters, homopolymers, etc.

> Besides, fastp cannot parallelize beyond 16 threads.

That is not true. You need to add the following command-line option:

-w, --thread                       worker thread number, default is 3 

Use -w 16. That said, unless you have high-performance storage available, using that many threads may actually slow things down.
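A small sketch of how the thread count could be chosen in a script: take the number of online cores and cap it at 16, the value suggested above. `cap_threads` is a hypothetical helper, and the fastp call is the one from the question with `-w` added; it is only printed here.

```shell
# cap_threads AVAILABLE LIMIT
# Prints the smaller of the two integers.
cap_threads() {
    if [ "$1" -gt "$2" ]; then echo "$2"; else echo "$1"; fi
}

# Number of online cores; fall back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
threads=$(cap_threads "$cores" 16)

# Dry run: print the fastp call instead of executing it.
echo fastp -w "$threads" -i R1.fq.gz -I R2.fq.gz \
  -o R1.trim.fq.gz -O R2.trim.fq.gz --detect_adapter_for_pe
```

On slower storage it may be worth timing a lower cap as well, since the run can become I/O-bound well before all workers are busy.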
