I have a question about clumpify.sh usage.
My goal: I am trying to run clumpify.sh as the very first step of my RNASeq/WES/WGS pipeline, based on the two points below (as listed by Brian here). My thinking is that if the pipeline starts with clumped reads, the downstream steps will benefit from the reduced file sizes and the pipeline may run faster.
- Clumpify has no effect on downstream analysis aside from making it faster
- If you want to clumpify data for compression, do it as early as possible (e.g. on the raw reads). Then run all downstream processing steps ensuring that read order is maintained
I want to ensure that none of the downstream steps in my pipeline are affected in any way. Hence, I tried out clumpify.sh and compared fastp results with and without it.
Case Study 1
fastp on the original reads (no clumpify pre-processing)
Case Study 2
clumpify.sh on the original reads, followed by fastp on the clumped reads. The clumpify.sh command was:
clumpify.sh in1=R1.fastq.gz in2=R2.fastq.gz out1=clumped_R1.fastq.gz out2=clumped_R2.fastq.gz reorder=p
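Separately from the fastp comparison, one direct way to check that two FASTQ files hold the same records regardless of order is to sort the records and hash them. This is only a sketch on toy data: the helper name fastq_content_md5 and the temporary file names are made up for illustration, it assumes 4-line FASTQ records and that md5sum (GNU coreutils) is available, and on files the size of mine the sort would need a lot of temporary disk space.

```shell
#!/usr/bin/env bash
set -eu

# Order-independent checksum of a gzipped FASTQ file (hypothetical helper):
# join each 4-line record onto one line, sort the records, hash the result.
fastq_content_md5() {
    gzip -dc "$1" | paste - - - - | LC_ALL=C sort | md5sum | cut -d' ' -f1
}

# Toy demonstration: two tiny files with the same records in a different order.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$tmp/original.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n@r1\nACGT\n+\nIIII\n' | gzip > "$tmp/reordered.fastq.gz"

sum_a=$(fastq_content_md5 "$tmp/original.fastq.gz")
sum_b=$(fastq_content_md5 "$tmp/reordered.fastq.gz")

if [ "$sum_a" = "$sum_b" ]; then
    echo "same records"
else
    echo "records differ"
fi

rm -rf "$tmp"
```

On the real data I would call the helper on R1.fastq.gz and clumped_R1.fastq.gz (and likewise for R2) and compare the two checksums.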
Observation: When I compare the fastp statistics for the two cases, there are very small differences.
Fastp results - Case Study 1
After filtering:
- total reads: 149.178736 M
- total bases: 15.023535 G
- Q20 bases: 14.815694 G (98.616568%)
- Q30 bases: 14.394220 G (95.811144%)
- GC content: 46.186597%
Filtering result:
- reads passed filters: 149.178736 M (95.630291%)
- reads with low quality: 6.227876 M (3.992349%)
- reads with too many N: 7.686000 K (0.004927%)
- reads too short: 580.978000 K (0.372433%)
Fastp results - Case Study 2
After filtering:
- total reads: 149.174956 M
- total bases: 15.022542 G
- Q20 bases: 14.814667 G (98.616246%)
- Q30 bases: 14.393230 G (95.810881%)
- GC content: 46.186565%
Filtering result:
- reads passed filters: 149.174956 M (95.627868%)
- reads with low quality: 6.228374 M (3.992668%)
- reads with too many N: 7.688000 K (0.004928%)
- reads too short: 584.258000 K (0.374536%)
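To put numbers on the differences, here is a minimal sketch that converts the fastp figures from the two reports above into integer read counts (M = 1e6, K = 1e3) and subtracts them. The variable names are just mine for illustration.

```shell
#!/usr/bin/env bash
# Read counts copied from the two fastp reports (1 = original, 2 = clumped).
passed_1=149178736; lowq_1=6227876; many_n_1=7686; short_1=580978
passed_2=149174956; lowq_2=6228374; many_n_2=7688; short_2=584258

# Total input reads = reads that passed + reads in every filtered category.
total_1=$(( passed_1 + lowq_1 + many_n_1 + short_1 ))
total_2=$(( passed_2 + lowq_2 + many_n_2 + short_2 ))

# Per-category difference, clumped minus original.
d_passed=$(( passed_2 - passed_1 ))
d_lowq=$(( lowq_2 - lowq_1 ))
d_many_n=$(( many_n_2 - many_n_1 ))
d_short=$(( short_2 - short_1 ))

echo "total input reads: $total_1 vs $total_2"
echo "delta: passed=$d_passed low_quality=$d_lowq too_many_N=$d_many_n too_short=$d_short"
```

By this arithmetic both runs account for the same 155,995,276 input reads. In Case Study 2, 3,780 of them (about 0.0024%) move from passed to filtered: 3,280 more in too short, 498 more in low quality, and 2 more in too many N.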
The question: Given the small differences shown above, is there anything I should be worried about or look out for downstream?
Thanks in advance.