I've used BBDuk in the past, although not with data sets this large (14-20GB). I've been following the preprocessing guide, and after quality trimming it initially appeared that some of my data sets had become unpaired, as their R1 and R2 output files were different sizes. I reran the process and, sure enough, the same samples still had R1 and R2 files of different sizes, with R1 reported as 1GB larger than R2. I then ran vpair repair.sh on the samples, which reported that the names appeared to be correctly paired, and then fastqc, which reported an equal number of reads in each file. This has happened to 14 sets of my paired-end reads, with initial file sizes ranging from 15-20GB. I found it strange that it had only happened to a few samples, so I changed my view from "ls -lh" to "ls -l --block-size=MB" (which, if I'm understanding correctly, shows my file sizes in MB). It now looks like all my files have been affected: every sample reports different file sizes for R1 and R2, and the samples showing a 1GB difference are just the most extreme cases.
So my question is, are my samples still correctly paired? I've pasted my bbduk command below:
for i in `ls -1 *CR_R1.fq | sed 's/CR_R1.fq//'`
do
bbduk.sh -Xmx28g in1=$i\CR_R1.fq in2=$i\CR_R2.fq out1=$i\QT_R1.fq out2=$i\QT_R2.fq k=31 tpe tbo qtrim=rl trimq=20 maq=20 maxns=0 minlen=50
done
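For anyone wanting to double-check pairing independently of the tools mentioned above, here is a minimal sketch. It assumes uncompressed FASTQ with exactly four lines per record. The function names count_reads and count_name_mismatches are made up for illustration and are not part of BBTools.

```shell
# Hypothetical helpers (not BBTools commands); assume uncompressed,
# 4-line-per-record FASTQ files.

# Print the number of reads in a FASTQ file (line count divided by 4).
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Print how many header lines differ between R1 ($1) and R2 ($2) when the
# files are read in lockstep. Names are compared up to the first whitespace,
# with a trailing /1 or /2 stripped. Reads present only at the end of R2 are
# not counted here; compare count_reads on both files to catch that case.
count_name_mismatches() {
    awk -v r2="$2" '
        function clean(s) { sub(/\/[12]$/, "", s); return s }
        {
            # Fetch the corresponding line from R2; if R2 ran out early,
            # count each remaining R1 header as a mismatch.
            if ((getline line2 < r2) <= 0) { if (NR % 4 == 1) bad++; next }
            if (NR % 4 == 1) {
                split(line2, f, /[ \t]/)
                if (clean($1) != clean(f[1])) bad++
            }
        }
        END { print bad + 0 }
    ' "$1"
}
```

With placeholder file names, usage would look like `count_reads sample_QT_R1.fq`, `count_reads sample_QT_R2.fq`, and `count_name_mismatches sample_QT_R1.fq sample_QT_R2.fq`. The idea is that equal read counts and zero name mismatches point to intact pairing. Byte sizes need not match, since each read in a pair is quality trimmed independently and can end up a different length.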