I need to calculate the percentage of reads in my paired-end FASTQ files that contain the string "TAACCCTAACCCTAACCCTAACCC". I ran
bbduk.sh in1=1.fastq.gz in2=2.fastq.gz literal=TAACCCTAACCCTAACCCTAACCC k=24 mm=f int=f
and got:
Input: 65975862 reads 6554014910 bases.
Contaminants: 195232 reads (0.30%) 19519262 bases (0.30%)
Total Removed: 1040136 reads (1.58%) 61775988 bases (0.94%)
Result: 64935726 reads (98.42%) 6492238922 bases (99.06%)
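A quick check on the denominators, using only the numbers in this post (my own arithmetic, not extra bbduk output): the "Input" count of 65975862 reads is exactly twice the 32987931 sequences per file given further down, so bbduk's percentages are relative to R1 and R2 combined. The sketch below just recomputes the two reported percentages from the raw counts.

```python
# Counts copied from the bbduk.sh report above.
INPUT_READS = 65975862       # "Input": R1 and R2 reads together
CONTAMINANT_READS = 195232   # "Contaminants"
TOTAL_REMOVED = 1040136      # "Total Removed"
READS_PER_FILE = 32987931    # sequences in a single FASTQ file (from the grep part)


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


# bbduk's input count covers both mates of every pair.
assert INPUT_READS == 2 * READS_PER_FILE

print(f"Contaminants:  {percent(CONTAMINANT_READS, INPUT_READS):.2f}%")  # 0.30%
print(f"Total Removed: {percent(TOTAL_REMOVED, INPUT_READS):.2f}%")      # 1.58%
```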
Should I take the "Total Removed" value (1.58%) as the percentage of reads in the paired-end FASTQ files that contain this string?
I also tried grep on the forward (R1) file:
grep -A 2 -B 1 'TAACCCTAACCCTAACCCTAACCC' D1_TTAGGC_L001_R1_001.fastq.gz | sed '/--/d' > out_D1_R1.fq
It finds about 7526 lines containing the string. Dividing by the total number of sequences in that file (32987931) gives 7526 / 32987931 ≈ 0.00023, i.e. about 0.023%. Does this mean that only about 0.02% of the reads in the forward file contain the string?
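To put both approaches on the same footing, one option is to count reads rather than output lines. Below is a minimal Python sketch, not a replacement for bbduk or grep, assuming plain 4-line FASTQ records and R1/R2 files that list mates in the same order; the script name count_motif.py is hypothetical. It reads the .gz files directly, tests each sequence line for the exact string and, by default, its reverse complement, because bbduk also searches reverse-complement k-mers unless rcomp=f is set, whereas the grep above searches one orientation only. Working per record also avoids the sed '/--/d' step, which can delete quality lines that happen to contain "--".

```python
import gzip
import sys

# Translation table for complementing DNA bases (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sequences(lines):
    """Yield the sequence (2nd) line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def has_motif(seq: str, motif: str, both_strands: bool) -> bool:
    """True if seq contains motif (or, optionally, its reverse complement)."""
    return motif in seq or (both_strands and revcomp(motif) in seq)


def count_pairs(r1_lines, r2_lines, motif, both_strands=True):
    """Count reads and pairs containing motif.

    Returns (pairs, r1_hits, r2_hits, pairs_with_hit_in_either_mate).
    Assumes R1 and R2 list the same reads in the same order.
    """
    pairs = r1_hits = r2_hits = either = 0
    for s1, s2 in zip(sequences(r1_lines), sequences(r2_lines)):
        h1 = has_motif(s1, motif, both_strands)
        h2 = has_motif(s2, motif, both_strands)
        pairs += 1
        r1_hits += int(h1)
        r2_hits += int(h2)
        either += int(h1 or h2)
    return pairs, r1_hits, r2_hits, either


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python count_motif.py 1.fastq.gz 2.fastq.gz
    motif = "TAACCCTAACCCTAACCCTAACCC"
    with gzip.open(sys.argv[1], "rt") as f1, gzip.open(sys.argv[2], "rt") as f2:
        pairs, h1, h2, either = count_pairs(f1, f2, motif)
    reads = 2 * pairs  # same denominator as bbduk's "Input" (both mates)
    print(f"R1 reads with motif: {h1}/{pairs} ({100 * h1 / pairs:.4f}%)")
    print(f"R2 reads with motif: {h2}/{pairs} ({100 * h2 / pairs:.4f}%)")
    print(f"all reads with motif: {h1 + h2}/{reads} ({100 * (h1 + h2) / reads:.4f}%)")
    print(f"pairs with motif in either mate: {either}/{pairs} ({100 * either / pairs:.4f}%)")
```

Dividing the R1 hits by the per-file read count gives a figure comparable to the grep calculation, while dividing R1 + R2 hits by twice the pair count uses the same denominator as bbduk's report.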
Thanks