I am trying to learn DNA sequencing analysis using the GIAB datasets [Link to Chinese trio - HG005_NA24631_son/]. The README they provide does not list the adapter sequences used, so I tried
bbmerge.sh as described in this Biostars post:
bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa
It identified the following as adapter sequences in the input file.
These match the Illumina TruSeq adapters.
I tried both of these sequences for adapter trimming with cutadapt:
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTT -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -m 20 -q 30 -o HG005_R1_trimmed.fastq.gz -p HG005_R2_trimmed.fastq.gz MPHG005_S4_L004_R1_001.fastq.gz MPHG005_S4_L004_R2_001.fastq.gz
However, the adapter content check still fails in FastQC.
Since the README mentions that the library was prepared with the Nextera Mate Pair Sample Preparation Kit, I am now trying the Nextera Mate Pair adapter sequences from the Illumina support docs.
If the Nextera sequence works, does that mean the
bbmerge.sh approach is wrong, or am I doing something wrong with cutadapt? Any help would be appreciated.
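
One rough way to see which adapter family is actually present, before or after trimming, might be to count the reads that contain a short adapter prefix. This is only a sketch: `count_adapter_hits` is a hypothetical helper, not part of bbmerge.sh or cutadapt. It looks for exact matches only, so it will undercount adapters with sequencing errors or partial adapters at read ends. The TruSeq prefix below is the shared start of the two sequences in the cutadapt command above. The Nextera prefix is an assumption based on the commonly cited Nextera transposase sequence and should be checked against the Illumina adapter document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads, among the first MAX_READS, whose sequence
# contains ADAPTER_PREFIX as an exact substring.
# Usage: count_adapter_hits FASTQ[.gz] ADAPTER_PREFIX [MAX_READS]
count_adapter_hits() {
    local fastq="$1" prefix="$2" max_reads="${3:-100000}"
    local reader="cat"
    case "$fastq" in
        *.gz) reader="gzip -dc" ;;   # decompress gzipped FASTQ on the fly
    esac
    # Each FASTQ record is 4 lines; line 2 of each record is the sequence.
    # grep -c exits non-zero when the count is 0, hence the trailing "|| true".
    $reader "$fastq" \
        | head -n "$((max_reads * 4))" \
        | awk 'NR % 4 == 2' \
        | grep -c -F -- "$prefix" || true
}

# Example calls (file names taken from the cutadapt command above):
# count_adapter_hits MPHG005_S4_L004_R1_001.fastq.gz AGATCGGAAGAGC        # TruSeq prefix
# count_adapter_hits MPHG005_S4_L004_R1_001.fastq.gz CTGTCTCTTATACACATCT  # assumed Nextera prefix
# count_adapter_hits HG005_R1_trimmed.fastq.gz CTGTCTCTTATACACATCT        # after trimming
```

If the Nextera prefix count stays high in the trimmed file while the TruSeq prefix count drops, that would suggest the trimming ran as intended but targeted only one of the adapter types present in the library.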