Hello,
I am trying to learn DNA sequencing analysis using the GIAB datasets [Link to Chinese trio - HG005_NA24631_son/]. The readme file provided with the data does not list the adapter sequences used, so I tried bbmerge.sh, as in this biostars post:
bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa
It identified the following as adapter sequences in the input files:
Read1_adapter
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTT
Read2_adapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
These match the Illumina TruSeq adapters:
Read 1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Read 2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
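The match can be checked directly rather than by eye. The snippet below is a minimal sketch that tests whether each TruSeq sequence is a prefix of the corresponding bbmerge.sh result; the dictionary names `detected` and `truseq` are just illustrative labels, not anything produced by the tools.

```python
# Adapter sequences reported by bbmerge.sh (copied from the output above).
detected = {
    "Read1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTT",
    "Read2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT",
}

# Illumina TruSeq adapter sequences (as quoted above).
truseq = {
    "Read1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    "Read2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
}

# A TruSeq adapter "matches" here if the detected sequence starts with it.
matches = {name: detected[name].startswith(truseq[name]) for name in detected}

for name, ok in matches.items():
    extra = detected[name][len(truseq[name]):]
    print(f"{name}: TruSeq prefix match = {ok}, trailing bases = {extra}")
```

The trailing bases are what bbmerge.sh found beyond the shorter TruSeq sequence, so trimming with either form targets the same adapter start.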
I used both of the bbmerge.sh sequences for adapter trimming with cutadapt:
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTT -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -m 20 -q 30 -o HG005_R1_trimmed.fastq.gz -p HG005_R2_trimmed.fastq.gz MPHG005_S4_L004_R1_001.fastq.gz MPHG005_S4_L004_R2_001.fastq.gz
However, the adapter content check still fails in FastQC.
Since the readme mentions that the library was prepared using the Nextera Mate Pair Sample Preparation Kit, I am now trying the Nextera Mate Pair adapter sequences from the Illumina support docs:
CTGTCTCTTATACACATCT
AGATGTGTATAAGAGACAG
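One thing worth noting about these two Nextera sequences is that they are reverse complements of each other, i.e. the same adapter seen from either strand. A minimal sketch to confirm this (the helper `reverse_complement` is written here for illustration and assumes plain A/C/G/T input):

```python
# Translation table mapping each base to its complement (A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


nextera_a = "CTGTCTCTTATACACATCT"
nextera_b = "AGATGTGTATAAGAGACAG"

is_revcomp = reverse_complement(nextera_a) == nextera_b
print("Second sequence is the reverse complement of the first:", is_revcomp)
```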
If the Nextera sequences work, doesn't that mean the bbmerge.sh approach is wrong, or am I doing something wrong with cutadapt? Any help would be appreciated.
The BBMerge approach is not wrong. You identified and removed the primary adapter. FastQC "failures" are not a reason to stop, nor do you need a "pass" in every FastQC category. Aligners will take care of any extraneous sequence remaining at this point, since they soft-clip the parts of reads that do not match the reference.
Note: You could also have used bbduk.sh from BBTools to do the same trimming operation.
Hi GenoMax, thanks for the reply. I ran cutadapt and FastQC again on the same sample, now with the Nextera Mate Pair adapters, and FastQC passed all checks, including adapter content.
I am curious (and very confused) about which adapter sequences to use for trimming in this case: bbmerge.sh suggested the Illumina TruSeq adapters, whereas the readme says Nextera Mate Pair sequencing, and the Nextera Mate Pair adapter sequences resulted in passing the FastQC adapter content check.
I used cutadapt because it was used in the tutorial I was following.
Thanks in advance!
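One way to see which adapters are actually present is to count how many reads contain each candidate sequence before trimming. The following is only a rough sketch under stated assumptions: the function name `count_adapter_hits` is made up for this example, it looks for exact substring matches only (no mismatches, unlike cutadapt), and it uses a short TruSeq prefix plus the Nextera sequence quoted above as the candidates.

```python
import gzip
from itertools import islice


def count_adapter_hits(fastq_path, adapters, max_reads=100_000):
    """Count reads containing each adapter as an exact substring.

    fastq_path: path to a FASTQ file (gzip-compressed if it ends in .gz)
    adapters:   dict mapping a label to an adapter sequence
    max_reads:  only the first max_reads records are examined
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = {name: 0 for name in adapters}
    total = 0
    with opener(fastq_path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the second line,
        # so take lines 1, 5, 9, ... (0-based) up to max_reads records.
        for line in islice(handle, 1, max_reads * 4, 4):
            seq = line.strip()
            total += 1
            for name, adapter in adapters.items():
                if adapter in seq:
                    counts[name] += 1
    return total, counts


# Candidate sequences (assumptions for illustration): the shared start of
# the TruSeq adapters and one of the Nextera sequences from the post.
CANDIDATES = {
    "truseq_prefix": "AGATCGGAAGAGC",
    "nextera": "CTGTCTCTTATACACATCT",
}
```

Calling `count_adapter_hits("MPHG005_S4_L004_R1_001.fastq.gz", CANDIDATES)` on the untrimmed file would return the number of reads examined and a per-adapter hit count. If both counts are high, both kinds of sequence are present and it is reasonable to trim both.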