Adaptor sequences in GIAB samples
21 months ago
Mk ▴ 10

Hello,

I am trying to learn DNA sequencing analysis using GIAB datasets [Link to Chinese trio - HG005_NA24631_son/]. The README file provided with the data does not list the adapter sequences used, so I tried bbmerge.sh, as in this Biostars post:

bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa

It identified the following as adapter sequences in the input files:

Read1_adapter

AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTT

Read2_adapter

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

Each of these begins with the corresponding Illumina TruSeq adapter:

Read 1

AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

Read 2

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
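
The match can be checked with a small shell sketch that tests whether each bbmerge.sh sequence starts with the corresponding TruSeq sequence. The sequences are copied from above; the variable names and the `has_prefix` helper are made up for illustration.

```shell
# Adapters reported by bbmerge.sh (copied from the post)
detected_r1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTT"
detected_r2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"

# Illumina TruSeq adapters (copied from the post)
truseq_r1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
truseq_r2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# has_prefix is a hypothetical helper: succeeds if $1 starts with $2
has_prefix() {
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_prefix "$detected_r1" "$truseq_r1"; then r1_result="match"; else r1_result="no match"; fi
if has_prefix "$detected_r2" "$truseq_r2"; then r2_result="match"; else r2_result="no match"; fi

echo "Read 1: $r1_result"   # prints "Read 1: match"
echo "Read 2: $r2_result"   # prints "Read 2: match"
```

The extra bases after the TruSeq prefix in the bbmerge.sh output are the rest of the adapter read-through, so trimming with either the short or the long form targets the same adapter.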

I used both of these sequences for adapter trimming with cutadapt:

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTT -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -m 20 -q 30 -o HG005_R1_trimmed.fastq.gz -p HG005_R2_trimmed.fastq.gz MPHG005_S4_L004_R1_001.fastq.gz MPHG005_S4_L004_R2_001.fastq.gz

However, the adapter content check still fails in FastQC.

Since the README mentions that the library was prepared with the Nextera Mate Pair Sample Preparation Kit, I am now trying the Nextera Mate Pair adapter sequences from the Illumina support docs:

CTGTCTCTTATACACATCT

AGATGTGTATAAGAGACAG
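
As a side note, these two sequences are reverse complements of each other, which a short shell sketch can confirm. It assumes the `rev` and `tr` utilities are available; the variable names are made up for illustration.

```shell
# The two Nextera Mate Pair sequences quoted above
nextera_a="CTGTCTCTTATACACATCT"
nextera_b="AGATGTGTATAAGAGACAG"

# Reverse the first sequence, then complement each base
revcomp=$(printf '%s\n' "$nextera_a" | rev | tr 'ACGT' 'TGCA')

echo "$revcomp"   # prints AGATGTGTATAAGAGACAG
if [ "$revcomp" = "$nextera_b" ]; then
    echo "reverse complements"
fi
```

So the two lines describe the same adapter read from opposite strands, not two unrelated adapters.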

If the Nextera sequences work, does that mean the bbmerge.sh approach is wrong, or am I doing something wrong with cutadapt? Any help would be appreciated.


The BBMerge approach is not wrong. You identified and removed the primary adapter. FastQC "failures" are not a reason to stop, nor do you need a "pass" in every FastQC category. Aligners will take care of any extraneous sequence remaining at this point, since they soft-clip the parts of reads that do not match the reference.

Note: You could have also used bbduk.sh from BBTools to do the same trimming operation.
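
A rough sketch of what that could look like is below. It only builds and prints the command (a dry run) and does not execute it. The input file names and adapter sequences are taken from the cutadapt command in the question; the output file names are hypothetical. The `k=23 mink=11 hdist=1 tpe tbo` settings follow the adapter-trimming example in the BBDuk guide, and `qtrim=r trimq=30 minlen=20` is only an approximate stand-in for cutadapt's `-q 30 -m 20`, so treat the whole line as an assumption to adjust.

```shell
# Inputs as named in the cutadapt command from the question
r1="MPHG005_S4_L004_R1_001.fastq.gz"
r2="MPHG005_S4_L004_R2_001.fastq.gz"

# Hypothetical output names
out1="HG005_R1_bbduk.fastq.gz"
out2="HG005_R2_bbduk.fastq.gz"

# Adapters reported by bbmerge.sh
adapter_r1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTT"
adapter_r2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"

# ktrim=r trims the adapter and everything to its right (3' side)
cmd="bbduk.sh in1=$r1 in2=$r2 out1=$out1 out2=$out2 literal=$adapter_r1,$adapter_r2 ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=r trimq=30 minlen=20"

# Dry run: print the command for inspection instead of running it
echo "$cmd"
```

The `adapters.fa` file written by bbmerge.sh could also be passed as `ref=adapters.fa` in place of the `literal=` list.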


Hi GenoMax, thanks for the reply. I ran cutadapt and FastQC again on the same sample, this time with the Nextera Mate Pair adapters, and FastQC passed all checks, including adapter content.

I am curious (and very confused) about which adapter sequences to use for trimming in this case. bbmerge.sh suggested the Illumina TruSeq adapters, whereas the README says the library was prepared for Nextera Mate Pair sequencing, and trimming with the Nextera Mate Pair adapter sequences is what made the FastQC adapter content check pass.

I used cutadapt because it was the tool used in the tutorial I was following.

Thanks in advance!
