Question

Variable read length distribution after running cutadapt on my ATAC-seq datasets


5.2 years ago

yaoyao20152031 • 0

Hi all,

I am currently processing ATAC-seq datasets and ran FastQC before and after adapter trimming with cutadapt v2.5. The datasets were downloaded from GEO (GSE119453) and are 75 bp paired-end. Before trimming, FastQC reports a read length of 76 for both R1 and R2. After trimming, however, it reports a length distribution of 0-76 and the density looks weird (see the shared images showing one example report for R1 before and after cutadapt).

I am not sure whether there is a problem with the adapter removal step. Is there an error in the adapter sequences I passed to cutadapt?

Length before cutadapt: [image]

Adapter detected before cutadapt: [image]

Length after cutadapt: [image]

Adapter not detected after cutadapt: [image]

The cutadapt options I used:

cutadapt \
  -a CTGTCTCTTATACACATCTCCGAGCCCACGAGAC \
  -A CTGTCTCTTATACACATCTGACGCTGCCGACGA \
  -o /n/scratch2/yy220/downloaded/ATAC_seq_datasets/2_cudadapt/${prefix}_R1.fastq.gz \
  -p /n/scratch2/yy220/downloaded/ATAC_seq_datasets/2_cudadapt/${prefix}_R2.fastq.gz \
  ${prefix}_OTHER_1.fastq.gz  ${prefix}_OTHER_2.fastq.gz \
  --cores=20 \
  --quality-cutoff 10 \
  -m 20 \
  --pair-filter=both

Reads containing my R1 adapter among the first 800 lines (200 reads) of the R1 file:

zcat SRR7784432_GSM3374850_Myeloid_dendritic_cells_sample_1_Homo_sapiens_ATAC-seq_1.fastq.gz | head -n 800 | grep -E CTGTCTCTTATACACATCTCCGAGCCCACGAGAC

Only four reads are detected:

GCCCCTCCTAGTGGTCTCCATGCTCCCCTCTCATGACCCCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAA 
GTGAGAAACGGAGCAGGAGAGCAGGGGGGGAGGCCCCAGACCTGTCTCTTATACACATCTCCGAGCCCACGAGACT   
GTCTCAGCTCACTACAACCTCCCCCTCCCGGCTTCAGGCCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAA  
CAGTAGATATCCTTAAACCCATAGTAAGTTCCATAACCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAAGG
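
A note on this check: `head -n 800` covers 200 FASTQ records (four lines each), and grepping for the full 34-nt adapter only finds reads that contain the complete adapter, not reads where the adapter is cut off at the 3' end. Below is a minimal sketch of counting adapter-containing reads on the sequence lines only. The helper name `count_adapter_reads` and the two demo records are made up for illustration; the search string is the 19-nt prefix shared by the `-a` and `-A` sequences above.

```shell
#!/usr/bin/env bash
# Count reads whose sequence line contains a given adapter string.
# count_adapter_reads is a hypothetical helper; it expects uncompressed FASTQ on stdin.
count_adapter_reads() {
  local adapter="$1"
  # Line 2 of every 4-line FASTQ record is the sequence.
  awk -v a="$adapter" 'NR % 4 == 2 { n++; if (index($0, a) > 0) hits++ }
       END { printf "%d %d\n", hits + 0, n + 0 }'
}

# Made-up demo records (not taken from the real dataset).
demo_fastq() {
  printf '@r1\nACGTCTGTCTCTTATACACATCT\n+\nIIIIIIIIIIIIIIIIIIIIIII\n'
  printf '@r2\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n'
}

# Output format: <reads containing adapter> <reads scanned>
result=$(demo_fastq | count_adapter_reads CTGTCTCTTATACACATCT)
echo "$result"
```

On the real file the same function could be fed with `zcat file.fastq.gz | head -n 800`, which would scan the first 200 reads.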

The Nextera adapter sequence from the Illumina website: [image]


This is normal and expected. Fragment length in ATAC-seq is unequal by the nature of the experiment.

5.2 years ago by ATpoint
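
To look at the length distribution of the trimmed reads directly, without FastQC, a tally of sequence lengths can help. This is only a sketch: `length_histogram` is a hypothetical helper name and the three demo records are invented; it assumes plain 4-line FASTQ records on stdin.

```shell
#!/usr/bin/env bash
# Tally read lengths from uncompressed FASTQ on stdin.
# Prints one "length count" pair per line, sorted by length.
# length_histogram is a hypothetical helper name.
length_histogram() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) print len, count[len] }' | sort -n
}

# Made-up demo records with read lengths 4, 8 and 8.
hist=$(printf '@a\nACGT\n+\nIIII\n@b\nACGTACGT\n+\nIIIIIIII\n@c\nTTTTGGGG\n+\nIIIIIIII\n' | length_histogram)
echo "$hist"
```

For a gzipped cutadapt output file, the input would come from `zcat trimmed_R1.fastq.gz` (file name assumed here).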


But why is there a peak at 75 bp after cutadapt? It looks as if many of the reads were not trimmed by cutadapt.

5.2 years ago by yaoyao20152031


Which again is a fine result. That means your reads don't have any adapter or other contamination that you scanned/trimmed for.

5.2 years ago by GenoMax

Because those fragments are longer than the 75 bp read length, so the read never runs into the adapter and nothing is trimmed. This is a totally fine result; it always looks like that in ATAC-seq. I have processed dozens of these over the last few years.

5.2 years ago by ATpoint
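
The reasoning above can be written as simple arithmetic. This is an idealized sketch that ignores quality trimming, partial adapter matches and the `-m` length filter; `trimmed_length` is a hypothetical helper name, and the fragment lengths 39, 76 and 150 are arbitrary illustration values.

```shell
#!/usr/bin/env bash
# For a fragment (insert) of length F sequenced with read length R:
#   F >= R : the read ends inside the fragment, so it contains no adapter.
#   F <  R : the last R - F bases read into the adapter and are trimmed off.
# trimmed_length is a hypothetical helper illustrating this arithmetic.
trimmed_length() {
  local fragment="$1" read_len="$2"
  if (( fragment < read_len )); then
    echo "$fragment"
  else
    echo "$read_len"
  fi
}

# Short fragments give short trimmed reads; long fragments stay at full length.
for f in 39 76 150; do
  echo "fragment=${f} trimmed_read=$(trimmed_length "$f" 76)"
done
```

With many fragments at or above the read length, a pile-up at the full read length next to a spread of shorter reads is what this model predicts.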


Thank you for your help, @ATpoint and @genomax. It's quite a relief to know that there is no problem with this step. Another question: which alignment tool should I choose, given that the read lengths are variable (both below and above 50 bp)? My understanding is that BWA and Bowtie1 are more sensitive for reads shorter than 50 bp, and Bowtie2 for reads longer than 50 bp. What about this situation, where about 20% of the reads are shorter than 50 bp?