Bowtie not aligning shorter fragments from ATAC-seq
1
0
Entering edit mode
10 months ago

Hi! I'm trying to do the alignment for an ATAC-seq experiment using bowtie2. We did paired-end 100 pb sequencing. By checking with FastQC, of course I have adapter contamination. So I started trimming the reads as follows:

TrimmomaticPE $(DATAPATH)$(FILE1) $(DATAPATH)$(FILE2) $(SCRATCH_JOBDIR)/$(NAME)_output1_paired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output1_unpaired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output2_paired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output2_unpaired_trimmed.fastq ILLUMINACLIP:/home/ecorrales/Programs/Trimmomatic-0.36/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE TRAILING:10 MINLEN:25


Afterwards I don't see any longer adapter contamination, and only few (0.11%) of the reads are removed. Then I proceeded to do the mapping with bowtie2:

bowtie2 -X 2000 --very-sensitive -x $(GENOMEDIR) -1$(SCRATCH_JOBDIR)/$(NAME)_output1_paired_trimmed.fastq -2$(SCRATCH_JOBDIR)/$(NAME)_output2_paired_trimmed.fastq -S$(SCRATCH_JOBDIR)/(NAME)_output.sam  I then checked the insert size distribution and there were no inserts smaller than 100 bp. I assumed that this could be related to those fragments where the insert size was smaller than the read length. So I tried trimming the reads to different lengths before mapping. Indeed all the inserts smaller than the length that I trimmed to were not mapped. In the plot below I show the insert size distribution after trimming the reads to 50 bp or 75 bp (or 100 bp for no trimming): I thought that adding the --dovetail parameter to the mapping would do the trick, but nothing changed. Is there any other parameter that I could be missing? alignment • 445 views ADD COMMENT 0 Entering edit mode How are you calculating fragment size? You also may need to add --soft-clipped-unmapped-tlen in addition to --dovetail. ADD REPLY 0 Entering edit mode Thank you for your reply. I tried with --soft-clipped-unmapped-tlen but doesn't seem to change. For calculating the fragment size I used samtools view AS-435485-LR-48138_output.sort.bam | awk '9>0' | cut -f 9 | sort | uniq -c | sort -b -k2,2n | sed -e 's/^[   ]*//' > AS-435485-LR-48138_insertSize.txt

0
Entering edit mode

Have you checked what happens to the reads with small insert sizes when you don't trim?

0
Entering edit mode

I guess the problem might be indeed be in the insert size calculation rather than on the mapping. I initially thought that the small inserts were somehow considered discordant, although I get the following metrics:

  49753 (100.00%) were paired; of these:
1384 (2.78%) aligned concordantly 0 times
24418 (49.08%) aligned concordantly exactly 1 time
23951 (48.14%) aligned concordantly >1 times
----
1384 pairs aligned concordantly 0 times; of these:
53 (3.83%) aligned discordantly 1 time
----
1331 pairs aligned 0 times concordantly or discordantly; of these:
2662 mates make up the pairs; of these:
1708 (64.16%) aligned 0 times
462 (17.36%) aligned exactly 1 time
492 (18.48%) aligned >1 times


And the insert distribution looks like this (below is log transformed):

However, I accidentally re-run the mapping after cropping the first three bases of the reads using HEADCROP:3 parameter from Trimmomatic. Afterwards the insert size distribution looked much better (still I see a gap at 100 bp):

The mapping efficiency looks very similar, so this is why I guess the mapping was not the problem:

  49940 (100.00%) were paired; of these:
1365 (2.73%) aligned concordantly 0 times
24373 (48.80%) aligned concordantly exactly 1 time
24202 (48.46%) aligned concordantly >1 times
----
1365 pairs aligned concordantly 0 times; of these:
49 (3.59%) aligned discordantly 1 time
----
1316 pairs aligned 0 times concordantly or discordantly; of these:
2632 mates make up the pairs; of these:
1703 (64.70%) aligned 0 times
450 (17.10%) aligned exactly 1 time
479 (18.20%) aligned >1 times


Still I'm a bit confused why this happens or what I'm doing wrong. I don't think is a matter of the quality of the first bases since it looks good, and the mapping efficiency looks similar regardless of removing or not these three bases.

0
Entering edit mode
10 months ago

Just as a follow up to this question (in case that someone ever has the same issue), I realized that the problem was the version of Bowtie2 I was using, which calculates a negative TLEN when paired end reads exactly coincide. This bug was fixed in the latest versions of Bowtie.

0
Entering edit mode

How old was the version you've been using? I've been getting the correct TLEN with Bowtie2 for at least 2 years.

0
Entering edit mode

A bit old; we had version 2.3.4.3 (September 17, 2018). This issue was fixed on the following release (version 2.3.5 - March 16, 2019):

Fixed issue whereby both ends of a paired-end read could have negative TLEN if they exactly coincided

I never thought this could be the issue

0
Entering edit mode

Odd. I've used 2.3.3 before and I was definitely getting proper TLENs for short fragments.