Bowtie not aligning shorter fragments from ATAC-seq
1
0
Entering edit mode
3.9 years ago

Hi! I'm trying to do the alignment for an ATAC-seq experiment using bowtie2. We did paired-end 100 pb sequencing. By checking with FastQC, of course I have adapter contamination. So I started trimming the reads as follows:

TrimmomaticPE $(DATAPATH)$(FILE1) $(DATAPATH)$(FILE2) $(SCRATCH_JOBDIR)/$(NAME)_output1_paired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output1_unpaired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output2_paired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output2_unpaired_trimmed.fastq ILLUMINACLIP:/home/ecorrales/Programs/Trimmomatic-0.36/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE TRAILING:10 MINLEN:25

Afterwards I don't see any longer adapter contamination, and only few (0.11%) of the reads are removed. Then I proceeded to do the mapping with bowtie2:

bowtie2 -X 2000 --very-sensitive -x $(GENOMEDIR) -1 $(SCRATCH_JOBDIR)/$(NAME)_output1_paired_trimmed.fastq -2 $(SCRATCH_JOBDIR)/$(NAME)_output2_paired_trimmed.fastq -S $(SCRATCH_JOBDIR)/$(NAME)_output.sam

I then checked the insert size distribution and there were no inserts smaller than 100 bp. I assumed that this could be related to those fragments where the insert size was smaller than the read length. So I tried trimming the reads to different lengths before mapping. Indeed all the inserts smaller than the length that I trimmed to were not mapped.

In the plot below I show the insert size distribution after trimming the reads to 50 bp or 75 bp (or 100 bp for no trimming):

Insert size distribution

I thought that adding the --dovetail parameter to the mapping would do the trick, but nothing changed. Is there any other parameter that I could be missing?

alignment • 2.1k views
ADD COMMENT
0
Entering edit mode

How are you calculating fragment size? You also may need to add --soft-clipped-unmapped-tlen in addition to --dovetail.

ADD REPLY
0
Entering edit mode

Thank you for your reply. I tried with --soft-clipped-unmapped-tlen but doesn't seem to change. For calculating the fragment size I used

samtools view AS-435485-LR-48138_output.sort.bam | awk '$9>0' | cut -f 9 | sort | uniq -c | sort -b -k2,2n | sed -e 's/^[   ]*//' > AS-435485-LR-48138_insertSize.txt
ADD REPLY
0
Entering edit mode

Have you checked what happens to the reads with small insert sizes when you don't trim?

ADD REPLY
0
Entering edit mode

I guess the problem might be indeed be in the insert size calculation rather than on the mapping. I initially thought that the small inserts were somehow considered discordant, although I get the following metrics:

  49753 (100.00%) were paired; of these:
1384 (2.78%) aligned concordantly 0 times
24418 (49.08%) aligned concordantly exactly 1 time
23951 (48.14%) aligned concordantly >1 times
----
1384 pairs aligned concordantly 0 times; of these:
  53 (3.83%) aligned discordantly 1 time
----
1331 pairs aligned 0 times concordantly or discordantly; of these:
  2662 mates make up the pairs; of these:
    1708 (64.16%) aligned 0 times
    462 (17.36%) aligned exactly 1 time
    492 (18.48%) aligned >1 times

And the insert distribution looks like this (below is log transformed):

Insert-size

However, I accidentally re-run the mapping after cropping the first three bases of the reads using HEADCROP:3 parameter from Trimmomatic. Afterwards the insert size distribution looked much better (still I see a gap at 100 bp):

Insert-size_Headcrop:3

The mapping efficiency looks very similar, so this is why I guess the mapping was not the problem:

  49940 (100.00%) were paired; of these:
1365 (2.73%) aligned concordantly 0 times
24373 (48.80%) aligned concordantly exactly 1 time
24202 (48.46%) aligned concordantly >1 times
----
1365 pairs aligned concordantly 0 times; of these:
  49 (3.59%) aligned discordantly 1 time
----
1316 pairs aligned 0 times concordantly or discordantly; of these:
  2632 mates make up the pairs; of these:
    1703 (64.70%) aligned 0 times
    450 (17.10%) aligned exactly 1 time
    479 (18.20%) aligned >1 times

Still I'm a bit confused why this happens or what I'm doing wrong. I don't think is a matter of the quality of the first bases since it looks good, and the mapping efficiency looks similar regardless of removing or not these three bases.

ADD REPLY
0
Entering edit mode
3.9 years ago

Just as a follow up to this question (in case that someone ever has the same issue), I realized that the problem was the version of Bowtie2 I was using, which calculates a negative TLEN when paired end reads exactly coincide. This bug was fixed in the latest versions of Bowtie.

ADD COMMENT
0
Entering edit mode

How old was the version you've been using? I've been getting the correct TLEN with Bowtie2 for at least 2 years.

ADD REPLY
0
Entering edit mode

A bit old; we had version 2.3.4.3 (September 17, 2018). This issue was fixed on the following release (version 2.3.5 - March 16, 2019):

Fixed issue whereby both ends of a paired-end read could have negative TLEN if they exactly coincided

I never thought this could be the issue

ADD REPLY
0
Entering edit mode

Odd. I've used 2.3.3 before and I was definitely getting proper TLENs for short fragments.

ADD REPLY

Login before adding your answer.

Traffic: 1765 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6