Question

Why is an alignment invalid if a start/end coordinate is contained within the other read?

4.5 years ago

weishang ▴ 20

The Trim Galore manual says: Trims 1 bp off every read from its 3' end. This may be needed for FastQ files that are to be aligned as paired-end data with Bowtie. This is because Bowtie (1) regards alignments like this:

R1 --------------------------->

R2 <---------------------------

or this:

R1 ----------------------->

R2 <-----------------

as invalid (whenever a start/end coordinate is contained within the other read).

But I still can't understand why an alignment is invalid if a start/end coordinate is contained within the other read.

Could someone explain this to me?

Thanks a lot.

methylation trim galore bowtie alignment • 1.1k views


I'd say that by default you would expect paired-end reads to have a fairly large distance between them. An alignment in which the two reads overlap may mean that they come from a repetitive region: the real distance between the reads is large, but because both reads carry the same repeated sequence, the aligner places them close to each other. However, a negative insert distance (overlapping mates) is not unusual for exome sequencing. I don't think I ever ran into this problem with bwa-mem.

4.5 years ago by German.M.Demidov ★ 2.9k

score 1 · Answer 1 · 2019-10-22

The title asks how to understand the Trim Galore --trim1 parameter, but apparently the real question is "why is an alignment invalid if a start/end coordinate is contained within the other read?"

I don't know why Bowtie considers an alignment invalid if one mate's start/end coordinates are contained within the other mate's start/end coordinates. It could be a design decision (this would be my guess, probably to avoid funky reads; read more below), or it could be a limitation imposed by the Bowtie algorithm. The fact is that the Bowtie manual clearly states this:

Paired-end alignments where one mate's alignment is entirely contained within the other's are considered invalid.
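To make the rule concrete, here is a minimal Python sketch of the containment check and of what trimming 1 bp from each read's 3' end does to it. This only models the rule as quoted from the manual, it is not Bowtie's code; the names `Mate`, `is_contained_pair` and `trim_3prime` are hypothetical, and the half-open coordinates are an assumption made for the illustration.

```python
from typing import NamedTuple


class Mate(NamedTuple):
    """Hypothetical model of one aligned mate on the reference."""
    start: int     # leftmost reference position, 0-based, inclusive
    end: int       # rightmost reference position, exclusive
    forward: bool  # True if the mate aligns to the forward strand


def contains(outer: Mate, inner: Mate) -> bool:
    """True if inner lies entirely within outer (identical spans count)."""
    return outer.start <= inner.start and inner.end <= outer.end


def is_contained_pair(a: Mate, b: Mate) -> bool:
    """True if either mate is entirely contained within the other."""
    return contains(a, b) or contains(b, a)


def trim_3prime(m: Mate, n: int = 1) -> Mate:
    """Remove n bases from the 3' end of the read.

    For a forward-strand mate the 3' end is the right edge of the
    alignment; for a reverse-strand mate it is the left edge.
    """
    if m.forward:
        return m._replace(end=m.end - n)
    return m._replace(start=m.start + n)


# First picture from the question: both mates cover the same span.
r1 = Mate(100, 150, forward=True)
r2 = Mate(100, 150, forward=False)
print(is_contained_pair(r1, r2))                            # True
print(is_contained_pair(trim_3prime(r1), trim_3prime(r2)))  # False

# Second picture: R2 is shorter and shares R1's right-hand coordinate.
r1 = Mate(100, 150, forward=True)
r2 = Mate(110, 150, forward=False)
print(is_contained_pair(r1, r2))                            # True
print(is_contained_pair(trim_3prime(r1), trim_3prime(r2)))  # False
```

Under these assumptions, trimming shortens R1 on the right and R2 on the left, so each mate ends up sticking out past the other by one base and neither is contained any more.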

Edit: comments in the Bowtie source file aligner.h hint that it is a design decision:

    // Set begin/end to be a range of all reference
    // positions that are legally permitted to be involved in
    // the alignment of the outstanding mate.
    //
    // Note that one of the constraints imposed on which positions
    // go into this range is that the opposite mate cannot be
    // contained entirely within the anchor mate, or vice versa.

and:

        // We can also add a bit more if qlen is less than alen,
        // since we're requiring that opposite not be contained
        // within anchor.