Why is an alignment invalid if a start/end coordinate is contained within the other read?
4.5 years ago
weishang ▴ 20

The Trim Galore manual says: Trims 1 bp off every read from its 3' end. This may be needed for FastQ files that are to be aligned as paired-end data with Bowtie. This is because Bowtie (1) regards alignments like this:

R1 --------------------------->

R2 <---------------------------

or this:

R1 ----------------------->

R2 <-----------------

as invalid (whenever a start/end coordinate is contained within the other read).

But I still can't understand why an alignment is invalid if a start/end coordinate is contained within the other read.

Could someone explain this to me?

Thanks a lot.

methylation trim galore bowtie alignment • 1.1k views

I'd say that by default you'd expect your paired-end reads to have quite a large distance between them. An alignment where the two reads overlap may mean that the reads come from a repetitive region: the real distance between the reads is large, but since both reads contain the same (repeated) sequence, the aligner maps them close to each other. However, a negative insert distance (overlapping mates) is not unusual for exome sequencing. I don't think I have ever faced this problem using bwa-mem.

4.5 years ago
h.mon 35k

The title asks how to understand the Trim_Galore --trim1 parameter, but apparently the real question is "why is an alignment invalid if a start/end coordinate is contained within the other read?"

I don't know why bowtie considers an alignment invalid if one read's start/end coordinates are contained within the other read's start/end coordinates. It could be a design decision (this would be my guess, probably to avoid funky reads - read more below), or it could be some limitation imposed by the Bowtie algorithm. The fact is that the bowtie manual clearly states this:

Paired-end alignments where one mate's alignment is entirely contained within the other's are considered invalid.

Edit: comments in the bowtie source file aligner.h hint that it is a design decision:

    // Set begin/end to be a range of all reference
    // positions that are legally permitted to be involved in
    // the alignment of the outstanding mate.
    //
    // Note that one of the constraints imposed on which positions
    // go into this range is that the opposite mate cannot be
    // contained entirely within the anchor mate, or vice versa.

and:

        // We can also add a bit more if qlen is less than alen,
        // since we're requiring that opposite not be contained
        // within anchor.
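
To make the containment rule concrete, here is a minimal Python sketch. It is not Bowtie's actual code: the function names, the half-open `(start, end)` coordinates, and the example positions are all made up for illustration, and the second example assumes the shorter mate ends at the same coordinate as the longer one. It shows why trimming 1 bp from the 3' end of each mate is enough to break containment in such cases: the forward mate loses its rightmost base and the reverse mate loses its leftmost base, so neither interval fully covers the other any more.

```python
def contains(outer, inner):
    """True if half-open interval `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def is_contained_pair(mate1, mate2):
    """True if either mate's alignment is entirely contained in the other's."""
    return contains(mate1, mate2) or contains(mate2, mate1)


def trim_3prime(interval, strand, n=1):
    """Trim n bases from the 3' end of an aligned read.

    For a forward-strand read the 3' end is the right-hand coordinate;
    for a reverse-strand read it is the left-hand coordinate.
    """
    start, end = interval
    return (start, end - n) if strand == "+" else (start + n, end)


# Case 1: both mates cover exactly the same reference span.
r1, r2 = (100, 150), (100, 150)
print(is_contained_pair(r1, r2))  # True
print(is_contained_pair(trim_3prime(r1, "+"), trim_3prime(r2, "-")))  # False

# Case 2 (assumed layout): the shorter reverse mate ends where the forward mate ends.
r1, r2 = (100, 150), (120, 150)
print(is_contained_pair(r1, r2))  # True
print(is_contained_pair(trim_3prime(r1, "+"), trim_3prime(r2, "-")))  # False
```

Note that this only rescues pairs that share an end coordinate. A mate that sits strictly inside the other one stays contained after a 1 bp trim.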