Question: Why is an alignment invalid if a start/end coordinate is contained within the other read?
5 months ago by
weishang0
China/Shanghai/Shanghai Ocean University
weishang0 wrote:

The Trim Galore manual says: Trims 1 bp off every read from its 3' end. This may be needed for FastQ files that are to be aligned as paired-end data with Bowtie. This is because Bowtie (1) regards alignments like this:

R1 --------------------------->

R2 <---------------------------

or this:

R1 ----------------------->

R2 <-----------------

as invalid (whenever a start/end coordinate is contained within the other read).

But I still can't understand why an alignment is invalid if a start/end coordinate is contained within the other read.

Could someone explain this to me?

Thanks a lot.


I'd say that by default you'd expect paired-end reads to have quite a large distance between them. An alignment where the two reads overlap may mean that the reads come from a repetitive region: the real distance between the reads is large, but since both reads contain the same sequence (the region is repetitive), the aligner maps them close to each other. However, a negative (<0) insert distance is not unusual for exome sequencing. I don't think I have ever faced this problem using bwa-mem.

written 5 months ago by German.M.Demidov (1.5k)
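
The negative distance mentioned in the comment above can be made concrete with a small sketch. The function name `inner_distance`, the half-open coordinate convention, and the example coordinates are all assumptions chosen for illustration; aligners and other tools may define insert size differently.

```python
def inner_distance(r1_start: int, r1_end: int, r2_start: int, r2_end: int) -> int:
    """Gap between two mates on the reference, using half-open spans [start, end).

    Positive when the mates are separated, negative when they overlap.
    This is an illustrative definition, not the one used by any specific aligner.
    """
    left_end = min(r1_end, r2_end)          # where the leftmost-ending mate stops
    right_start = max(r1_start, r2_start)   # where the rightmost-starting mate begins
    return right_start - left_end


# Hypothetical coordinates: mates separated by a gap.
print(inner_distance(100, 150, 300, 350))  # 150

# Hypothetical coordinates: mates overlapping by 20 bp.
print(inner_distance(100, 150, 130, 180))  # -20
```

With this definition, overlapping mates from a short fragment simply produce a negative value, which is the situation described for exome data.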
5 months ago by
h.mon29k
Brazil
h.mon29k wrote:

The title asks how to understand the Trim_Galore --trim1 parameter, but apparently the real question is "why is an alignment invalid if a start/end coordinate is contained within the other read?"

I don't know why Bowtie considers an alignment invalid if one read's start/end coordinates are contained within the other read's start/end coordinates. It could be a design decision (this would be my guess, probably to avoid funky reads; read more below), or it could be a limitation imposed by the Bowtie algorithm. The fact is that the Bowtie manual clearly states this:

Paired-end alignments where one mate's alignment is entirely contained within the other's are considered invalid.
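
To make the quoted rule concrete, here is a minimal sketch of a containment check, plus the effect of trimming 1 bp from each read's 3' end. It is not Bowtie's implementation: the `Mate` type, the function names, the half-open coordinates, and the example positions are all assumptions for illustration.

```python
from typing import NamedTuple


class Mate(NamedTuple):
    """Reference span of one aligned mate, half-open: [start, end)."""
    start: int
    end: int
    forward: bool  # True if the mate aligns to the forward strand


def contains(outer: Mate, inner: Mate) -> bool:
    """True if inner's span lies entirely within outer's span."""
    return outer.start <= inner.start and inner.end <= outer.end


def is_contained_pair(m1: Mate, m2: Mate) -> bool:
    """True if either mate is entirely contained within the other."""
    return contains(m1, m2) or contains(m2, m1)


def trim_3prime(m: Mate, n: int = 1) -> Mate:
    """Drop n bases from the read's 3' end.

    For a forward-strand mate the 3' end is the right edge of the span;
    for a reverse-strand mate it is the left edge.
    """
    if m.forward:
        return m._replace(end=m.end - n)
    return m._replace(start=m.start + n)


# Hypothetical case 1: both mates cover exactly the same 30 bp.
r1 = Mate(100, 130, forward=True)
r2 = Mate(100, 130, forward=False)
print(is_contained_pair(r1, r2))                            # True
print(is_contained_pair(trim_3prime(r1), trim_3prime(r2)))  # False

# Hypothetical case 2: a shorter R2 that shares R1's right-hand coordinate.
r1 = Mate(100, 124, forward=True)
r2 = Mate(106, 124, forward=False)
print(is_contained_pair(r1, r2))                            # True
print(is_contained_pair(trim_3prime(r1), trim_3prime(r2)))  # False
```

In both hypothetical cases, removing one base from each 3' end makes each mate extend one position past the other, so neither span is contained any more. Under these assumptions, that is why a 1 bp trim can be enough to avoid the rule.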

Edit: comments in the Bowtie source file aligner.h hint that it is a design decision:

    // Set begin/end to be a range of all reference
    // positions that are legally permitted to be involved in
    // the alignment of the outstanding mate.
    //
    // Note that one of the constraints imposed on which positions
    // go into this range is that the opposite mate cannot be
    // contained entirely within the anchor mate, or vice versa.

and:

        // We can also add a bit more if qlen is less than alen,
        // since we're requiring that opposite not be contained
        // within anchor.