Question: Orientation in paired-end sequencing?
4.8 years ago by
John Smith260
United States
John Smith260 wrote:

I am new to bioinformatics and currently learning how to use Bowtie 2. As written in the manual: 

A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align "concordantly". If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren't in the expected relative orientation, or aren't within the expected distance range, or both), the pair is said to align "discordantly". 

I have read about the basics of paired-end sequencing and orientation (in the molecular biology sense). In summary, my understanding is that in paired-end sequencing we sequence both ends of a DNA fragment at the 5' end and the 3' end (we call them mate 1 and mate 2) and by knowing the expected length between the mates we can better align the fragment to a reference genome.

My question is, what is meant by two mates having an expected relative orientation? If I am using Bowtie 2 and giving it one file with all the first mates and another with all the corresponding second mates, how can two mates align without having the expected orientation in the first place?

modified 4.8 years ago by cts1.6k • written 4.8 years ago by John Smith260
4.8 years ago by
cts1.6k
Pasadena
cts1.6k wrote:

In Illumina paired-end sequencing the two mates are sequenced in opposite directions (see my ASCII art below), so you expect the reads to point towards each other; that is the expected orientation. The reads can end up in other orientations if:

1. You are mapping to a reference from a related organism that has an inversion in its genome relative to what you sequenced.

2. The reads map to a palindromic region and the read mapper cannot determine the correct orientation.

3. There is a problem with the assembly you are mapping to (for example, errors in a de novo assembly).

 

------->                     Read 1

=========================    DNA fragment

               <---------    Read 2

 

Also note that other library types produce reads in different orientations. For example, Illumina mate-pair libraries give reads that point away from each other rather than towards each other. I don't know how Bowtie handles cases where other orientations are expected.
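To make the "pointing towards" versus "pointing away" idea concrete, here is a minimal sketch that classifies a pair from the leftmost alignment position and strand of each mate. The function name `pair_orientation` and its arguments are made up for illustration; this is a simplification and not how Bowtie 2 or any other aligner decides concordance internally (it ignores which mate is mate 1 and the distance between mates).

```python
def pair_orientation(pos1, rev1, pos2, rev2):
    """Classify a read pair by the strands of its left and right mates.

    pos1/pos2: leftmost reference coordinate of each mate's alignment.
    rev1/rev2: True if that mate aligned to the reverse strand.
    Returns "FR" (facing each other), "RF" (facing away), "FF" or "RR".
    """
    # Sort the two mates so we can talk about the left and the right one.
    (_, left_rev), (_, right_rev) = sorted([(pos1, rev1), (pos2, rev2)])
    if not left_rev and right_rev:
        return "FR"  # ---->   <----  typical Illumina paired-end
    if left_rev and not right_rev:
        return "RF"  # <----   ---->  typical Illumina mate-pair
    return "RR" if left_rev else "FF"  # both mates on the same strand


print(pair_orientation(100, False, 300, True))   # FR
print(pair_orientation(100, True, 300, False))   # RF
print(pair_orientation(100, False, 300, False))  # FF
```

A real aligner would also check that the distance between the mates falls inside the expected range before calling the pair concordant.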

written 4.8 years ago by cts1.6k

The orientations are determined with regard to their 3' and 5' ends? The mapper then assumes that for the given platform the reads should align in a particular orientation, and if they map differently, reports a warning or error?

Does the mapper (bwa, for instance) then check all mapping combinations (-><-, ->->, <-<-, <-->), or does it fix the first sequence of the pair and rotate the other one?

written 4.8 years ago by bulovic.ana70

Since the sequence in a fastq file is always 5'->3', the orientation is determined by the relative positions of the alignments and which, if either, is reverse complemented. If they don't align with the proper orientation then bit 0x2 in the flag field won't be set.
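As a rough illustration of the flag bits involved, the sketch below decodes the pairing-related bits of a SAM FLAG value. The bit values come from the SAM specification; the function name `describe_flag` and the dictionary keys are just names chosen for this example.

```python
# Pairing-related bits of the SAM FLAG field (values from the SAM spec).
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # pair aligned as the aligner expects (concordant)
READ_REVERSE = 0x10   # this read aligned reverse complemented
MATE_REVERSE = 0x20   # its mate aligned reverse complemented
FIRST_IN_PAIR = 0x40  # this is mate 1
SECOND_IN_PAIR = 0x80 # this is mate 2


def describe_flag(flag):
    """Return the pairing-related bits of a SAM flag as booleans."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "read_reverse": bool(flag & READ_REVERSE),
        "mate_reverse": bool(flag & MATE_REVERSE),
        "first_in_pair": bool(flag & FIRST_IN_PAIR),
        "second_in_pair": bool(flag & SECOND_IN_PAIR),
    }


# 99 = 0x1 + 0x2 + 0x20 + 0x40: mate 1, forward, mate reverse, proper pair.
print(describe_flag(99))
# 147 = 0x1 + 0x2 + 0x10 + 0x80: mate 2, reverse, proper pair.
print(describe_flag(147))
```

Flags 99 and 147 together are what a typical concordant forward/reverse pair looks like; a pair with the wrong orientation would have bit 0x2 cleared.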

How the search is performed will vary by aligner. Bowtie2, for example, will search for concordant pairs first (at least if memory serves).

written 4.8 years ago by Devon Ryan88k

To clarify what you mean by reads "pointing" to each other, take the following example:

ACAAGATGCCATTGTCCC      Read 1

ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC DNA fragment

CCCGTCGCCACCGGCACC      Read 2

Are these two reads pointing to each other?

modified 4.8 years ago • written 4.8 years ago by John Smith260

The second read wouldn't align to the given stretch of DNA; read1 would map as unpaired.

written 4.8 years ago by Devon Ryan88k

How then would two reads be pointing to each other? Would it be like in the following example?

 

ACAAGATGCCATTGTCCC                                          Read 1

ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC  DNA fragment

                                        CCACGGCCACCGCTGCCC  Read 2

I am just having trouble understanding what "pointing to each other" means in this context.

modified 4.8 years ago • written 4.8 years ago by John Smith260

If the original sequence of read1 was ACAAGATGCCATTGTCCC and that for read2 was GGGCAGCGGTGGCCGTGG, then yes.
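Here is a toy sketch of that, using the sequences from this thread. It looks for an exact match of each read on the + strand and, failing that, for its reverse complement. The helper names `reverse_complement` and `find_exact` are made up for this example, and exact string matching is only an illustration; real aligners such as Bowtie 2 use an index and allow mismatches and gaps.

```python
def reverse_complement(seq):
    """Complement each base, then reverse the result."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_exact(read, reference):
    """Return (strand, 0-based position) of an exact match, or None."""
    pos = reference.find(read)
    if pos != -1:
        return ("+", pos)
    # A read from the - strand matches the + strand only after being
    # reverse complemented.
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return ("-", pos)
    return None


fragment = "ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC"

print(find_exact("ACAAGATGCCATTGTCCC", fragment))  # ('+', 0)
print(find_exact("GGGCAGCGGTGGCCGTGG", fragment))  # ('-', 40)
print(find_exact("CCCGTCGCCACCGGCACC", fragment))  # None
```

Read 1 lands on the + strand at the left end and read 2 on the - strand at the right end, so they point towards each other. The third sequence is the merely reversed read from the earlier example, which matches neither strand.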

written 4.8 years ago by Devon Ryan88k

How did you come up with read 2?
 

written 4.8 years ago by John Smith260

It's the reverse complement of the sequence you posted. I'm guessing that your background isn't biology :)

written 4.8 years ago by Devon Ryan88k

You guessed right! I just have one more question: why the complement and not the sequence itself?

written 4.8 years ago by John Smith260

DNA is double stranded, with the opposite strand being the complement. So the two strands look like this:

ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC + strand
TGTTCTACGGTAACAGGGGGCCGGAGGACCGAGAGGCCCCGGTGCCGGTGGCGACGGG - strand

Since sequence is always presented 5' to 3', and the - strand runs in the opposite direction to the + strand, you need the reverse complement. You'd be well served to take a few biology classes; your life would be easier.
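If it helps to see this as code, here is a small sketch that derives the complementary strand of the + strand above and then the reverse complement of its last 18 bases. The function names are just illustrative, and the code assumes the sequence contains only A, C, G and T.

```python
# Watson-Crick base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement, then reverse, so the result reads 5' to 3'."""
    return complement(seq)[::-1]


plus = "ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC"

# The - strand written underneath the + strand (so it reads 3' to 5').
print(complement(plus))

# What a sequencer reports for a read taken from the - strand at the
# right-hand end of the fragment: the same bases, written 5' to 3'.
print(reverse_complement(plus[-18:]))  # GGGCAGCGGTGGCCGTGG
```

The first line printed is the - strand as drawn under the + strand; the second is read 2 as it would appear in a FASTQ file.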

written 4.8 years ago by Devon Ryan88k

Hi, I have a small question about that. I have a dataset of Illumina paired-end, strand-specific reads. I assembled it de novo using Trinity, and now I'm assessing the quality of the assembly using several metrics. One of them is aligning the original reads to the new contigs. I'm using Bowtie2 for that, but I wonder: given a forward read and a contig, does Bowtie2 try to align the read to both strands? I always assumed that the answer is yes, but I want to be sure.

written 19 months ago by Andrés Ribone0

I don't think that this is the question. Mate-Pair and Paired-End are COMPLETELY different. One (Mate-Pair) is a library construction method. The other (Paired-End) is a method of sequencing. The confusion arises from Pyro/454/Roche terminology wherein they used the term "mate-pair" to describe what Illumina now calls "paired-end" sequencing. The distinction was made (by Illumina) when the technique of constructing "MATE-PAIR" libraries was introduced and when Illumina introduced sequencing from both ends "PAIRED-END".

To further clarify: on Illumina platforms, paired-end reads are sequenced as depicted in the illustration. The reads come in TWO files (R1 and R2). The ORIENTATION of the reads is (R1) FORWARD, (R2) REVERSE. However, both files will have the reads written in 5'->3' orientation.

For example, if you make an interleaved file, the R1 read would be FORWARD and the R2 read would be REVERSE.

written 18 months ago by kissaj90