Question

Why is the output of Artificial FASTQ Generator not aligning properly with Bowtie 2?

2

9.8 years ago

John Smith ▴ 320

I am working with Artificial FASTQ Generator, a program that generates artificial FASTQ files from a reference genome. Here is a link to the description of the program and here is the manual. The manual says that the program generates paired-end reads, and it does produce two FASTQ files. After aligning the two FASTQ files to the reference genome as paired-end reads using Bowtie 2, I got the following result:

  892589 (100.00%) were paired; of these:
    892585 (100.00%) aligned concordantly 0 times
    4 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    892585 pairs aligned concordantly 0 times; of these:
      870486 (97.52%) aligned discordantly 1 time
    ----
    22099 pairs aligned 0 times concordantly or discordantly; of these:
      44198 mates make up the pairs; of these:
        7154 (16.19%) aligned 0 times
        15716 (35.56%) aligned exactly 1 time
        21328 (48.26%) aligned >1 times
99.60% overall alignment rate

After aligning both FASTQ files as single end reads, I got the following:

  1785178 (100.00%) were unpaired; of these:
    7154 (0.40%) aligned 0 times
    1755694 (98.35%) aligned exactly 1 time
    22330 (1.25%) aligned >1 times
99.60% overall alignment rate
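
As a sanity check, the two summaries can be compared with a little arithmetic. The sketch below uses only the counts quoted above and assumes that the overall alignment rate is the number of aligned mates divided by the total number of mates; the variable names are illustrative.

```python
# Counts copied from the two Bowtie 2 summaries quoted above.
pairs_total = 892589
concordant_once = 4
discordant_once = 870486
leftover_mates_once = 15716    # mates from pairs that aligned neither way
leftover_mates_multi = 21328

single_once = 1755694
single_multi = 22330

# Each pair contributes two mates.
reads_total = 2 * pairs_total

# Paired-end run: both mates of every concordant or discordant pair are
# aligned, plus the leftover mates that aligned on their own.
paired_mode_aligned = (
    2 * (concordant_once + discordant_once)
    + leftover_mates_once
    + leftover_mates_multi
)

# Single-end run: every read is counted individually.
single_mode_aligned = single_once + single_multi

paired_rate = f"{100 * paired_mode_aligned / reads_total:.2f}"
single_rate = f"{100 * single_mode_aligned / reads_total:.2f}"

print(reads_total, paired_mode_aligned, single_mode_aligned)
print(paired_rate, single_rate)
```

Under that assumption both runs place the same number of mates, so the reads themselves align in either mode; what differs is whether the two mates of a pair land in a layout that Bowtie 2 counts as concordant.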

What I do not understand is why these reads align well as single-end reads but do not align concordantly as paired-end reads. Is anybody familiar with both programs who can help explain this?

EDIT:

Here are the outputs:

1st FASTQ output of AFG
2nd FASTQ output of AFG
SAM output of Bowtie 2 (single-end alignment)

EDIT 2:

Here is the output of the paired-end alignment.

SAM output of Bowtie 2 (paired-end alignment)

bowtie2 RNA-Seq Aritificial-FastQ-Generator fastq • 3.5k views

ADD COMMENT • link updated 2.5 years ago by Ram 43k • written 9.8 years ago by John Smith ▴ 320

1

I think the discordance is probably either because the two reads in a pair don't have the forward/reverse orientation that Bowtie 2 expects (--fr) or because the two reads in a pair were generated from two different contigs.

ADD REPLY • link updated 2.5 years ago by Ram 43k • written 9.8 years ago by Ashutosh Pandey 12k

0

Just post the alignments of two mates and we can tell you what's wrong.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

Ram · Accepted Answer · 2014-06-18

5

9.8 years ago

Michael 54k

Maybe you should post an example of the output; a few read pairs should be sufficient.

There could be many reasons, e.g.:

Bug or wrong use of the simulator: mate pairs not on complementary strands
Insert size too large
Bug: annotated pairs do not actually pair
Simulated quality scores too low (the single-end reads do align, though, so not in this case)
Simulated error rate too high (same caveat)
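
The "annotated pairs do not pair" possibility can be checked directly on the two FASTQ files. This is a minimal sketch, assuming four lines per record and read names that match between the files once an optional trailing /1 or /2 is removed; the naming convention that Artificial FASTQ Generator actually uses is not stated in this thread, and the function names are hypothetical.

```python
from itertools import zip_longest


def read_ids(fastq_lines):
    """Yield read names from FASTQ lines, assuming 4 lines per record."""
    non_blank = (line for line in fastq_lines if line.strip())
    for i, line in enumerate(non_blank):
        if i % 4 == 0:
            # Drop the leading '@' and anything after the first whitespace.
            name = line.strip()[1:].split()[0]
            # Assumption: mates may be tagged with a trailing /1 or /2.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def count_mismatched_names(fastq1_lines, fastq2_lines):
    """Count record positions where the two files disagree on the read name."""
    mismatches = 0
    pairs = zip_longest(read_ids(fastq1_lines), read_ids(fastq2_lines))
    for id1, id2 in pairs:
        # zip_longest pads the shorter file with None, which also counts
        # as a mismatch.
        if id1 != id2:
            mismatches += 1
    return mismatches
```

Passing two open file handles (or two lists of lines) returns the number of record positions whose names differ; anything above zero would suggest the files are out of sync or use a naming scheme this sketch does not expect.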

Maybe you should use a simulator that is better tested, see:

What Ngs Read Simulators Are Available For Paired-End Data?

ADD COMMENT • link updated 2.5 years ago by Ram 43k • written 9.8 years ago by Michael 54k

0

I added the outputs.

ADD REPLY • link 9.8 years ago by John Smith ▴ 320

2

Even though you aligned the mates individually, you can see how they would align as a pair:

---> read1    ---> read2
------------------ reference

Both reads point in the same direction, which wouldn't happen with any kit, so that's why the alignments are discordant.
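
The same check can be made programmatically on the single-end SAM records. The sketch below classifies a pair of mates from their reference name, position, and FLAG, using the SAM FLAG bit 0x10 (read aligned to the reverse strand). The function name and the labels it returns are illustrative, and it ignores read lengths and insert-size limits.

```python
REVERSE = 0x10  # SAM FLAG bit: read aligned to the reverse strand


def classify_pair(mate1, mate2):
    """Classify two mates given as (rname, pos, flag) tuples.

    The tuples are assumed to come from the RNAME, POS, and FLAG columns
    of the single-end SAM records for the two mates of one pair.
    """
    rname1, pos1, flag1 = mate1
    rname2, pos2, flag2 = mate2
    if rname1 != rname2:
        return "different_reference"
    rev1 = bool(flag1 & REVERSE)
    rev2 = bool(flag2 & REVERSE)
    if rev1 == rev2:
        # Both mates on the same strand, as in the diagram above.
        return "same_strand"
    # Opposite strands: the strand of the upstream mate decides whether
    # the mates point toward each other or away from each other.
    _, left_is_reverse = min((pos1, rev1), (pos2, rev2))
    return "outward" if left_is_reverse else "inward"


print(classify_pair(("chr1", 100, 0), ("chr1", 300, 0)))
print(classify_pair(("chr1", 100, 0), ("chr1", 300, 16)))
```

Bowtie 2 expects the "inward" layout by default (--fr). If most pairs come back as "same_strand", the --ff option describes that layout to Bowtie 2 instead, although whether that is the right fix depends on what the simulated reads are meant to represent.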

ADD REPLY • link 9.8 years ago by Devon Ryan 104k

0

In the SAM file you uploaded, you aligned the reads as unpaired rather than as pairs.

ADD REPLY • link 9.8 years ago by Devon Ryan 104k