Why is the output of Artificial FASTQ Generator not aligning properly with Bowtie 2?
9.8 years ago
John Smith ▴ 320

I am currently working with a program called Artificial FASTQ Generator, which generates artificial FASTQ files from a given reference genome. Here is a link to the description of the program and here is the manual; it says that the program generates paired-end reads (and it does generate two FASTQ files). After aligning the artificial reads (the two FASTQ files generated by Artificial FASTQ Generator) to the reference genome as paired-end reads using Bowtie 2, I got the following result:

  892589 (100.00%) were paired; of these:
    892585 (100.00%) aligned concordantly 0 times
    4 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    892585 pairs aligned concordantly 0 times; of these:
      870486 (97.52%) aligned discordantly 1 time
    ----
    22099 pairs aligned 0 times concordantly or discordantly; of these:
      44198 mates make up the pairs; of these:
        7154 (16.19%) aligned 0 times
        15716 (35.56%) aligned exactly 1 time
        21328 (48.26%) aligned >1 times
99.60% overall alignment rate

After aligning both FASTQ files as single-end reads, I got the following:

  1785178 (100.00%) were unpaired; of these:
    7154 (0.40%) aligned 0 times
    1755694 (98.35%) aligned exactly 1 time
    22330 (1.25%) aligned >1 times
99.60% overall alignment rate
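A quick consistency check on the two summaries: the same 7,154 mates fail to align in both runs, and both runs give 99.60% overall. So each read aligns fine on its own and only the pairing fails. The counts below are copied from the summaries above; the variable names are only for illustration.

```python
# Counts copied from the two Bowtie 2 summaries above.
pairs = 892_589
concordant_pairs = 4 + 0                   # "exactly 1 time" + ">1 times"
discordant_pairs = 870_486
leftover_aligned_mates = 15_716 + 21_328   # aligned mates of the 22,099 remaining pairs
paired_unaligned_mates = 7_154

single_end_reads = 1_785_178
single_end_unaligned = 7_154


def rate(aligned: int, total: int) -> float:
    """Percentage of aligned reads, rounded to two decimals."""
    return round(100 * aligned / total, 2)


total_mates = 2 * pairs
aligned_mates_paired = 2 * (concordant_pairs + discordant_pairs) + leftover_aligned_mates

paired_rate = rate(aligned_mates_paired, total_mates)
single_rate = rate(single_end_reads - single_end_unaligned, single_end_reads)

print(total_mates == single_end_reads)                               # True
print(aligned_mates_paired + paired_unaligned_mates == total_mates)  # True
print(paired_rate, single_rate)                                      # 99.6 99.6
```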

What I do not understand is why these reads align as single-end reads but not as paired-end reads, as I expected. Is anybody familiar with both programs who can help explain this?

EDIT:

Here are the outputs:

EDIT 2:

Here is the output of the paired-end alignment.

SAM output of Bowtie 2 (paired-end alignment)


I think the discordance is probably because either the two reads in a pair don't have the --fr orientation, or they were generated from two different contigs.
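A simplified sketch of what "concordant" means for Bowtie 2 under its defaults (`--fr`, `-I 0`, `-X 500`): both mates on the same contig, on opposite strands, with the forward-strand mate upstream, and a fragment length in range. The `Mate` class and `looks_concordant_fr` function are hypothetical, and Bowtie 2's actual rules for overlapping, contained and dovetailing mates are more detailed than this.

```python
from dataclasses import dataclass


@dataclass
class Mate:
    contig: str
    start: int      # leftmost aligned reference position (0-based)
    end: int        # one past the rightmost aligned position
    reverse: bool   # True if the mate aligned to the reverse strand


def looks_concordant_fr(m1: Mate, m2: Mate, min_ins: int = 0, max_ins: int = 500) -> bool:
    """Simplified --fr check (hypothetical helper, not Bowtie 2's code)."""
    if m1.contig != m2.contig:
        return False                    # mates come from two different contigs
    if m1.reverse == m2.reverse:
        return False                    # same strand: not an fr pair
    fwd, rev = (m1, m2) if not m1.reverse else (m2, m1)
    if fwd.start > rev.start:
        return False                    # forward-strand mate must be upstream
    fragment = max(fwd.end, rev.end) - min(fwd.start, rev.start)
    return min_ins <= fragment <= max_ins


a = Mate("chr1", 100, 200, reverse=False)
print(looks_concordant_fr(a, Mate("chr1", 300, 400, reverse=True)))   # True (300 bp fragment)
print(looks_concordant_fr(a, Mate("chr1", 300, 400, reverse=False)))  # False: both forward
print(looks_concordant_fr(a, Mate("chr2", 300, 400, reverse=True)))   # False: different contigs
```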


Just post the alignments of two mates and we can tell you what's wrong.

9.8 years ago
Michael 54k

Maybe you should post an example of the output; a few read pairs should be sufficient.

There could be many reasons, e.g.:

  • Bug or wrong use of the simulator: mates not on complementary strands
  • Insert size too large
  • Bug: annotated pairs do not actually pair
  • Simulated quality scores too low (but the single-end reads do align, so not in this case)
  • Simulated error rate too high (ditto)
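Two of these causes (mates whose names don't correspond, and very low qualities) can be checked directly on the FASTQ files. A minimal sketch, assuming Phred+33 qualities, 4-line records and `/1`/`/2` mate suffixes; the function names are hypothetical, and in practice you would pass open file handles for the two simulated files instead of in-memory lists.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if len(chunk) < 4:
            return
        header, seq, _plus, qual = chunk
        yield header[1:].split()[0], seq, qual


def mate_name(name: str) -> str:
    """Strip a trailing /1 or /2 so the two mates can be compared by name."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def check_pairs(r1_lines, r2_lines, n: int = 1000) -> Tuple[int, float]:
    """Return (name mismatches, mean quality) over the first n read pairs."""
    mismatches, quals = 0, []
    pairs = zip(fastq_records(r1_lines), fastq_records(r2_lines))
    for (name1, _, q1), (name2, _, q2) in islice(pairs, n):
        if mate_name(name1) != mate_name(name2):
            mismatches += 1
        quals.append((mean_phred(q1) + mean_phred(q2)) / 2)
    mean_q = sum(quals) / len(quals) if quals else 0.0
    return mismatches, mean_q


# Example with in-memory records instead of files.
r1 = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n"]
r2 = ["@read1/2\n", "TTGC\n", "+\n", "IIII\n"]
print(check_pairs(r1, r2))  # (0, 40.0)
```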

Maybe you should use a better-tested simulator; see:

What Ngs Read Simulators Are Available For Paired-End Data?


I added the outputs.


Even though you aligned the mates individually, you can see that they align like this:

---> read1    ---> read2
------------------ reference

No library prep kit produces pairs in this orientation (both mates on the same strand), which is why the alignments are discordant.
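The orientation in the diagram can be read off the SAM fields: POS gives the leftmost position, and FLAG bit 0x10 marks a reverse-strand alignment. A small sketch (the function name is hypothetical) that classifies two mates on the same contig; under the default `--fr`, Bowtie 2 only accepts the "fr" case. If the simulator really emits both mates on the forward strand, Bowtie 2's `--ff` option might be worth trying.

```python
def pair_orientation(pos1: int, flag1: int, pos2: int, flag2: int) -> str:
    """Classify two mates on the same contig from SAM POS and FLAG.

    Returns "fr" (forward-strand mate upstream), "rf" (reverse-strand mate
    upstream) or "same-strand" (e.g. ---> read1 ---> read2).
    """
    rev1 = bool(flag1 & 0x10)  # 0x10: SEQ is reverse complemented
    rev2 = bool(flag2 & 0x10)
    if rev1 == rev2:
        return "same-strand"
    upstream_is_reverse = rev1 if pos1 <= pos2 else rev2
    return "rf" if upstream_is_reverse else "fr"


print(pair_orientation(100, 0, 300, 0))    # same-strand (the diagram above)
print(pair_orientation(100, 0, 300, 16))   # fr
print(pair_orientation(100, 16, 300, 0))   # rf
```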


In the SAM file you uploaded, you didn't align the reads as pairs; you aligned them as unpaired reads.
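Whether a SAM file came from a paired-end run can be checked from FLAG: bit 0x1 marks a read from a paired template and 0x2 a proper pair, and neither is set when the mates are aligned as unpaired reads. `samtools flagstat` reports similar counts; the sketch below is a hypothetical stand-alone version.

```python
def summarize_flags(sam_lines) -> dict:
    """Count SAM records whose FLAG marks them as paired (0x1) or as a
    proper pair (0x2). Header lines starting with '@' are skipped."""
    counts = {"records": 0, "paired": 0, "proper_pair": 0}
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd column
        counts["records"] += 1
        if flag & 0x1:
            counts["paired"] += 1
        if flag & 0x2:
            counts["proper_pair"] += 1
    return counts


sam = [
    "@HD\tVN:1.6\n",
    "readA\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n",      # unpaired, forward
    "readB\t99\tchr1\t100\t42\t4M\t=\t300\t204\tACGT\tIIII\n",  # paired, proper pair
]
print(summarize_flags(sam))  # {'records': 2, 'paired': 1, 'proper_pair': 1}
```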
