Question: Why is the output of Artificial FASTQ Generator not aligning properly with Bowtie 2?
John Smith (United States) wrote 5.1 years ago:

I am currently working with Artificial FASTQ Generator (AFG), a program that generates artificial FASTQ files from a given reference genome. Here is a link to the description of the program and here is the manual. The manual says that the program generates paired-end reads, and it does generate two FASTQ files. After aligning the artificial reads (the two FASTQ files generated by AFG) to the reference genome as paired-end reads using Bowtie 2, I got the following result:

  892589 (100.00%) were paired; of these:
    892585 (100.00%) aligned concordantly 0 times
    4 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    892585 pairs aligned concordantly 0 times; of these:
      870486 (97.52%) aligned discordantly 1 time
    ----
    22099 pairs aligned 0 times concordantly or discordantly; of these:
      44198 mates make up the pairs; of these:
        7154 (16.19%) aligned 0 times
        15716 (35.56%) aligned exactly 1 time
        21328 (48.26%) aligned >1 times
99.60% overall alignment rate

After aligning both FASTQ files as single-end reads, I got the following:

  1785178 (100.00%) were unpaired; of these:
    7154 (0.40%) aligned 0 times
    1755694 (98.35%) aligned exactly 1 time
    22330 (1.25%) aligned >1 times
99.60% overall alignment rate
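
As a sanity check on the two summaries above, the following minimal sketch recomputes the overall alignment rate from the reported counts. The variable names are made up for illustration, and the formula (each aligned pair contributes two aligned mates) is my reading of the Bowtie 2 summary rather than something stated in this thread.

```python
# Counts copied from the paired-end Bowtie 2 summary above.
pairs = 892589
concordant_once = 4
concordant_multi = 0
discordant_once = 870486
mates_aligned_once = 15716
mates_aligned_multi = 21328

# Every pair consists of two mates.
total_mates = 2 * pairs

# Assumption: each concordant or discordant pair counts as two aligned mates.
aligned_mates = (
    2 * (concordant_once + concordant_multi + discordant_once)
    + mates_aligned_once
    + mates_aligned_multi
)
paired_rate = 100.0 * aligned_mates / total_mates

# Counts copied from the single-end Bowtie 2 summary above.
unpaired_reads = 1785178
unaligned_single = 7154
single_rate = 100.0 * (unpaired_reads - unaligned_single) / unpaired_reads

print(f"paired-end mode: {paired_rate:.2f}% of mates aligned")
print(f"single-end mode: {single_rate:.2f}% of reads aligned")
```

The same 7154 mates fail to align in both modes, so the individual reads map fine either way; only the pairing constraints differ between the two runs.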

What I do not understand is why these reads align as single-end reads but not, as expected, as paired-end reads. Is anybody familiar with both programs who can help explain this?

EDIT:

Here are the outputs:

1st FASTQ output of AFG

2nd FASTQ output of AFG

SAM output of Bowtie 2 (single-end alignment)

EDIT 2:

Here is the output of the paired-end alignment.

SAM output of Bowtie 2 (paired-end alignment)

 

modified 5.1 years ago • written 5.1 years ago by John Smith

I think the discordance is probably because either the two reads in a pair are not in the --fr orientation (which Bowtie 2 expects by default), or the two reads in a pair were generated from two different contigs.
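
Both possibilities can be checked from the alignments of the two mates. Here is a rough sketch, assuming you have pulled the reference name, FLAG and leftmost position of each mate out of the SAM file yourself. The function name and labels are hypothetical, and comparing leftmost positions ignores read length, so treat it as a quick diagnostic only.

```python
REVERSE_STRAND = 0x10  # SAM FLAG bit: read aligned to the reverse strand


def pair_orientation(rname1, flag1, pos1, rname2, flag2, pos2):
    """Label how two mates sit relative to each other on the reference."""
    if rname1 != rname2:
        return "different-contigs"
    mate1_reverse = bool(flag1 & REVERSE_STRAND)
    mate2_reverse = bool(flag2 & REVERSE_STRAND)
    if mate1_reverse == mate2_reverse:
        # Both mates on the same strand: not a valid --fr pair.
        return "same-strand"
    # Opposite strands: find which mate is the forward one.
    forward_pos, reverse_pos = (pos2, pos1) if mate1_reverse else (pos1, pos2)
    # Forward mate upstream of the reverse mate points inward (FR).
    return "FR" if forward_pos <= reverse_pos else "RF"


# Made-up coordinates, purely for illustration.
print(pair_orientation("chr1", 0, 100, "chr1", 16, 300))
print(pair_orientation("chr1", 0, 100, "chr1", 0, 300))
```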

written 5.1 years ago by Ashutosh Pandey

Just post the alignments of two mates and we can tell you what's wrong.

written 5.1 years ago by Devon Ryan

Michael Dondrup (Bergen, Norway) wrote 5.1 years ago:

Maybe you should post an example of the output; a few read pairs should be sufficient.

There could be many reasons, e.g.:

  • Bug or wrong use of the simulator: mate pairs not on complementary strands
  • Too large an insert size
  • Bug: annotated pairs do not pair
  • Too low simulated quality scores (the single-end reads do align, though, so not in this case)
  • Too high a simulated error rate (ditto)
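
To rule out the "annotated pairs do not pair" case, something like the sketch below could compare the read names in the two FASTQ files record by record. It assumes plain 4-line FASTQ records and mates named with a trailing /1 and /2; I have not checked how AFG actually names its reads, so adjust the normalization as needed. The function names are made up.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the normalized read name of every 4-line FASTQ record."""
    for index, line in enumerate(handle):
        if index % 4 != 0:
            continue  # only the first line of each record carries the name
        fields = line.split()
        if not fields:
            continue  # tolerate a trailing blank line
        name = fields[0]
        if name.startswith("@"):
            name = name[1:]
        # Assumption: mates are distinguished by a trailing /1 or /2.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def count_mismatched_pairs(fastq1, fastq2):
    """Count record positions where the two files disagree on the read name."""
    return sum(
        1
        for name1, name2 in zip_longest(read_names(fastq1), read_names(fastq2))
        if name1 != name2
    )


# Tiny in-memory demo with made-up reads (not AFG output).
demo1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n")
demo2 = io.StringIO("@r1/2\nTTTT\n+\nIIII\n@r3/2\nTTTT\n+\nIIII\n")
print(count_mismatched_pairs(demo1, demo2))
```

With real data you would pass two open file handles instead of the in-memory demo strings. A non-zero count means the files are out of sync or the names are annotated differently than assumed here.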

Maybe you should use a simulator that is better tested, see:

What Ngs Read Simulators Are Available For Paired-End Data?

modified 5.1 years ago • written 5.1 years ago by Michael Dondrup

I added the outputs.

written 5.1 years ago by John Smith

Even though you aligned the mates individually, you can see how they would align as a pair, with both mates pointing in the same direction on the same strand:

---> read1    ---> read2
------------------ reference

No kit would produce pairs oriented this way, which is why the alignments are discordant.
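
If the simulator really does emit both mates on the same strand, one possible workaround (my assumption, not something tested on this data) is to reverse-complement the second mate before aligning, so the pairs end up in the usual --fr layout. Bowtie 2 also has an --ff option for pairs laid out like this. A minimal sketch of the reverse-complement step, with hypothetical function names:

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def revcomp_fastq_record(name, sequence, separator, qualities):
    """Flip one FASTQ record: reverse-complement bases, reverse qualities."""
    # Qualities are reversed so each score stays attached to its base.
    return name, reverse_complement(sequence), separator, qualities[::-1]


# Made-up record, purely for illustration.
print(revcomp_fastq_record("@read1/2", "AACG", "+", "ABCD"))
```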

written 5.1 years ago by Devon Ryan

In the SAM file you uploaded, you didn't align the reads as pairs, but as unpaired.

written 5.1 years ago by Devon Ryan