I am currently working with Artificial FASTQ Generator (AFG), a program that generates artificial FASTQ files from a reference genome. Here is a link to the description of the program and here is the manual. The manual says that the program generates paired-end reads, and it does generate two FASTQ files. After aligning the artificial reads (the two FASTQ files generated by AFG) to the reference genome as paired-end reads using Bowtie 2, I got the following result:
892589 (100.00%) were paired; of these:
892585 (100.00%) aligned concordantly 0 times
4 (0.00%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
892585 pairs aligned concordantly 0 times; of these:
870486 (97.52%) aligned discordantly 1 time
----
22099 pairs aligned 0 times concordantly or discordantly; of these:
44198 mates make up the pairs; of these:
7154 (16.19%) aligned 0 times
15716 (35.56%) aligned exactly 1 time
21328 (48.26%) aligned >1 times
99.60% overall alignment rate
After aligning both FASTQ files as single-end reads, I got the following:
1785178 (100.00%) were unpaired; of these:
7154 (0.40%) aligned 0 times
1755694 (98.35%) aligned exactly 1 time
22330 (1.25%) aligned >1 times
99.60% overall alignment rate
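As a sanity check on the two summaries, the overall alignment rate can be recomputed from the counts reported above. This is only a small sketch: the variable names are my own, and it assumes the overall rate counts individual mates rather than pairs, which is consistent with both runs reporting 99.60%.

```python
# Counts copied from the two Bowtie 2 summaries above.
pairs = 892589
concordant_pairs = 4 + 0            # exactly 1 time + >1 times
discordant_pairs = 870486
mates_aligned_alone = 15716 + 21328  # mates from pairs that aligned neither way

total_mates = 2 * pairs

# Paired-end run: every mate of a concordant or discordant pair is aligned.
aligned_paired_run = 2 * (concordant_pairs + discordant_pairs) + mates_aligned_alone

# Single-end run: reads aligned exactly once plus reads aligned more than once.
aligned_single_run = 1755694 + 22330

rate_paired = 100 * aligned_paired_run / total_mates
rate_single = 100 * aligned_single_run / total_mates

print(f"paired-end run: {aligned_paired_run} of {total_mates} mates aligned ({rate_paired:.2f}%)")
print(f"single-end run: {aligned_single_run} of {total_mates} reads aligned ({rate_single:.2f}%)")
```

Both runs place the same number of individual reads, so the reads themselves map fine in either mode; what differs is whether the two mates of a pair satisfy the pairing constraints.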
What I do not understand is why these reads align as expected when treated as single-end reads but almost never align concordantly as paired-end reads. Is anybody familiar with both programs who can help explain this?
EDIT:
Here are the outputs:
- 1st FASTQ output of AFG
- 2nd FASTQ output of AFG
- SAM output of Bowtie 2 (single-end alignment)
EDIT 2:
Here is the output of the paired-end alignment:
- SAM output of Bowtie 2 (paired-end alignment)
I think the discordance issue is probably either because the reads in the read pair don't have --fr orientation or because the two reads in a pair have been generated from two different contigs. Just post the alignments of two mates and we can tell you what's wrong.
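In the meantime, both possibilities can be checked directly from the FLAG, RNAME and POS columns of the paired-end SAM file. The following is a rough sketch rather than a definitive tool: the function names and the example records are hypothetical, and the orientation test is simplified (it compares leftmost positions only and ignores read lengths and overlapping mates).

```python
# SAM FLAG bits used here (from the SAM specification).
UNMAPPED = 0x4   # this read is unmapped
REVERSE = 0x10   # this read aligned to the reverse strand


def parse_sam_line(line):
    """Extract the fields needed for the check from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
    }


def classify_pair(mate1, mate2):
    """Return a coarse label describing how two mates sit relative to each other."""
    if mate1["flag"] & UNMAPPED or mate2["flag"] & UNMAPPED:
        return "unaligned_mate"
    if mate1["rname"] != mate2["rname"]:
        return "different_contigs"
    rev1 = bool(mate1["flag"] & REVERSE)
    rev2 = bool(mate2["flag"] & REVERSE)
    if rev1 == rev2:
        return "same_strand"
    # Opposite strands: inward-facing (fr) if the forward mate is leftmost.
    fwd, rev = (mate2, mate1) if rev1 else (mate1, mate2)
    return "fr" if fwd["pos"] <= rev["pos"] else "rf"


# Hypothetical records (only the first four SAM columns), not taken from the posted files.
example_mate1 = parse_sam_line("read1\t65\tchr1\t100")
example_mate2 = parse_sam_line("read1\t129\tchr1\t300")
print(classify_pair(example_mate1, example_mate2))
```

If most pairs come out as "same_strand" or "rf", rerunning Bowtie 2 with --ff or --rf instead of the default --fr would be the thing to try; if they come out as "different_contigs", the problem is in how the pairs were generated.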