Question: unmapped reads in paired-end run pf bowtie2
0
21 months ago by
Assa Yeroslaviz1.3k
Munich
Assa Yeroslaviz1.3k wrote:

I have a data set of paired-end samples which I'm mapping with bowtie2.

I am mainly focus on the unmapped reads, as these are the reads we're interested in.

This is a sample output of the mapping results from on of my runs:

``````11216394 reads; of these:
11216394 (100.00%) were paired; of these:
8079466 (72.03%) aligned concordantly 0 times
2987422 (26.63%) aligned concordantly exactly 1 time
149506 (1.33%) aligned concordantly >1 times
----
8079466 pairs aligned concordantly 0 times; of these:
417642 (5.17%) aligned discordantly 1 time
----
7661824 pairs aligned 0 times concordantly or discordantly; of these:
15323648 mates make up the pairs; of these:
15226093 (99.36%) aligned 0 times
63924 (0.42%) aligned exactly 1 time
33631 (0.22%) aligned >1 times
32.13% overall alignment rate
``````

Out of this output, how many unmapped reads do I have?

From my understanding I was thinking it is the fourth row from the bottom = 15226093?

If I calculate all three options in this summary i have aligned once - (29874222) + (417642 *2) + 63924 = 6874052 aligned >1 - (1495062) + 33631 = 332643 unaligned - 15226093 sum = 6874052 + 332643 + 15226093 = 22432788

This last number is the sum of my input reads of this sample.

But When I look at the number of reads in the `unmapped.fastq` file after the mapping it corresponds to the third row of the file (x2) = 8079466 x2 = 16158932?

But than adding the mapped and unmapped reads together give me more reads, than the there were in total.

Can anyone help me understand what is happening here?

How do I calculate the number of unmapped reads in a paired-end mapped sample?

thanks Assa

modified 21 months ago by Devon Ryan94k • written 21 months ago by Assa Yeroslaviz1.3k

What was the exact command you used? There are a few options for outputting various subsets of unmapped reads, it's likely you used the wrong one.

This is the commands i use

``````bowtie2 --sensitive --end-to-end -p 10 --un-conc-gz firstRun.Unmapped.gz --phred33 -x genome -1 1.fq.gz -2 2.fq.gz | samtools view -Sbhu -F 4 - | samtools sort - -o sorted.bam
samtools index sorted.bam
``````

The file I need in the next step is `firstRun.Unmapped.gz`.

I didn't subset the bam file with samtools, I thought the unmapped reads are outputted to this file.

Just to add - this is the output of the same file when running the `samtools flagstat` command

``````7206695 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
7206695 + 0 mapped (100.00% : N/A)
7206695 + 0 paired in sequencing
6273856 + 0 properly paired (87.06% : N/A)
7131084 + 0 with itself and mate mapped
75611 + 0 singletons (1.05% : N/A)
3574 + 0 with mate mapped to a different chr
719 + 0 with mate mapped to a different chr (mapQ>=5)
``````

I can't see how the numbers of the stat (bowtie2 output) file and the flagstat output correlate.

1
21 months ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

The discrepency you're seeing is due to using `--un-conc-gz`, which is writing the 8079466 pairs that didn't align concordantly to the specified fastq files. You'll instead want `--un-gz` instead.

I thought so too. But if I use the same command but with `--un-gz` and the input files `-1 1.fq.gz -2 2.fq.gz`, the unmapped file is empty.

Is there a way to map the files in a paired-end modus and still get the unmapped reads, not pairs?

Then I'd suggest not writing them to a file with bowtie2 but instead keeping them in the SAM output and making fastq files from that (`samtools fastq -f 4`).