To know % of reads matching rRNA genes in my sample
2
0
Entering edit mode
8.6 years ago

Dear All,

I want to know % of reads matching to rRNA genes. So I have been suggested to download rRNA genes from ensembl. Then I have created the index files for rRNA fasta file. Finally using bowtie2, I have aligned my reads in fastq format against these rRNA index files. Below is the bowtie2 output I received after running the following command.

Frank$ bowtie2 -N 0 -L 15 -x rRNA_genes -1 Project/Sample/DH558-1_GTGGCC_L005_R1.all.fastq.gz -2 Project/Sample/DH558-1_GTGGCC_L005_R2.all.fastq.gz -S Project/DH558-1.sam

66113117 reads; of these:
  66113117 (100.00%) were paired; of these:
    66061268 (99.92%) aligned concordantly 0 times
    58 (0.00%) aligned concordantly exactly 1 time
    51791 (0.08%) aligned concordantly >1 times
    ----
    66061268 pairs aligned concordantly 0 times; of these:
      13 (0.00%) aligned discordantly 1 time
    ----
    66061255 pairs aligned 0 times concordantly or discordantly; of these:
      132122510 mates make up the pairs; of these:
        132068318 (99.96%) aligned 0 times
        13629 (0.01%) aligned exactly 1 time
        40563 (0.03%) aligned >1 times
0.12% overall alignment rate

What does it mean - aligned concordantly 0,1, >1 times?

0.12% overall alignment rate - Does this 0.12% refer to percentage of reads mapping to rRNA genes?

rRNA-genes bowtie2 RNA-seq • 3.0k views
ADD COMMENT
0
Entering edit mode
8.6 years ago

Yes, 0.12% is the percent that aligned to rRNA.

For your other question just search the site for "align concordant discordant" and you'll get plenty of posts talking about what that means.

ADD COMMENT
0
Entering edit mode

Thanks Devon for confirming it. I believe, the same approach can be applied to know percentage of reads associated to any X gene. Instead of rRNA gene fasta sequence, I need to take fasta sequence of my gene of interest, first to build index and then align to reads.

ADD REPLY
0
Entering edit mode

It would be better to align against the whole genome and then count alignments to your gene. By restricting what you align to you end up increasing the false-positive alignment rate (though the increase may be insignificant in many cases).

ADD REPLY
0
Entering edit mode

Thanks Devon, yeah that need to be considered.

ADD REPLY
0
Entering edit mode
8.6 years ago
michael.ante ★ 3.8k

Hi Frank,

I would additionally map against the NCBI annotation; I see sometimes a bit more there.
And don't forget about the mitochondrial rRNA; it is separately annotated in ENSEMBL and it has a short polyA-tail.

Cheers,

Michael

ADD COMMENT
0
Entering edit mode

Thanks Michael. Do you mean (NCBI annotation) comparing with GTF file?

ADD REPLY
0
Entering edit mode

I mean to download the twenty-something rRNA sequences from Nuccore (don't forget the two mt-rRNAs) as FASTA files and repeat the mapping you've done.

Regarding Devon's remark, please note that reads from rRNAs are very often multi-mapping reads (they are one class in the repeat-masker model) and are under-represented while analysing the genome-mapped reads. On the one hand, they are discarded by the aligner due to too many mappings or they are not counted in e.g. htseq-count.

ADD REPLY

Login before adding your answer.

Traffic: 3694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6