Question: How to remove rRNA, bacterial RNA and polyA contamination from RNA-seq data (FASTQ format)?
2.5 years ago by
Megan40
Megan40 wrote:

Hi all,

I am trying to do some QC on RNA-seq raw reads. According to FastQC results, there is some rRNA, bacterial RNA and polyA contamination. But here are my problems.

  1. I have no idea how serious the contamination is. How can I judge that from the FastQC results?
  2. Is it necessary to remove the contamination? Or is there a cutoff beyond which I should remove it?
  3. How can I remove this contamination?

    • How can I remove polyA and bacterial RNA contamination?
    • For rRNA, I have tried the following: (1) downloaded the Mt_rRNA, rRNA and Mt_tRNA sequences from Ensembl BioMart; (2) used bowtie2 to remove the rRNA and tRNA reads.

      step 1: create index
              bowtie2-build rRNA.fasta rRNA.index
      step 2: align to the rRNA index in order to get an rRNA-free fastq file
              bowtie2 -x rRNA.index -1 sampleA.1.fq -2 sampleA.2.fq --phred33 -N 0 \
              --un-conc sampleA-filter.fq --al-conc rRNA.fq -p 8
      

      Is this correct?
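For the bowtie2 approach above, one rough way to gauge how serious the rRNA contamination is would be to compare read counts between the input and the `--al-conc` output. The following is a minimal sketch, not a definitive method. It assumes uncompressed FASTQ files with exactly four lines per record, and that bowtie2 expanded `--al-conc rRNA.fq` into `rRNA.1.fq` and `rRNA.2.fq`. The function names `count_reads` and `rrna_percent` are hypothetical helpers introduced for illustration.

```shell
# Count reads in an uncompressed FASTQ file (assumes 4 lines per record).
count_reads() {
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

# Percentage of reads in the filtered-out file relative to the input file.
# $1 = mate-1 FASTQ of reads that aligned to the rRNA index
# $2 = mate-1 FASTQ of the original input
rrna_percent() {
    awk -v a="$(count_reads "$1")" -v t="$(count_reads "$2")" '
        BEGIN {
            if (t == 0) { print "NA" }              # avoid division by zero
            else        { printf "%.2f\n", 100 * a / t }
        }'
}
```

Usage would look like `rrna_percent rRNA.1.fq sampleA.1.fq`. Since `--al-conc` only collects pairs that align concordantly, the figure is probably best read as a lower bound.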

Thank you very much!

sequencing rna-seq • 2.4k views
ADD COMMENTlink modified 2.5 years ago by Sumit Paliwal30 • written 2.5 years ago by Megan40

Is it necessary to remove the contamination? Or is there a cutoff beyond which I should remove it?

It depends on your downstream analyses - what do you want to do?

Are you sure you have polyA contamination? What kind of libraries do you have? The most common Illumina RNAseq library is mRNA with polyA capture.
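A quick way to check the polyA question directly on the reads is to count how many contain a long run of A's (or T's, for the reverse strand). This is a minimal sketch under stated assumptions: uncompressed FASTQ with four lines per record, and a run-length threshold of 20 that is an arbitrary default rather than an established cutoff. The function name `polya_percent` is hypothetical.

```shell
# Percentage of reads containing a run of at least MIN consecutive A's or T's.
# $1 = uncompressed FASTQ file
# $2 = minimum run length (optional; default 20 is an arbitrary choice)
polya_percent() {
    awk -v min="${2:-20}" '
        BEGIN {
            # Build the probe strings: "A" repeated min times, and the same for "T".
            a = sprintf("%" min "s", ""); gsub(/ /, "A", a)
            t = a; gsub(/A/, "T", t)
        }
        NR % 4 == 2 {                      # sequence line of each record
            n++
            if (index($0, a) || index($0, t)) hits++
        }
        END {
            if (n == 0) print "NA"
            else printf "%.2f\n", 100 * hits / n
        }
    ' "$1"
}
```

For example, `polya_percent sampleA.1.fq 25` would report the share of mate-1 reads carrying a homopolymer A or T stretch of 25 bases or more.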

How did FastQC tell you that you had bacterial contamination? If I am not mistaken, FastQC does not screen for bacterial contamination by default.

ADD REPLYlink written 2.5 years ago by h.mon28k

Hi,

For downstream analysis, I am going to do differential expression analysis (DEA), transcriptome reconstruction, etc.

The mRNA is enriched using polyA capture. It is possible to have polyA contamination.

How can I tell if there is contamination? I mainly looked at the 'Per sequence GC content' and 'Overrepresented sequences' sections of the FastQC report, and I ran the overrepresented sequences through BLAST. The hits point to polyA and adenovirus contamination.
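The BLAST check described above can be made less manual by pulling the overrepresented sequences out of FastQC's text report as FASTA. This is a sketch only. It assumes the `fastqc_data.txt` file from the FastQC output archive, with the module delimited by `>>Overrepresented sequences` and `>>END_MODULE` lines and tab-separated columns (sequence, count, percentage, possible source). The function name `overrepresented_to_fasta` and the `seqN` record names are hypothetical.

```shell
# Convert the "Overrepresented sequences" module of fastqc_data.txt to FASTA.
# $1 = path to fastqc_data.txt
overrepresented_to_fasta() {
    awk -F '\t' '
        /^>>Overrepresented sequences/ { in_mod = 1; next }   # module start
        /^>>END_MODULE/                { in_mod = 0 }         # any module end
        in_mod && !/^#/ {                                     # skip the column header
            printf ">seq%d count=%s pct=%s\n%s\n", ++i, $2, $3, $1
        }
    ' "$1"
}
```

Running `overrepresented_to_fasta fastqc_data.txt > overrepresented.fa` would give a file that can be submitted to BLAST in one go.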

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Megan40
2.5 years ago by
Colombia/Universidad de Antioquia
Carlos Caicedo130 wrote:

SortMeRNA could help you. It was developed to filter ribosomal RNA. In addition, it gives you an idea of how serious the contamination is, because it reports the percentage of reads that aligned to ribosomal RNA.

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Carlos Caicedo130

In addition to SortMeRNA, BBDuk can also do this (see this thread for how). My impression is that SortMeRNA is slightly more precise, but BBDuk is much faster.

ADD REPLYlink written 2.5 years ago by h.mon28k
2.5 years ago by
Sumit Paliwal30 wrote:

I don't know if you have already done this, but you can use fastq_screen to check for cross-species or other (e.g. adapter) contamination. You can also load your aligned SAM/BAM files into SeqMonk for an RNA-seq QC report. SeqMonk is GUI-based and hence user-friendly. After that, as suggested above, you can use SortMeRNA to remove the rRNA reads.

ADD COMMENTlink written 2.5 years ago by Sumit Paliwal30