Question: How to remove rRNA, bacterial RNA and polyA contamination from RNA-seq data(fastq format)?
Megan40 wrote:

Hi all,

I am trying to do some QC on RNA-seq raw reads. According to FastQC results, there is some rRNA, bacterial RNA and polyA contamination. But here are my problems.

  1. I have no idea how serious the contamination is. How can I tell this from the FastQC results?
  2. Is it necessary to remove the contamination? Or is there a cutoff beyond which I should remove it?
  3. How can I remove this contamination?

    • How to remove PolyA and bacterial RNA contamination?
    • For rRNA, I have tried the following: (1) downloaded the Mt_rRNA, rRNA and Mt_tRNA sequences from Ensembl BioMart; (2) used bowtie2 to remove rRNA and tRNA reads.

      step 1: create index
              bowtie2-build rRNA.fasta rRNA.index
      step 2: Align to the rRNA index in order to get an rRNA-free fastq file.
              bowtie2 -x rRNA.index -1 sampleA.1.fq -2 sampleA.2.fq --phred33 -N 0 
              --un-conc sampleA-filter.fq --al-conc rRNA.fq -p 8
      

      Is this correct?
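
One detail about the command above: bowtie2 writes SAM alignments to standard output unless `-S` is given, so when only the filtered reads are wanted the alignments are usually discarded. The sketch below wraps step 2 under that assumption. The function name `rrna_filter_cmd` and its arguments are made up for illustration, and it only prints the command so it can be inspected before anything is run.

```shell
# Hypothetical helper: print a bowtie2 command that splits a paired-end
# sample into rRNA-aligned and rRNA-free reads. Nothing is executed here.
rrna_filter_cmd() {
  index=$1          # bowtie2 index prefix, e.g. rRNA.index
  sample=$2         # sample prefix; expects <sample>.1.fq and <sample>.2.fq
  threads=${3:-8}   # number of alignment threads (-p)
  # -S /dev/null discards the SAM output, which is not needed for filtering.
  printf 'bowtie2 -x %s -1 %s.1.fq -2 %s.2.fq --phred33 -N 0 -p %s --un-conc %s-filter.fq --al-conc rRNA.fq -S /dev/null\n' \
    "$index" "$sample" "$sample" "$threads" "$sample"
}
```

Calling `rrna_filter_cmd rRNA.index sampleA` prints the full command line. As far as I understand the bowtie2 manual, `--un-conc sampleA-filter.fq` should produce `sampleA-filter.1.fq` and `sampleA-filter.2.fq`, one file per mate; check this against your bowtie2 version.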

Thank you very much!

sequencing rna-seq • 1.9k views
ADD COMMENTlink modified 22 months ago by Sumit Paliwal20 • written 22 months ago by Megan40

Is it necessary to remove the contamination? Or is there a cutoff beyond which I should remove it?

It depends on your downstream analyses - what do you want to do?

Are you sure you have polyA contamination? What kind of libraries do you have? The most common Illumina RNAseq library is mRNA with polyA capture.

How did FastQC tell you that you had bacterial contamination? If I am not mistaken, FastQC does not screen for bacterial contamination by default.

ADD REPLYlink written 22 months ago by h.mon24k

Hi,

For downstream analysis, I am going to do DEA and transcriptome reconstruction, etc.

The mRNA is enriched using polyA capture. It is possible to have polyA contamination.

How can I tell if there is contamination? Primarily, I looked at the 'per sequence GC content' and 'overrepresented sequences' sections of the FastQC report. I then checked those overrepresented sequences with BLAST; they show polyA and adenovirus contamination.
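
To put a number on the polyA signal rather than relying only on the overrepresented-sequences table, one rough option is to count reads that contain a long run of A's. This is a minimal sketch, not a trimming tool: the function name `count_polya_reads` and the default run length of 20 are assumptions, and it expects an uncompressed FASTQ file with exactly four lines per record.

```shell
# Count reads whose sequence contains at least MIN_RUN consecutive A's.
# Prints "<reads with polyA run> <total reads>".
count_polya_reads() {
  fastq=$1
  min_run=${2:-20}   # assumed threshold; adjust to your read length
  # Build a literal string of min_run A's to search for.
  run=$(printf '%*s' "$min_run" '' | tr ' ' 'A')
  awk -v run="$run" '
    NR % 4 == 2 { total++; if (index($0, run) > 0) hits++ }
    END { printf "%d %d\n", hits, total }
  ' "$fastq"
}
```

For example, `count_polya_reads sampleA.1.fq 20` reports how many mate-1 reads carry such a run. The second mate of a pair may show the signal as a run of T's instead, which this sketch does not check.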

ADD REPLYlink modified 22 months ago • written 22 months ago by Megan40
Colombia/Universidad de Antioquia
Carlos Caicedo130 wrote:

SortMeRNA could help you. It was developed to filter ribosomal RNA. In addition, it gives you an idea of how serious the contamination is, because it reports the percentage of reads that aligned to ribosomal RNA.
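
A similar percentage can be estimated from the bowtie2 run in the question by counting the reads in the `--al-conc` and `--un-conc` output files. The sketch below assumes uncompressed FASTQ files with four lines per record; the function names `count_reads` and `rrna_percent` are illustrative only. Note that it measures pairs that aligned concordantly to the rRNA index, which may differ somewhat from what SortMeRNA reports.

```shell
# Number of records in an uncompressed FASTQ file (4 lines per record).
count_reads() {
  awk 'END { print int(NR / 4) }' "$1"
}

# Percentage of reads that aligned to the rRNA index.
# $1: FASTQ of aligned reads, $2: FASTQ of unaligned reads (same mate).
rrna_percent() {
  aligned=$(count_reads "$1")
  unaligned=$(count_reads "$2")
  awk -v a="$aligned" -v u="$unaligned" 'BEGIN {
    if (a + u == 0) { print "NA"; exit }   # avoid division by zero
    printf "%.2f\n", 100 * a / (a + u)
  }'
}
```

With the file names from the question, and assuming bowtie2 wrote one file per mate, the call would look like `rrna_percent rRNA.1.fq sampleA-filter.1.fq`.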

ADD COMMENTlink modified 22 months ago • written 22 months ago by Carlos Caicedo130

In addition to SortMeRNA, BBDuk can also do this (see this thread for how). My impression is that SortMeRNA is slightly more precise, but BBDuk is much faster.

ADD REPLYlink written 22 months ago by h.mon24k
Sumit Paliwal20 wrote:

I don't know if you have done it already, but you can use fastq_screen to check for cross-species or other (e.g. adapter) contamination. You can also load your aligned SAM/BAM files into SeqMonk for an RNA-seq QC report. SeqMonk is GUI-based and hence user-friendly. After that, as suggested above, you can probably use SortMeRNA to remove rRNA reads.

ADD COMMENTlink written 22 months ago by Sumit Paliwal20