Identify and remove contamination in an RNA-Seq dataset
12 weeks ago
Abdul ▴ 10

Hi,

I am working with an Illumina paired-end (150 bp) RNA-Seq dataset. Is there a way to identify contamination in the data (such as mitochondrial DNA or any other type of contamination) and remove it?

Can it be removed pre- or post-alignment, or perhaps by filtering reads?

Best Regards,

Abdul

rsem fastqc quantification alignment fastp • 618 views

They would be removed post-alignment, because you can't tell what a sequence represents until you align it to something. However, without knowing what you're trying to achieve, it's hard to recommend a specific strategy. You can filter your BAM files to remove reads aligned to certain targets (e.g. mitochondrial DNA), or you can generate counts on features and drop the features you want to ignore (i.e. rows in your count table representing mitochondrial genes, etc.). Both approaches come with caveats for your analysis.
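
As a rough illustration of the count-table option, the sketch below drops mitochondrial rows from a gene-by-sample table and reports what fraction of each sample's counts they accounted for. It assumes gene symbols in the first column and that mitochondrial genes can be recognized by an `MT-` prefix (the human gene-symbol convention; other species and annotations differ). The function name `split_mito_rows` and the toy table are made up for this example.

```python
import csv
import io


def split_mito_rows(tsv_text, mito_prefix="MT-"):
    """Return (kept_rows, mito_fraction_per_sample) for a gene x sample table.

    Assumes a tab-separated table: first column is a gene symbol, remaining
    columns are per-sample counts. The "MT-" prefix is an assumption.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    total = [0.0] * len(samples)
    mito = [0.0] * len(samples)
    kept = [header]
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts = [float(x) for x in row[1:]]
        is_mito = row[0].upper().startswith(mito_prefix)
        for i, c in enumerate(counts):
            total[i] += c
            if is_mito:
                mito[i] += c
        if not is_mito:
            kept.append(row)
    # Fraction of each sample's counts that came from mitochondrial rows.
    fraction = {s: (m / t if t else 0.0) for s, m, t in zip(samples, mito, total)}
    return kept, fraction


# Toy table, invented for illustration only.
toy = "gene\tS1\tS2\nACTB\t80\t50\nMT-CO1\t20\t50\nGAPDH\t100\t100\n"
kept_rows, mito_fraction = split_mito_rows(toy)
print(mito_fraction)
print([row[0] for row in kept_rows])
```

Looking at the per-sample fraction before dropping the rows is one way to judge whether the mitochondrial signal is large enough to matter for a given analysis.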


seidel, thank you for the feedback. I am working with a gene counts file that was filtered to include only protein-coding genes and lncRNAs.

From a quick read of the script, I assume the FASTQ files were assessed with FastQC, aligned with bowtie2 (with chrM included in the reference), filtered and trimmed with fastp, quantified with RSEM to obtain gene counts, and then filtered to keep protein-coding genes and lncRNAs.


You could use bbsplit.sh to bin the reads so that the contaminating reads can be separated. See: Extracting contaminated reads from the sequenced data
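
A minimal sketch of what such a call might look like, written as a small Python helper that only builds the argument list and does not run anything. The parameter names (`in1`, `in2`, `ref`, `basename`, `outu1`, `outu2`) follow the usage example in the BBTools documentation as I recall it, so check them against the help output of your installed `bbsplit.sh`. The helper name and all file names are hypothetical.

```python
import shlex


def bbsplit_command(reads1, reads2, contaminant_refs, out_prefix):
    """Build (but do not run) a bbsplit.sh argument list for paired-end reads.

    Reads matching a reference are binned into one file per reference
    (the % in basename is replaced by the reference name); reads matching
    none of them go to the outu1/outu2 files.
    """
    return [
        "bbsplit.sh",
        f"in1={reads1}",
        f"in2={reads2}",
        "ref=" + ",".join(contaminant_refs),
        f"basename={out_prefix}_%.fq",
        f"outu1={out_prefix}_clean_1.fq",
        f"outu2={out_prefix}_clean_2.fq",
    ]


# Hypothetical file names, for illustration only.
cmd = bbsplit_command("sample_1.fq", "sample_2.fq", ["chrM.fa", "ecoli.fa"], "sample")
print(shlex.join(cmd))
```

With the contaminant genomes as references, the unmatched output would be the reads to carry forward. The resulting list can be passed to `subprocess.run(cmd, check=True)` once the parameters have been verified.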


GenoMax, thank you for the input.
