Identify and remove contaminations in RNA-Seq dataset
16 months ago
Abdul ▴ 10

Hi,

I am working with an Illumina paired-end (150 bp) RNA-Seq dataset. Is there a way to identify contamination in the data (mitochondrial DNA or any other type of contamination) and remove it?

Can it be removed before or after alignment, or perhaps by filtering the reads directly?

Best Regards,

Abdul

rsem fastqc quantification alignment fastp • 1.3k views

They would be removed post alignment, because you can't tell what a sequence represents until you align it to something. However, without knowing your specific purpose, or what you're trying to achieve, it's hard to recommend a specific strategy. You can filter your BAM files to remove certain alignment targets (e.g. Mitochondrial DNA), or you can generate counts on features and remove the features you want to ignore (i.e. rows in your count table representing mitochondrial genes, etc.). All come with caveats for your analysis.
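
As a rough illustration of the count-table route, here is a minimal Python sketch. It assumes the table is keyed by gene symbol and that mitochondrial genes carry the human-style `MT-` prefix. Both are assumptions, and the function name `split_mito` and the toy counts are made up for the example. It reports the per-sample mitochondrial fraction before dropping those rows, which helps judge whether the level is unusual in the first place.

```python
from typing import Dict, List, Tuple


def split_mito(
    counts: Dict[str, List[int]], prefix: str = "MT-"
) -> Tuple[Dict[str, List[int]], List[float]]:
    """Drop mitochondrial genes and report the per-sample mitochondrial fraction.

    `counts` maps gene symbol -> one count per sample.
    `prefix` is an assumption: human mitochondrial gene symbols start with "MT-".
    """
    if not counts:
        return {}, []
    n_samples = len(next(iter(counts.values())))
    totals = [0] * n_samples
    mito = [0] * n_samples
    kept: Dict[str, List[int]] = {}
    for gene, row in counts.items():
        is_mito = gene.upper().startswith(prefix)
        for i, count in enumerate(row):
            totals[i] += count
            if is_mito:
                mito[i] += count
        if not is_mito:
            kept[gene] = row
    fractions = [m / t if t else 0.0 for m, t in zip(mito, totals)]
    return kept, fractions


# Toy table with two samples (made-up numbers).
example = {
    "MT-CO1": [300, 50],
    "MT-ND1": [100, 50],
    "ACTB": [500, 800],
    "GAPDH": [100, 100],
}
kept, mito_fraction = split_mito(example)
print(sorted(kept), mito_fraction)
```

If your table uses Ensembl IDs rather than symbols, selecting genes by chromosome from the annotation would be the safer choice than a name prefix.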


seidel Thank you for the feedback. I am working with the gene counts file, filtered to include only protein-coding genes and lncRNAs.

I went through the script quickly, and I assume the workflow was: FASTQ files assessed using FastQC > aligned using bowtie2, with chrM included in the reference > filtered and trimmed using fastp > quantified using rsem to obtain gene counts > filtered to keep protein-coding genes and lncRNAs.
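
For reference, the last step (keeping only protein-coding genes and lncRNAs) could look roughly like the sketch below. This is not the actual script, just a guess at the idea. It assumes a GTF annotation whose `gene` records carry a `gene_biotype` (Ensembl-style) or `gene_type` (GENCODE-style) attribute, and that lncRNAs are labelled `lncRNA`. The gene IDs and counts are invented.

```python
import re

# Matches GTF attribute pairs such as: gene_id "ENSG..."
ATTR = re.compile(r'(\S+) "([^"]*)"')

# Assumed biotype labels; older annotations may use other names for lncRNAs.
KEEP = {"protein_coding", "lncRNA"}


def gene_biotypes(gtf_lines):
    """Map gene_id -> biotype using only the 'gene' records of a GTF."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        biotype = attrs.get("gene_biotype") or attrs.get("gene_type")
        if gene_id and biotype:
            biotypes[gene_id] = biotype
    return biotypes


# Invented annotation lines and counts, for illustration only.
gtf = [
    "#!genome-build example",
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";',
    'chr1\tsrc\tgene\t200\t300\t.\t-\t.\tgene_id "G2"; gene_type "lncRNA";',
    'chr1\tsrc\tgene\t400\t500\t.\t+\t.\tgene_id "G3"; gene_biotype "snoRNA";',
    'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";',
]
counts = {"G1": 10, "G2": 5, "G3": 7}

biotypes = gene_biotypes(gtf)
filtered = {gene: n for gene, n in counts.items() if biotypes.get(gene) in KEEP}
print(filtered)
```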


seidel I have edited my reply.


You could use bbsplit.sh to bin the reads so that the contaminating reads can be separated. See: Extracting contaminated reads from the sequenced data
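
A hedged sketch of what such a call might look like for paired-end data, built as a Python command list so it can be inspected before running. The file names and the helper `bbsplit_command` are placeholders, and the reference list is an assumption about what you want to screen against. Please check the parameters (`in1`, `in2`, `ref`, `basename`, `outu1`, `outu2`) against the help text of your BBTools version before relying on them.

```python
import shlex


def bbsplit_command(read1, read2, references,
                    binned_pattern="binned_%_#.fq.gz",
                    clean_prefix="clean"):
    """Assemble a bbsplit.sh command line for paired-end reads.

    In `binned_pattern`, % stands for the reference name and # for the
    read number. Reads matching none of the references go to the
    outu1/outu2 files.
    """
    return [
        "bbsplit.sh",
        f"in1={read1}",
        f"in2={read2}",
        "ref=" + ",".join(references),
        f"basename={binned_pattern}",
        f"outu1={clean_prefix}_R1.fq.gz",
        f"outu2={clean_prefix}_R2.fq.gz",
    ]


# Placeholder inputs: one FASTA per suspected contaminant source.
cmd = bbsplit_command("sample_R1.fq.gz", "sample_R2.fq.gz",
                      ["chrM.fa", "phix.fa"])
print(shlex.join(cmd))
# To actually run it (requires BBTools on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

If only contaminant references are listed, the `outu` files hold the reads that matched none of them, and the per-reference bins show how much each contaminant contributed.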


GenoMax
Thank you for the inputs.
