Diagnosing contamination in RNA-seq data
2.2 years ago
wiscoyogi ▴ 40

I have bulk RNA-seq data from human samples that have clearly been contaminated by non-human sources: alignment to the human genome is dismal even after generously relaxing the thresholds.

I want to diagnose the source of contamination at a broad level (i.e., where are these reads coming from?).

I was originally going to use BLAST, but it takes far too long and is overkill for my question (I want to know which non-human sources are present, not which genes).

Does anyone know of basic packages that assign RNA-seq reads to species or genera and that I could build into my existing QC pipeline?

Thanks!

sequencing ngs bulk nonhuman • 1.3k views

Use FastQ Screen (fastq_screen). By default it screens for contamination from model organisms (including human, mouse and rat) and common vectors. If you can guess the organism the contamination comes from, index its genome, place the indices in an appropriate location, add them to the config file, and FastQ Screen will screen your FASTQ files against those genomes as well.
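The "edit the config" step usually amounts to adding one `DATABASE` line per extra genome. Below is a minimal Python sketch of that step, assuming the config layout is a tab-separated `DATABASE`, a short database name, and the basename of an index already built for the configured aligner (e.g. with `bowtie2-build`). The function names are hypothetical helpers, not part of FastQ Screen; check the comments in your own `fastq_screen.conf` for the exact format your version expects.

```python
from pathlib import Path
from typing import List


def database_line(name: str, index_basename: str) -> str:
    """Format one DATABASE entry (assumed layout: keyword, name, index basename, tab-separated)."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("database name must be non-empty and contain no whitespace")
    return f"DATABASE\t{name}\t{index_basename}"


def merge_database_line(lines: List[str], name: str, index_basename: str) -> List[str]:
    """Return the config lines with the new entry appended, unless it is already present."""
    entry = database_line(name, index_basename)
    if entry in (line.rstrip("\n") for line in lines):
        return list(lines)
    return list(lines) + [entry]


def add_database(conf_path: Path, name: str, index_basename: str) -> None:
    """Rewrite the config file in place so it also lists the extra genome index."""
    lines = conf_path.read_text().splitlines() if conf_path.exists() else []
    updated = merge_database_line(lines, name, index_basename)
    conf_path.write_text("\n".join(updated) + "\n")
```

For example, `add_database(Path("fastq_screen.conf"), "Contaminant", "/refs/contaminant/contaminant")` (name and path hypothetical) adds the entry once; calling it again leaves the file unchanged, so it is safe to run on every pipeline invocation before pointing fastq_screen at the edited config.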


BLAST is definitely not the best tool for this job. A few alternatives are covered in the thread "Faster BLAST alternative".


BLAST is more sensitive than all these alternatives.


Yes, but you can't realistically BLAST thousands or millions of reads. The goal here is to "diagnose the source of contamination at a broad level".

2.2 years ago
supertech ▴ 180

I would suggest using something based on k-mers. This article may give you some ideas.
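To illustrate the k-mer idea, here is a deliberately simplified Python sketch (not the method of the linked article, and far slower than dedicated tools such as Kraken2, which use the same principle with a taxonomy and lowest-common-ancestor assignment). It indexes canonical k-mers from a few hypothetical reference sequences, lets each read vote for the reference whose unique k-mers it shares, and tallies the votes. Real use would need whole contaminant genomes and a larger `k` (the tiny `k` here is only for readability).

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of seq, skipping any containing a non-ACGT base."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _BASES:
            yield kmer


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement, so reads from either strand match."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def build_index(refs: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each canonical k-mer to the labels of the reference sequences containing it."""
    index: Dict[str, Set[str]] = {}
    for label, seq in refs.items():
        for kmer in kmers(seq, k):
            index.setdefault(canonical(kmer), set()).add(label)
    return index


def classify(read: str, index: Dict[str, Set[str]], k: int) -> str:
    """Label a read by the reference that owns the most of its unique k-mers."""
    votes: Counter = Counter()
    for kmer in kmers(read, k):
        labels = index.get(canonical(kmer), set())
        if len(labels) == 1:  # k-mers shared by several references carry no signal
            votes[next(iter(labels))] += 1
    if not votes:
        return "unclassified"
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "ambiguous"  # tie between two references
    return ranked[0][0]


def summarize(reads: Iterable[str], index: Dict[str, Set[str]], k: int) -> Counter:
    """Count how many reads are assigned to each reference label."""
    return Counter(classify(read, index, k) for read in reads)
```

The per-label counts from `summarize` are the kind of broad "where are these reads coming from" answer the question asks for, without per-gene detail.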

2.2 years ago
shelkmike ★ 1.2k

To diagnose contamination, I align a random subset of 100 or 1,000 reads to NCBI nt (downloaded from https://ftp.ncbi.nlm.nih.gov/blast/db/) with blastn and look at the taxa of the best matches.
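A minimal Python sketch of both halves of this workflow, under stated assumptions. The first part draws a uniform random subset of reads in a single pass over the FASTQ (reservoir sampling) and writes it as a FASTA query. The second part tallies the taxon of each read's best hit, assuming blastn was run with tabular output whose columns are `qseqid sseqid evalue bitscore sscinames` (i.e. `-outfmt "6 qseqid sseqid evalue bitscore sscinames"`); as far as I know, `sscinames` is only filled in when NCBI's taxonomy files (taxdb) are available next to the database. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read_id, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (read_id, sequence) from 4-line FASTQ records; accepts an open file or a list of lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq = next(it).strip()
            next(it)  # '+' separator line
            next(it)  # quality line
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header.strip()}") from None
        yield header[1:].split()[0], seq


def reservoir_sample(records: Iterable[Record], n: int, seed: int = 0) -> List[Record]:
    """Uniformly sample n records in one pass (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive bounds: rec is kept with probability n/(i+1)
            if j < n:
                sample[j] = rec
    return sample


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text for use as a blastn query."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)


def best_hit_taxa(blast_lines: Iterable[str], score_col: int = 3, taxon_col: int = 4) -> Counter:
    """Keep the highest-bitscore hit per query and count the taxon names of those best hits."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, score, taxon = fields[0], float(fields[score_col]), fields[taxon_col]
        if query not in best or score > best[query][0]:
            best[query] = (score, taxon)
    return Counter(taxon for _, taxon in best.values())
```

A gzipped FASTQ can be passed as `gzip.open(path, "rt")`. Reservoir sampling is used because it needs neither the read count up front nor a second pass; `seqtk sample` is a ready-made command-line alternative for the sampling step.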

