Question

Contaminating sequences from Novogene RNAseq results

5.5 years ago

n.bailey2 • 0

Hi,

I wondered if anyone had a similar experience with results from Novogene commercial RNAseq.

My experiment was dual RNA-seq of two species in vitro.

After aligning the reads to my reference genomes I found a large proportion didn't map (40-50%).

Looking at the data, this was largely explained by strain differences between the reference genomes and the sequenced samples, and by issues with genome quality.

However:

I classified the reads taxonomically using kraken2 and found significant numbers of reads from unexpected organisms. Some could potentially be explained by contaminant growth in the culture (bacteria etc.), but across the board there were large numbers of reads from model organisms: consistently around 1.3% human reads (maybe real contamination from the experimenter, but so many reads?), and significant numbers (100k+) of reads for things like zebrafish, C. elegans etc.
There were also very large numbers of reads across all samples that kraken2 couldn't classify. I also tried an assembly, and for a lot of the resulting contigs I couldn't get any hits from BLAST against nr.

Is Kraken mis-assigning the reads? Could this represent poor practice on their part, such as handling our samples in parallel with those of other customers? Could it be a particularly bad case of index hopping in the sequencer, if they multiplexed us with samples from other customers?

I fear that if this is the case, it may not be possible to tell whether the large-ish groups of bacterial reads come from contamination of the cultures or from contamination during library prep and sequencing. Sadly, I got these samples from a collaborator's lab, so it is difficult for me to say how likely culture contamination is.

Has anyone had a similar experience with Novogene?

Any help much appreciated

N

RNA-Seq next-gen sequencing novogene commercial • 1.5k views

updated 5.5 years ago by GenoMax 152k • written 5.5 years ago by n.bailey2 • 0

Try FastQ Screen with model organism indices.

5.5 years ago by cpad0112 21k

score 0 · Answer 1 · 2020-01-02

Is Kraken mis-assigning the reads?

That is a possibility. Your results are only as good as the database used for classification of the reads.
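One way to look at this more closely is to pull the top species straight out of the Kraken2 report and compare them across samples and across databases. Below is a minimal sketch, assuming the standard tab-separated report written by `kraken2 --report` (percentage, clade reads, direct reads, rank code, taxid, indented name). The function names and the `sample.kreport` file name are made up for illustration.

```python
from collections import namedtuple

# One row of a Kraken2 report (only the columns used here).
TaxonCount = namedtuple("TaxonCount", "percent clade_reads rank taxid name")


def parse_kraken2_report(lines):
    """Parse Kraken2 report lines into TaxonCount records.

    Assumes the standard tab-separated layout. Rank, taxid and name are
    read from the end of the row so that reports with extra minimizer
    columns in the middle are also handled.
    """
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        records.append(
            TaxonCount(
                percent=float(fields[0]),
                clade_reads=int(fields[1]),
                rank=fields[-3].strip(),
                taxid=int(fields[-2]),
                name=fields[-1].strip(),  # names are indented by tree depth
            )
        )
    return records


def top_taxa(records, rank="S", n=10):
    """Return the n taxa at the given rank code with the most clade reads."""
    hits = [r for r in records if r.rank == rank]
    return sorted(hits, key=lambda r: r.clade_reads, reverse=True)[:n]


# Hypothetical usage:
# with open("sample.kreport") as fh:
#     for taxon in top_taxa(parse_kraken2_report(fh), rank="S", n=20):
#         print(f"{taxon.percent:6.2f}%  {taxon.clade_reads:>10}  {taxon.name}")
```

If the same unexpected species show up at similar percentages in every sample regardless of condition, that points more towards a database or classification artifact (or a shared contamination source) than towards contamination of individual cultures.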

Could this represent poor practice on their part, with manipulating our samples in parallel with those of other customers?

Again, a possibility. That said, Novogene is a large commercial sequencing organization (and they have been around for some time), so they must have checks and balances in place to prevent this sort of thing from happening.

Could it be a particularly bad case of index hopping in the sequencer, if they multiplexed us with samples from other customers?

If your data uses dual indexes (which it probably does, since Novogene mostly does NovaSeq sequencing), then the chance of index hopping is significantly reduced.
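If you are not sure whether the libraries were dual indexed, the FASTQ headers may tell you. Here is a rough sketch, assuming Illumina-style headers where the last colon-separated field of the comment holds the index sequence and dual indexes are written as `i7+i5`. That layout is an assumption: some providers rewrite headers or put a sample number there instead. The function names are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def index_field(header):
    """Return the last colon-separated field of the header comment, or None.

    Example header (assumed layout):
    @A00:1:HXX:1:1101:1000:2000 1:N:0:ACGT+TGCA
    """
    parts = header.strip().split()
    if len(parts) < 2:
        return None  # no comment part, so no index information
    return parts[1].split(":")[-1]


def summarize_indexes(fastq_path, max_reads=100_000):
    """Count index strings in the first max_reads records of a FASTQ file.

    Assumes plain four-line FASTQ records; handles .gz files.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as fh:
        # Every fourth line, starting with the first, is a header line.
        for header in islice(fh, 0, max_reads * 4, 4):
            counts[index_field(header)] += 1
    return counts


def looks_dual_indexed(counts):
    """True if any observed index string contains '+', i.e. an i7+i5 pair."""
    return any(idx is not None and "+" in idx for idx in counts)
```

Calling `summarize_indexes` on one of your FASTQ files and printing `counts.most_common(5)` should show a single dominant index pair for a demultiplexed sample.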

You don't say what kind of organisms were involved in this experiment (and how close they are to each other taxonomically). You should definitely test for mycoplasma contamination if these are cell cultures. People are sometimes surprised by the volume of reads that can be attributed to mycoplasma.