I wondered if anyone had a similar experience with results from Novogene commercial RNAseq.
My experiment was dual RNA-seq of two species in vitro.
After aligning the reads to my reference genomes, I found that a large proportion (40-50%) didn't map.
Looking at the data, this was largely explained by strain differences between the reference genomes and the data, and by issues with genome quality.
I classified the reads taxonomically using kraken2 and found significant numbers of reads from unexpected organisms. Some could potentially be explained by contaminant growth in the culture (bacteria etc.), but across the board there were large numbers of reads from model organisms: consistently around 1.3% human reads (maybe real contamination from the experimenter, but so many reads?), plus significant numbers (100k+) of reads for things like zebrafish, C. elegans etc.
Across the board there were also very large numbers of reads that kraken2 couldn't classify. I also tried an assembly, and for a lot of these BLAST against nr returned no hits.
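For anyone who wants to compare numbers from their own runs, here is a minimal sketch of how per-species read counts like the ones above can be tallied from a kraken2 report. It assumes the standard six-column, tab-separated report written by kraken2's `--report` option (percent, clade reads, direct reads, rank code, taxid, name). The function name, the rank filter and the `min_reads` cutoff are hypothetical choices for illustration, not the exact script used here.

```python
import csv

def parse_kraken2_report(lines, rank="S", min_reads=1000):
    """List taxa at a given rank from a kraken2 report, largest first.

    `lines` is an open file handle or any iterable of report lines.
    Assumes the standard six-column report; rows with a different
    number of columns are skipped rather than guessed at.
    Returns (name, clade_reads, percent) tuples.
    """
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 6:
            continue
        percent, clade_reads, _direct, rank_code, _taxid, name = row
        # Rank code "S" is species, "G" genus, "U" the unclassified bin.
        if rank_code == rank and int(clade_reads) >= min_reads:
            # Names are indented by taxonomic depth, so strip the padding.
            hits.append((name.strip(), int(clade_reads), float(percent)))
    # Sort by clade read count so the biggest unexpected taxa come first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Calling it on an open report file with `rank="S"` gives the species list, and `rank="U", min_reads=0` returns the unclassified bin, which makes it easy to see whether the same unexpected organisms show up at similar percentages in every sample.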
Is Kraken mis-assigning the reads? Could this reflect poor practice on their part, such as handling our samples in parallel with those of other customers? Or could it be a particularly bad case of index hopping on the sequencer, if they multiplexed our samples with those of other customers?
I fear that, if this is the case, it may not be possible to tell whether the large-ish groups of bacterial reads come from contamination of the cultures or from contamination during library prep and sequencing. Sadly, I got these samples from a collaborator's lab, so it is difficult for me to say how likely culture contamination is.
Has anyone had a similar experience with Novogene?
Any help is much appreciated.