I tested NovaSeq data, and it can actually be very accurate! In my tests, 86% of R1 reads matched the reference perfectly, which is great! Unfortunately, 0% of R2 reads matched the reference perfectly, presumably because of a lighting failure. I expect this will be fixed in the future.
That said, NovaSeq has a huge problem with index misassignment, so never, ever use NovaSeq when you care about cross-contamination. Ever. For now, at least. My analysis indicated that ~1.5% of single-indexed reads were contaminants (I was surprised; I was expecting 10%, so it's really not that bad), and ~0.08% of dual-indexed reads were contaminants (but note that these dual-indexed libraries did not have entirely unique index pairs; it was the usual 8x12=96 arrangement). Contaminants, in this context, means reads that mapped to the wrong genome. With a pure sample, the rate is 0%. I think that in the future, using unique pairs of indexes for libraries (IDT and Kapa are working on this), the problem might be solved, but since I have not tested it, I can't say for sure.
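To illustrate why unique index pairs might help, here is a minimal sketch, not my actual analysis. It assumes a misassignment event replaces exactly one of a read's two indexes with another index present on the run; real misassignment may be messier than that. The function name and the integer index labels are made up for the example. It counts how many such single swaps land on another valid library (a silent contaminant) versus an unused pair that a demultiplexer could discard.

```python
from itertools import product

def single_swap_outcomes(valid_pairs):
    """Count single-index swaps that land on another valid index pair.

    Assumption: one misassignment event changes exactly one index of a pair.
    Returns (collisions, total_swaps).
    """
    valid = set(valid_pairs)
    firsts = {a for a, _ in valid}
    seconds = {b for _, b in valid}
    collisions = 0
    total = 0
    for a, b in valid:
        # Swap the first index for any other first index on the run.
        for other in firsts - {a}:
            total += 1
            collisions += (other, b) in valid
        # Swap the second index for any other second index on the run.
        for other in seconds - {b}:
            total += 1
            collisions += (a, other) in valid
    return collisions, total

# The usual 8x12=96 arrangement: every combination of 12 and 8 indexes is a library.
combinatorial = list(product(range(12), range(8)))
# 96 libraries with unique pairs: no index is shared between libraries.
unique = [(k, k) for k in range(96)]

print(single_swap_outcomes(combinatorial))  # (1728, 1728)
print(single_swap_outcomes(unique))         # (0, 18240)
```

Under this simplified model, every single swap in the 8x12 layout produces a read that looks like it belongs to another library, while with unique pairs no single swap does. A read would need both indexes swapped to the same wrong library to slip through.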
Also, the quality scores of NovaSeq reads are worthless. My analysis indicated an average discrepancy of 20 points on the Phred scale between the stated and actual quality of reads. If every read had completely random quality scores, and the actual accuracy was also completely random (on the Phred scale of 0-41), the stated score would be correct 1/42 of the time on average, with an average discrepancy of 13.83. So you could generate far more accurate quality scores by throwing darts while blindfolded than by using Illumina's NovaSeq quality-score generation software.
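The random baseline can be checked by brute force. This is a minimal sketch, assuming the stated and actual scores are independent integers drawn uniformly from 0 to 41; the function names are made up for the example. It enumerates all 42x42 combinations and averages the absolute difference.

```python
from fractions import Fraction
from itertools import product

def expected_random_discrepancy(max_q=41):
    """Mean |stated - actual| when both are independent uniform integers in 0..max_q."""
    scores = range(max_q + 1)
    n = len(scores)
    total = sum(abs(stated - actual) for stated, actual in product(scores, repeat=2))
    return Fraction(total, n * n)

def chance_of_exact_match(max_q=41):
    """Probability that a random stated score equals a random actual score."""
    return Fraction(1, max_q + 1)

print(chance_of_exact_match())                           # 1/42
print(round(float(expected_random_discrepancy()), 2))    # 13.99
```

With integer scores this gives about 13.99, slightly different from the 13.83 I quoted, most likely because of how the range was treated. Either way it is well below the observed discrepancy of 20.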
I'm not saying that a blind monkey is better than Illumina's quality-score algorithm. Illumina is a big company, and they care about their users, so obviously, a huge amount of work has gone into base-calling and quality-assignment. But, personally, I'd prefer the blind monkey.