Nucleotide codes - gap vs low quality basecalls
10 months ago


I'm currently looking at a contig (NZ_BIFH01000018) that contains a stretch of 358 'n' bases (positions 273,994 to 274,351). As I understand it, a run of Ns can mean either (i) 'can be any base' or (ii) 'a sequence gap of undetermined size'. Given this, and the fact that contigs are essentially defined by their lack of gaps, I think the Ns represent either low-quality base calls or a gap whose inferred size should be accurate (as there are more than 100 Ns). However, I can't see any linkage evidence or base-quality data, and I don't know how strictly the GenBank standards are enforced.
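For anyone who wants to reproduce the coordinates, here is a minimal sketch of how a run of Ns can be located in a sequence string. The function name `find_n_runs` and the toy sequence are made up for illustration, and the sketch assumes 1-based, inclusive positions (which is consistent with 274,351 - 273,994 + 1 = 358); it is not tied to any particular FASTA parser.

```python
import re


def find_n_runs(seq, min_len=1):
    """Return (start, end, length) for every run of N/n in seq.

    Positions are 1-based and inclusive (an assumption made for this sketch).
    Runs shorter than min_len are skipped.
    """
    runs = []
    for match in re.finditer(r"[Nn]+", seq):
        length = match.end() - match.start()
        if length >= min_len:
            # match.start() is 0-based; match.end() is exclusive,
            # so it already equals the 1-based inclusive end.
            runs.append((match.start() + 1, match.end(), length))
    return runs


# Toy sequence standing in for the real contig (hypothetical data).
toy_contig = "ACGT" * 3 + "n" * 358 + "TTGA"
for start, end, length in find_n_runs(toy_contig, min_len=100):
    print(f"N run: {start}-{end} ({length} bases)")
```

This only reports where the Ns are and how long each run is; it cannot tell a sized gap apart from a block of low-quality calls, which is the part I'm unsure about.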

Can anyone confirm that the number of bases should be around 358 (i.e. the length of the stretch should be roughly accurate)?

Cheers! Tim

