I want to "fix" the 'N' (unassigned) bases in an ABI file (Sanger sequencing). By parsing the file I retrieved the trace signal for each base channel at each position. The problem is that the sequence is 554 bp long, while the list of peak values is much longer (around 15,000 items). Someone explained to me that this is normal, because the list contains a value for every time point measured. So I would like to know how the base caller decides which positions are used to determine the sequence. I also retrieved a list of the highest peaks, whose length matches the sequence length, BUT I can't tell which base matches each of those peaks.
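If the "highest peaks" list is the base caller's list of peak locations (in ABIF files that is usually the PLOC2 tag: one scan index per called base), each entry is an index into the ~15,000-point traces. A minimal sketch, assuming you already have the four channels as a dict of base to intensity list and the peak locations as a list of ints (the function name `bases_at_peaks` is hypothetical), reads every channel at each peak location and reports the strongest one:

```python
from typing import Dict, List, Sequence, Tuple


def bases_at_peaks(
    channels: Dict[str, Sequence[int]],
    peak_locations: Sequence[int],
) -> List[Tuple[str, Dict[str, int]]]:
    """For each called-base position (a scan index into the traces),
    read all four channels and return the strongest base plus the raw signals."""
    calls = []
    for loc in peak_locations:
        # Intensity of every channel at this time point.
        signals = {base: trace[loc] for base, trace in channels.items()}
        # The channel with the highest signal at the peak location.
        best = max(signals, key=signals.get)
        calls.append((best, signals))
    return calls


# Tiny synthetic trace: 6 scans, 3 called bases at scans 1, 3 and 5.
channels = {
    "G": [0, 5, 1, 0, 0, 0],
    "A": [0, 0, 0, 8, 0, 0],
    "T": [0, 0, 0, 0, 1, 7],
    "C": [0, 1, 0, 0, 0, 2],
}
peak_locations = [1, 3, 5]
print("".join(base for base, _ in bases_at_peaks(channels, peak_locations)))  # GAT
```

Keeping the per-channel signals alongside the call makes it easy to inspect the positions that came out as 'N'.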
I tried to build a list of local maxima from the full trace (per base channel), but the length of that list still doesn't match the sequence length. How does the base caller choose the positions?
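Plain local maxima almost always outnumber the called bases, because noise, shoulders and small bumps in the baseline all count as maxima. The sketch below is only an illustration of that effect, not the algorithm ABI's base caller uses; the function `naive_peaks` and its `min_height`/`min_spacing` parameters are hypothetical. It finds local maxima, then drops those that are too low or too close to a taller neighbour:

```python
from typing import List, Sequence


def naive_peaks(trace: Sequence[float], min_height: float, min_spacing: int) -> List[int]:
    """Return indices of local maxima at least `min_height` tall and at least
    `min_spacing` scans apart (taller peaks win when two are too close)."""
    # 1. Local maxima: higher than the left neighbour, not lower than the right one.
    candidates = [
        i
        for i in range(1, len(trace) - 1)
        if trace[i] > trace[i - 1] and trace[i] >= trace[i + 1] and trace[i] >= min_height
    ]
    # 2. Greedy spacing filter: keep the tallest peaks first.
    kept: List[int] = []
    for i in sorted(candidates, key=lambda j: trace[j], reverse=True):
        if all(abs(i - k) >= min_spacing for k in kept):
            kept.append(i)
    return sorted(kept)


trace = [0, 3, 1, 10, 2, 0, 1, 0, 8, 0]
print(naive_peaks(trace, min_height=0, min_spacing=1))  # [1, 3, 6, 8]
print(naive_peaks(trace, min_height=2, min_spacing=3))  # [3, 8]
```

Even with such filters the count will rarely match the base calls exactly, which is why reading the caller's own peak locations from the file is more reliable than re-deriving them.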
Can anybody help?
Prompting people won't get you an insta-answer, this is a forum, not IRC. To be honest I found your post a little hard to read and work out exactly what you're asking, which is perhaps why it is not answered. It would help if you were able to share the ab1 file. Have you visualised the file at all? What 'parser' did you use for the ab1 files to generate the sequence? N values in sequence data are generally not 'fixable' unless there's a very clear heterozygous position (at least not in my somewhat dated experience with this kind of data)
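To check whether an 'N' is one of those clear heterozygous positions, one option is to compare the two strongest channels at that peak location. A minimal sketch, assuming the four channel intensities at one position are already available as a dict; the function name `resolve_position` and the 0.3 secondary/primary ratio are hypothetical choices, not a standard:

```python
from typing import Dict

# IUPAC ambiguity codes for two-base mixtures.
IUPAC_2 = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def resolve_position(signals: Dict[str, float], min_ratio: float = 0.3) -> str:
    """Return the dominant base, an IUPAC two-base code if the second-strongest
    channel reaches `min_ratio` of the strongest (hypothetical threshold), or 'N'."""
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    (b1, s1), (b2, s2) = ranked[0], ranked[1]
    if s1 <= 0:
        return "N"  # no signal at all: nothing to resolve
    if s2 / s1 >= min_ratio:
        return IUPAC_2[frozenset(b1 + b2)]
    return b1


print(resolve_position({"A": 100, "G": 60, "C": 5, "T": 3}))  # R
print(resolve_position({"A": 100, "G": 10, "C": 5, "T": 3}))  # A
```

A position that stays ambiguous under any sensible threshold is probably better left as 'N'.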
I used the abifpy (Python) module to parse the ab1 file, and yes, I have visualized the entire file. Thanks for your answer.
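To line the traces up with the base calls, note that the ABIF format stores the analyzed traces in the DATA9 to DATA12 tags, and the FWO_1 tag ("filter wheel order", e.g. "GATC") says which base each of those four channels corresponds to. The sketch below swaps in Biopython rather than abifpy; it assumes the raw tags are available as a dict keyed by tag name, as in Biopython's `record.annotations["abif_raw"]`. The helper name `channels_from_abif_raw` and the file name are hypothetical:

```python
from typing import Dict, Mapping, Sequence


def channels_from_abif_raw(raw: Mapping[str, object]) -> Dict[str, Sequence[int]]:
    """Map the analyzed trace tags DATA9..DATA12 to bases using FWO_1
    (e.g. 'GATC' means DATA9 is G, DATA10 is A, DATA11 is T, DATA12 is C)."""
    order = raw["FWO_1"]
    if isinstance(order, bytes):  # the tag may come back as bytes
        order = order.decode("ascii")
    tags = ["DATA9", "DATA10", "DATA11", "DATA12"]
    return {base: raw[tag] for base, tag in zip(order, tags)}


# Possible usage with Biopython (file name is hypothetical):
# from Bio import SeqIO
# record = SeqIO.read("sample.ab1", "abi")
# raw = record.annotations["abif_raw"]
# channels = channels_from_abif_raw(raw)
# peak_locations = raw["PLOC2"]  # one scan index per called base
```

The resulting `channels` dict and `PLOC2` list have the shapes assumed by the earlier per-peak lookup: about 15,000 values per channel and one location per base.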