Replacing 'N' with 'A', 'C', 'G', or 'T' in an ABI file (Sanger)
9.2 years ago

I want to "fix" the 'N's (unassigned bases) in an ABI file (Sanger sequencing). When parsing it, I retrieved the peaks per base per position. The sequence is 554 bp long, but the list containing the peaks is much longer (around 15,000 items). Someone explained to me that this is normal, since the list contains the signal at every time point measured. I would therefore like to know how the base caller decides which positions are used to determine the sequence. I also retrieved a list of the highest peaks, whose length matches my sequence length, but I can't tell which base corresponds to each of those peaks.

sequencing sequence • 2.1k views

I tried to extract the local maxima from the list containing all the peaks (base channel), but the length of the resulting list still doesn't match the sequence length. How does the base caller choose the positions?


Can anybody help?


Prompting people won't get you an insta-answer; this is a forum, not IRC. To be honest, I found your post a little hard to read, and it was hard to work out exactly what you're asking, which is perhaps why it has not been answered. It would help if you were able to share the ab1 file. Have you visualised the file at all? What 'parser' did you use for the ab1 files to generate the sequence? N values in sequence data are generally not 'fixable' unless there's a very clear heterozygous position (at least not in my somewhat dated experience with this kind of data).


I used the abifpy (Python) module to parse the ab1 file, and yes, I have visualized the entire file. Thanks for your answer.

9.2 years ago
lh3 33k

There are a whole lot of details behind base calling, and it is far more complex than naive peak calling from fluorescence data. Phred, for example, uses FFT to filter noisy peaks and considers peak spacing and the area under a peak. Unless you want to write a base caller for Sanger data, which is pretty much pointless nowadays, you should just take the base calls in the ABI file.
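To illustrate why the trace lists (one value per time point) are so much longer than the sequence (one called peak per base), here is a minimal sketch that looks up the four channel intensities at each called peak position and replaces an 'N' only when one channel clearly dominates. It is a naive heuristic for inspection, not what phred or the ABI base caller does. The function name `resolve_n_calls` and the `min_ratio` threshold are made up for this example. The inputs are assumed to have been extracted from the ab1 file beforehand (as far as I know, the ABIF tags for peak locations and processed traces are usually PLOC and DATA9 to DATA12, with the channel order given by FWO_1, but check this against your own file).

```python
def resolve_n_calls(bases, peak_locations, traces, min_ratio=2.0):
    """Replace 'N' calls where one channel clearly dominates at the called peak.

    bases: called sequence, one character per base.
    peak_locations: trace sample index of each called base (same length as bases).
    traces: dict mapping 'A', 'C', 'G', 'T' to equal-length intensity lists.
    min_ratio: hypothetical threshold; the strongest channel must be at least
        this many times the runner-up, otherwise the 'N' is kept.
    """
    if len(bases) != len(peak_locations):
        raise ValueError("expected one peak location per called base")
    fixed = []
    for base, pos in zip(bases, peak_locations):
        if base != "N":
            fixed.append(base)  # keep the original call untouched
            continue
        # Rank the four channels by intensity at this peak position.
        ranked = sorted(((traces[b][pos], b) for b in "ACGT"), reverse=True)
        (top, top_base), (second, _) = ranked[0], ranked[1]
        if top > 0 and top >= min_ratio * second:
            fixed.append(top_base)
        else:
            fixed.append("N")  # ambiguous (e.g. mixed signal): leave as N
    return "".join(fixed)


# Toy data: 12 time points, 3 called bases at samples 2, 6 and 10.
traces = {
    "A": [0, 5, 90, 5, 0, 0, 2, 0, 0, 10, 40, 10],
    "C": [0, 0, 3, 0, 0, 20, 60, 20, 0, 0, 0, 0],
    "G": [0, 0, 1, 0, 0, 0, 1, 0, 0, 12, 45, 12],
    "T": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0],
}
print(resolve_n_calls("ANN", [2, 6, 10], traces))
```

In the toy data the second base has a clean C peak and gets resolved, while the third has overlapping A and G signal of similar height and stays 'N', which is the kind of position that is generally not fixable.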


Thank you for your answer, lh3.


I should add that you can also consider using phred to re-call the trace. Phred generates a .phd file, which gives you the peak positions.
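A small sketch of reading the calls out of a .phd file, assuming the usual layout in which each line between `BEGIN_DNA` and `END_DNA` holds a base, its quality value, and its peak position in the trace. The function name `parse_phd_calls` and the example text are made up for illustration; check the layout against the files your phred version actually writes.

```python
def parse_phd_calls(text):
    """Return a list of (base, quality, peak_position) from .phd file text.

    Assumes each line between BEGIN_DNA and END_DNA has three
    whitespace-separated fields: base, quality, peak position.
    """
    calls = []
    in_dna = False
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            break
        elif in_dna and line:
            base, quality, position = line.split()[:3]
            calls.append((base.upper(), int(quality), int(position)))
    return calls


# Made-up example content, not taken from a real run.
example = """BEGIN_SEQUENCE sample
BEGIN_COMMENT
END_COMMENT
BEGIN_DNA
t 9 6
c 10 17
n 4 29
END_DNA
END_SEQUENCE
"""

calls = parse_phd_calls(example)
for base, quality, position in calls:
    print(base, quality, position)
```

The third column is the index into the trace that the questions above were asking about: one position per called base, rather than one per time point.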
