Python script to find error rate
7.0 years ago
zaidnab • 0

I am trying to write a Python script that calculates the DNA sequencing error rate from a BAM file, relative to the reference genome. I am brand new to bioinformatics, and I am very stuck. This is what I have so far:

import pysam

samfile = pysam.AlignmentFile("TruQ3_229.sorted.bam", "rb")

for pileupcolumn in samfile.pileup("chr1", 100, 120):
    pass  # stuck here: not sure what to do with each pileup column

I have no idea what to do next or how to continue. Any help is appreciated. Thanks!

genome sequence alignment • 2.5k views

What is your definition of error rate? Have you first tried summarizing what you want to achieve? When posting a question, please provide as much information as possible.


Hi zaidnab,

While not a full answer, you can find some ideas in my recent blog post: Getting the edit distance from a bam alignment: a journey. Let me know if you need further help.

I see you started to make pileups, but I believe you want a per-read error rate?
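
If a per-read rate is what you are after, here is a rough sketch of one way to get it. It is not a definitive method. It assumes that "error rate" means edit distance divided by the number of aligned columns (the question does not define it), and that your aligner wrote the NM tag (edit distance to the reference) on each read. The function names read_error_rate and bam_error_rate are made up for this example. Note that real variants in your sample also count as "errors" under this definition.

```python
# CIGAR operation codes from the SAM specification:
# 0 = M, 1 = I, 2 = D, 7 = "=", 8 = X. Soft/hard clips, skips and
# padding are left out because NM does not count them.
ALIGNED_OPS = {0, 1, 2, 7, 8}


def read_error_rate(nm, cigartuples):
    """Edit distance (NM) divided by the number of aligned columns.

    cigartuples is a list of (operation, length) pairs, in the same
    form pysam exposes as AlignedSegment.cigartuples.
    Returns None when the read has no aligned columns.
    """
    aligned = sum(length for op, length in cigartuples if op in ALIGNED_OPS)
    if aligned == 0:
        return None
    return nm / aligned


def bam_error_rate(path):
    """Pooled error rate over all primary alignments in a BAM file.

    Hypothetical helper: reads without an NM tag are skipped, which is
    an assumption you may want to change.
    """
    import pysam  # imported here so read_error_rate works without pysam

    total_nm = 0
    total_aligned = 0
    with pysam.AlignmentFile(path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            # Keep only mapped, primary alignments.
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            if not read.has_tag("NM") or read.cigartuples is None:
                continue
            total_nm += read.get_tag("NM")
            total_aligned += sum(
                length for op, length in read.cigartuples if op in ALIGNED_OPS
            )
    return total_nm / total_aligned if total_aligned else None
```

With your file this would be called as bam_error_rate("TruQ3_229.sorted.bam"). Pooling the counts before dividing weights long reads more than short ones; averaging read_error_rate over reads instead would give each read equal weight.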
