Question: How can I count reverse transcriptase (RT) stops and misincorporations?
gravatar for majd.abdulghani
14 months ago by
majd.abdulghani10 wrote:

Hi, I'm fairly new to bioinformatics. I have some RNA-seq data for mapping an RNA modification. With a certain treatment, this modification causes the reverse transcriptase (RT) to stop (theoretically at the nucleotide immediately before the modification) or to misincoporate a nucleotide (in place of the modified base).

I have BAM files for the control (no treatment added to induce stops and mismatches) and treated (which should have more stops and mismatches) RNA samples.

To map the modification, I want to count the number of stops or mismatches at each nucleotide, divide that by the total number of reads (i.e. number of reads where there is a stop or mismatch + number of reads without). I want to do this for both the treatment and the control samples, then subtract the events that occur in the control from the treated sample to normalize for other factors that can cause stops or misincorporations. This is based on a method in this paper.

(Note: just to avoid any confusion, since misincorporations and stops are mutually exclusive events (a RT will not stop and misincorporate a base), and one of them occurs immediately before the modification (stop) while the other occurs in its place (mismatch), I would do the calculations for each event separately.)

The issue is that while I know the theory, I really don't know how to put it in practice and extract that information from the BAM files. I read about the SAM/BAM file formats and I know that they contain the information I need, but not sure how to extract it. I have a Windows 10 OS with Python 3 and R installed. I know a bit of R and have just recently started learning Python 3. My PI would like me to do this in Python, so if you can offer some insight as to how to get started with this in Python 3, I would really appreciate it. But if R is better, that's fine too, I just want to get this done. Thank you.

rna-seq • 347 views
ADD COMMENTlink modified 14 months ago • written 14 months ago by majd.abdulghani10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1058 users visited in the last hour