Question: Getting No. of reads from BAM by pysam (ChIP-seq)
gravatar for BioDH
3.1 years ago by
BioDH0 wrote:

I'm trying to get no. of reads form (TSS ~ +500) and (-500 to pA site) of each gene.

What I want to get is the no of reads of TSS proximal and pA proximal region(window = 500)

If I have a bed file that contains two genes, one plus strand and the other minus strand,

I was wondering if following script is what I want.

WT_bam=pysam.AlignmentFile("./WT.bam", "rb")
mut_bam=pysam.AlignmentFile("./mut.bam", "rb")

def getReads (chromosome, start, end, strand, window=500):
if (strand=="+"):
    WT_5end_reads=len(list(WT_bam.fetch(chromosome, start, start+window)))
    WT_3end_reads=len(list(WT_bam.fetch(chromosome, end-window, end)))
    mut_5end_reads=len(list(mut_bam.fetch(chromosome, start, start+window)))
    mut_3end_reads=len(list(mut_bam.fetch(chromosome, end-window, end)))
elif (strand=="-"):
    WT_3end_reads=len(list(WT_bam.fetch(chromosome, start, start+window)))
    WT_5end_reads=len(list(WT_bam.fetch(chromosome, end-window, end)))
    mut_3end_reads=len(list(mut_bam.fetch(chromosome, start, start+window)))
    mut_5end_reads=len(list(mut_bam.fetch(chromosome, end-window, end)))

. .

ex. following is the bed file. (first column is "chromosome", 2nd col is "start", 3rd col is "end", and the last column is "strand")

chrI    500    2000    .    .    +
chrII    300    4000    .    .    -

To make sure, I am showing a part of full code and the full code is working.

But I'm not sure that the code to get reads from minus strand(-) is what I want.

chip-seq • 984 views
ADD COMMENTlink modified 7 weeks ago by Biostar ♦♦ 20 • written 3.1 years ago by BioDH0

Why don't you use a tool such as htseq-count or featureCounts to get the number of reads? You just need to construct an appropriate bed file.

ADD REPLYlink written 3.1 years ago by WouterDeCoster43k

In the likely event that you'd like to additionally plot the distribution of the reads in these regions, use computeMatrix and then plotHeatmap form deepTools.

ADD REPLYlink written 3.1 years ago by Devon Ryan95k
gravatar for Devon Ryan
3.1 years ago by
Devon Ryan95k
Freiburg, Germany
Devon Ryan95k wrote:

FYI, you want something like WT_5ends_reads = WT_bam.count(chromosome, start, start + window). That will be vastly more efficient than using fetch().

Anyway, I agree with WouterDeCoster that it's much easier to use something that already exists (you can use bedtools to modify the BED file to contain only the regions you care about).

The code for the - strand appears correct (see my note above about count()). Give that a try and then double check a region or two manually. Then you'll be certain whether it's performing correctly.

ADD COMMENTlink written 3.1 years ago by Devon Ryan95k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1350 users visited in the last hour