Question

Getting No. of reads from BAM by pysam (ChIP-seq)

0

Entering edit mode

7.0 years ago

BioDH ▴ 10

I'm trying to get no. of reads form (TSS ~ +500) and (-500 to pA site) of each gene.

What I want to get is the no of reads of TSS proximal and pA proximal region(window = 500)

If I have a bed file that contains two genes, one plus strand and the other minus strand,

I was wondering if following script is what I want.

WT_bam=pysam.AlignmentFile("./WT.bam", "rb")
mut_bam=pysam.AlignmentFile("./mut.bam", "rb")


def getReads (chromosome, start, end, strand, window=500):
if (strand=="+"):
    WT_5end_reads=len(list(WT_bam.fetch(chromosome, start, start+window)))
    WT_3end_reads=len(list(WT_bam.fetch(chromosome, end-window, end)))
    mut_5end_reads=len(list(mut_bam.fetch(chromosome, start, start+window)))
    mut_3end_reads=len(list(mut_bam.fetch(chromosome, end-window, end)))
elif (strand=="-"):
    WT_3end_reads=len(list(WT_bam.fetch(chromosome, start, start+window)))
    WT_5end_reads=len(list(WT_bam.fetch(chromosome, end-window, end)))
    mut_3end_reads=len(list(mut_bam.fetch(chromosome, start, start+window)))
    mut_5end_reads=len(list(mut_bam.fetch(chromosome, end-window, end)))
.

. .

ex. following is the bed file. (first column is "chromosome", 2nd col is "start", 3rd col is "end", and the last column is "strand")

chrI    500    2000    .    .    +
chrII    300    4000    .    .    -

To make sure, I am showing a part of full code and the full code is working.

But I'm not sure that the code to get reads from minus strand(-) is what I want.

ChIP-Seq • 1.6k views

ADD COMMENT • link updated 4.0 years ago by Biostar 20 • written 7.0 years ago by BioDH ▴ 10

3

Entering edit mode

Why don't you use a tool such as htseq-count or featureCounts to get the number of reads? You just need to construct an appropriate bed file.

ADD REPLY • link 7.0 years ago by WouterDeCoster 47k

1

Entering edit mode

In the likely event that you'd like to additionally plot the distribution of the reads in these regions, use computeMatrix and then plotHeatmap form deepTools.

ADD REPLY • link 7.0 years ago by Devon Ryan 104k

score 2 · Answer 1 · 2017-04-20

FYI, you want something like WT_5ends_reads = WT_bam.count(chromosome, start, start + window). That will be vastly more efficient than using fetch().

Anyway, I agree with WouterDeCoster that it's much easier to use something that already exists (you can use bedtools to modify the BED file to contain only the regions you care about).

The code for the - strand appears correct (see my note above about count()). Give that a try and then double check a region or two manually. Then you'll be certain whether it's performing correctly.