Question: Calculate nucleotide frequency for a position in a bam file using Bio::DB Perl module
0
gravatar for Sameer Chavan
3.7 years ago by
United States
Sameer Chavan0 wrote:

Hi everyone,

I am new to Bioperl so please bear with me. 

I have a bam file and a chr-position file that looks like this:

10    1156771
10    37484026
10    78483209
10    82960189

I have to calculate the number of A, T, G, C, N, a, t, g, c (forward strand and reverse strand) for each of the positions in the position file, such that the output looks like:

chr pos      depth A T G C a t g c N
10  1156771  3     1 2 0 0 0 0 0 0 0
10  37484026 5     0 0 3 2 0 0 0 0 0

Now, I know this exact task can be done by:

1. samtools mpileup -> pileup2indel.pl/Rsamtools or other parsing scripts

2. bam-readcount (directly takes position file and bam file to generate that output)

3. other existing tools 

And I have done it before too using bam-readcount! But I have been told to do it using Bio::DB::Sam module.

Anyway, the following is my code to generate pileup at a particular position, the only thing I can't figure out now is getting the reference base ($refBase):

#!/usr/bin/perl
use Bio::DB::Sam;
my $sam = Bio::DB::Sam->new(-bam => "sample.bam", -fasta => "hg19.fasta");
my $cb = sub {
        my ($seqid, $pos, $pileups) = @_;
        my $refBase = $sam->segment($seqid,$pos,$pos)->dna;
        # cannot access reference base. $refBase is always empty.
        print "\n$pos\t$refBase=>";
        my @tmp;
        my @strtmp;
        for my $pileup (@$pileups){
                my $al = $pileup->alignment;
                my $strand = $al->strand;
                push(@strtmp,$strand);
                my $qBase = substr($al->qseq, $pileup->qpos, 1);
                # to account for reverse strand
                # if $strand is 1 then $qBase will remain same
                # if $strand is -1 then $qBase will change to small
                if($strand==-1){
                        $qBase = lc $qBase;
                }
                push(@tmp,$qBase);
                #print "$qBase";
        }
        $scalstrand = join("",@strtmp);
        $scal = join("", @tmp);
        print $scal,"\t";
        print $scalstrand,"\t";
        my $len = length($scal);
        my $A = $scal =~ tr/A//;
        my $T = $scal =~ tr/T//;
        my $C = $scal =~ tr/C//;
        my $G = $scal =~ tr/G//;
        my $a = $scal =~ tr/a//;
        my $c = $scal =~ tr/c//;
        my $g = $scal =~ tr/g//;
        my $t = $scal =~ tr/t//;
        # print the depth and counts of A,T,G,C,a,t,g,c
        print "Depth:$len, A:$A, T:$T, C:$C, G:$G, a:$a, t:$t, c:$c, g:$g\n";
        @tmp=();
};
# call pileup for chr1:10000 only
$sam->pileup('1:10000-10000', $cb);

 

Thanks!

 

 

myposts pileup bioperl perl • 2.1k views
ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Sameer Chavan0

cross posted (& deleted ) http://stackoverflow.com/questions/33307259

ADD REPLYlink written 3.7 years ago by Pierre Lindenbaum121k

Yes, because they asked me to delete it there as it was not the relevant community. So now it is only on Biostars. Is that ok?

ADD REPLYlink written 3.7 years ago by Sameer Chavan0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1453 users visited in the last hour