Calculate nucleotide frequency for a position in a bam file using Bio::DB Perl module
0
0
Entering edit mode
5.5 years ago

Hi everyone,

I am new to Bioperl so please bear with me.

I have a bam file and a chr-position file that looks like this:

10    1156771
10    37484026
10    78483209
10    82960189

I have to calculate the number of A, T, G, C, N, a, t, g, c (forward strand and reverse strand) for each of the positions in the position file, such that the output looks like:

chr pos      depth A T G C a t g c N
10  1156771  3     1 2 0 0 0 0 0 0 0
10  37484026 5     0 0 3 2 0 0 0 0 0

Now, I know this exact task can be done by:

1. samtools mpileup -> pileup2indel.pl/Rsamtools or other parsing scripts

2. bam-readcount (directly takes position file and bam file to generate that output)

3. other existing tools

And I have done it before too using bam-readcount! But I have been told to do it using Bio::DB::Sam module.

Anyway, the following is my code to generate pileup at a particular position, the only thing I can't figure out now is getting the reference base ($refBase): #!/usr/bin/perl use Bio::DB::Sam; my$sam = Bio::DB::Sam->new(-bam => "sample.bam", -fasta => "hg19.fasta");
my $cb = sub { my ($seqid, $pos,$pileups) = @_;
my $refBase =$sam->segment($seqid,$pos,$pos)->dna; # cannot access reference base.$refBase is always empty.
print "\n$pos\t$refBase=>";
my @tmp;
my @strtmp;
for my $pileup (@$pileups){
my $al =$pileup->alignment;
my $strand =$al->strand;
push(@strtmp,$strand); my$qBase = substr($al->qseq,$pileup->qpos, 1);
# to account for reverse strand
# if $strand is 1 then$qBase will remain same
# if $strand is -1 then$qBase will change to small
if($strand==-1){$qBase = lc $qBase; } push(@tmp,$qBase);
#print "$qBase"; }$scalstrand = join("",@strtmp);
$scal = join("", @tmp); print$scal,"\t";
print $scalstrand,"\t"; my$len = length($scal); my$A = $scal =~ tr/A//; my$T = $scal =~ tr/T//; my$C = $scal =~ tr/C//; my$G = $scal =~ tr/G//; my$a = $scal =~ tr/a//; my$c = $scal =~ tr/c//; my$g = $scal =~ tr/g//; my$t = $scal =~ tr/t//; # print the depth and counts of A,T,G,C,a,t,g,c print "Depth:$len, A:$A, T:$T, C:$C, G:$G, a:$a, t:$t, c:$c, g:$g\n";
@tmp=();
};
# call pileup for chr1:10000 only
$sam->pileup('1:10000-10000',$cb);

Thanks!

myposts perl bioperl pileup • 2.9k views
0
Entering edit mode
0
Entering edit mode

Yes, because they asked me to delete it there as it was not the relevant community. So now it is only on Biostars. Is that ok?