Plotting .tab/.bam file

4.9 years ago

zack.henning • 0

Hey all,

I am trying to find a way to create a scatter plot and a histogram with matplotlib for an alignment I generated. I aligned my reads to a bacterial genome, then converted the SAM file to BAM, sorted and indexed it, and computed the per-position depth using:

    samtools view -b s_oneidensis_alignemnt_sensitive.sam > alignment.bam
    samtools sort alignment.bam > alignment.sorted.bam
    samtools index alignment.sorted.bam
    samtools depth -a alignment.sorted.bam > pileup.tab

Now I'd like to generate a scatter plot with x-axis = position in genome and y-axis = depth of coverage, and then a histogram with x-axis = depth of coverage and y-axis = read count. I'm still new to Python and am trying to figure out a method: should I use the .tab file, or should I use the .bam file? Any help or nudges in the right direction would be greatly appreciated. Thanks!
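A minimal sketch of the two plots, assuming the positions and depths have already been loaded into two equal-length sequences (for example columns 2 and 3 of `pileup.tab`). The toy values and the output filenames below are made up for illustration. Because `samtools depth -a` writes one line per genome position, a histogram of the depth column counts positions (bases) at each depth rather than reads. In a Jupyter notebook with `%matplotlib inline`, drop the `matplotlib.use("Agg")` line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt


def plot_coverage(positions, depths, scatter_path="coverage_scatter.png",
                  hist_path="coverage_hist.png", bins=50):
    """Save a position-vs-depth scatter plot and a histogram of depth values.

    Returns the per-bin counts from the histogram.
    """
    # Scatter: one point per genome position
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.scatter(positions, depths, s=1)
    ax.set_xlabel("Position in genome")
    ax.set_ylabel("Depth of coverage")
    fig.tight_layout()
    fig.savefig(scatter_path)
    plt.close(fig)

    # Histogram: how many positions fall into each depth bin
    fig, ax = plt.subplots()
    counts, _, _ = ax.hist(depths, bins=bins)
    ax.set_xlabel("Depth of coverage")
    ax.set_ylabel("Number of positions")
    fig.tight_layout()
    fig.savefig(hist_path)
    plt.close(fig)
    return counts


# Hypothetical toy data standing in for the position and depth columns of pileup.tab
positions = list(range(1, 101))
depths = [(p * 7) % 30 for p in positions]
counts = plot_coverage(positions, depths)
```

Saving to files with `savefig` keeps the sketch usable from a plain script; `plt.show()` works just as well in an interactive session.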


Updated 17 months ago by Ram 44k • written 4.9 years ago by zack.henning • 0

A similar topic has been discussed in "How to plot coverage and depth statistics of a bam file". A .tab file is just a plain-text file whose columns are tab-separated, so you can read pileup.tab with pandas and plot it with pyplot.
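A minimal sketch of the pandas route, assuming `pileup.tab` is the unmodified output of `samtools depth -a`: three tab-separated columns (reference name, 1-based position, depth) and no header line. Without `header=None`, pandas treats the first data row as column names. The labels `chrom`, `pos`, and `depth` are arbitrary names chosen here, and the in-memory toy data (with a made-up reference name `contig_1`) only stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd


def read_depth_table(source):
    """Read samtools depth output: reference, 1-based position, depth; no header."""
    return pd.read_csv(
        source,
        sep="\t",
        header=None,                      # samtools depth writes no header row
        names=["chrom", "pos", "depth"],  # labels chosen for this example
    )


# Hypothetical three-line stand-in for pileup.tab;
# on real data call read_depth_table("pileup.tab") instead.
toy = io.StringIO("contig_1\t1\t0\ncontig_1\t2\t3\ncontig_1\t3\t5\n")
depth = read_depth_table(toy)

x = depth["pos"]    # position in genome
y = depth["depth"]  # depth of coverage
```

`x` and `y` are pandas Series, so they can be passed directly to `plt.plot(x, y, 'ro')` or `plt.hist(y)` without building lists first.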

4.9 years ago by Arup Ghosh 3.2k

So the issue I'm having is extracting the columns of the alignment .tab file into lists. Instead of reading the columns, my current code indexes characters of the strings in the first row (the NCBI accession ID for the genome), such as A and E. This is the code:

    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    x = []
    y = []

    table = pd.read_csv('pileup.tab', sep='\t')
    alignment = pd.DataFrame(data=table)
    for column in alignment:
        x.append(int(column[1]))
        y.append(int(column[2]))

    plt.plot(x, y, 'ro')
    plt.xlabel('Position in Genome')
    plt.ylabel('Depth of Coverage')
    plt.show()
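Two things seem to go wrong in the loop above. First, `pd.read_csv` is called without `header=None`, so the first line of `pileup.tab` is consumed as column labels. Second, iterating over a DataFrame (`for column in alignment`) yields those column labels, not the column data, so `column[1]` picks a single character out of a label string such as the accession ID. A small sketch reproducing this on made-up data (the reference name `NC_000000.1` is a placeholder, not a real accession), followed by a minimal fix that selects whole columns:

```python
import io

import pandas as pd

# Hypothetical stand-in for pileup.tab: reference name, position, depth
toy_tab = "NC_000000.1\t1\t4\nNC_000000.1\t2\t6\nNC_000000.1\t3\t5\n"

# Behaviour of the original code: the first data row becomes the header
bad = pd.read_csv(io.StringIO(toy_tab), sep="\t")
labels = [column for column in bad]  # iterating a DataFrame yields column labels
print(labels)  # the first row's values, now used as labels

# Fix: declare that there is no header, then take whole columns
table = pd.read_csv(io.StringIO(toy_tab), sep="\t", header=None)
x = table[1].tolist()  # column 2: position in genome
y = table[2].tolist()  # column 3: depth of coverage
print(x, y)
```

With `header=None`, pandas labels the columns 0, 1 and 2, so `table[1]` and `table[2]` are the position and depth columns and the explicit loop is no longer needed; `plt.plot(x, y, 'ro')` then works as in the original code.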

4.9 years ago by zack.henning • 0