Parsing a BAM file and comparing it to a GTF file with pysam
2.4 years ago
anasjamshed ▴ 120

I have three files: a .bam file, its .bai index file, and a GTF file.

The name of the BAM file is sample_sort.bam. The two columns of interest for my purpose are column 3, the name of the chromosome, and column 4, the start position of the alignment. The GTF file contains the chromosome in the first column, the start and end positions of an exon in the 4th and 5th columns, and the gene name in the last column.

First I want to parse the BAM file, then build the index, and finally compare it to a file that contains gene annotations (GTF).

I started with pysam.

But I don't know how to convert BAM into SAM through pysam. I also don't know how to parse the given columns and compare them to the GTF file. My final output should be a matrix with two columns, one for the gene name and the other for the number of reads that matched.

Please help me.

pysam python • 2.3k views

My final output should be a matrix with two columns, one for the gene name and the other for the number of reads that matched

Read count per exon per transcript


Thanks, but I need to compare BAM with SAM using Python.


Well, from a Python point of view, the syntax error in cell 8 is telling you that the issue is that caret-like > symbol. Python treats it as a comparison operator for testing conditions (IPython can also use it for writing to a file with the %store magic command), so Python is not happy because its context is all wrong: it sits among commas, as an argument to a method call. More importantly, that symbol isn't doing what you mean it to do. You were probably thinking you were using it to write to a file?

Maybe try the following in your Jupyter notebook:

sam_out = pysam.view("-h", "sample_sort.bam") 
%store sam_out >samplesort.sam

The pysam.view part of that is based on https://stackoverflow.com/a/59314536/8508004 showing pysam.view(ops, bamfile, '1:2010000-20200000','2:2010000-20200000') and this post suggesting pysam.view("-S", "file.sam") as proper syntax. The rest is using some Python/IPython in Jupyter.
The first line assigns the output of pysam.view to a Python variable, sam_out. The %store line uses Jupyter/IPython magics to save sam_out to a file named samplesort.sam, see here. Normally, the ways to write to a file in Python are a bit more verbose, but that %store command is a nice Bash-like shortcut you can use with IPython or Jupyter.
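
For reference, the more verbose plain-Python route might look like the sketch below. The helper name save_text is made up for illustration, and it assumes pysam.view returns the SAM output as a string, as it does when called as shown above.

```python
from pathlib import Path


def save_text(text, path):
    """Write a string to the given path, replacing any existing file."""
    Path(path).write_text(text)


# In the notebook this would be used roughly as follows
# (requires pysam and the BAM file from the question):
#   sam_out = pysam.view("-h", "sample_sort.bam")
#   save_text(sam_out, "samplesort.sam")
```

This does the same job as the %store line, just without relying on IPython magics, so it also works in a regular Python script.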

If that doesn't work, try the longer version of writing to a file like here, something like:

pysam.view('-F', '0x4', '-b', '-h', '-o', 'samplesort.sam', 'sample_sort.bam', catch_stdout=False)

Note that I didn't try adapting that last suggestion to the specific options you seemed interested in, so you'll probably want to adapt it further. For instance, -b tells samtools view to output BAM, so drop it if you want SAM text in samplesort.sam.
If you get something working, you may want to post a follow-up to help others.
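
As for the two-column matrix of gene names and read counts, here is a rough sketch of one way to get there once you have the SAM text. It is untested against your files and rests on several assumptions: a read "matches" a gene when its start position (SAM column 4, POS) falls inside one of that gene's exon records, the gene name is stored in the GTF attribute column as gene_name "..." (falling back to gene_id "..."), and a read is counted at most once per gene. The function names load_exons and count_reads_per_gene and the file name genes.gtf are made up for illustration.

```python
import re
from collections import defaultdict


def load_exons(gtf_lines):
    """Return {chromosome: [(start, end, gene), ...]} for the exon records."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        # Assumption: gene name is in gene_name "...", else gene_id "..."
        match = (re.search(r'gene_name "([^"]+)"', fields[8])
                 or re.search(r'gene_id "([^"]+)"', fields[8]))
        if match:
            exons[fields[0]].append(
                (int(fields[3]), int(fields[4]), match.group(1)))
    return exons


def count_reads_per_gene(sam_lines, exons):
    """Count reads whose start position lies inside an exon of each gene."""
    counts = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        fields = line.split("\t")
        chrom, pos = fields[2], int(fields[3])  # SAM columns 3 and 4
        # Use a set so a read is counted once per gene, even when it
        # starts inside several overlapping exons of that gene.
        genes = {gene for start, end, gene in exons.get(chrom, ())
                 if start <= pos <= end}
        for gene in genes:
            counts[gene] += 1
    return dict(counts)


# Rough usage in the notebook (requires pysam and your own files):
#   sam_out = pysam.view("-h", "sample_sort.bam")
#   with open("genes.gtf") as gtf:
#       exons = load_exons(gtf)
#   counts = count_reads_per_gene(sam_out.splitlines(), exons)
#   for gene, n in sorted(counts.items()):
#       print(gene, n, sep="\t")
```

This scans every exon on the chromosome for each read, which is fine for small files but slow for large ones, where an interval index or a dedicated counting tool would be the better choice.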


I want to convert bam into sam


You converted bam into sam in your first two cells you show in your notebook, right?

After running those two cells, did you try the following to write the result to a file, as you seemed to want from that pysam.view attempt:

%store samfile >samplesort.sam

It's working, but I want a matrix that contains read counts and gene names.


I want this type of matrix : output
