Question: SAM file and alignment
13 months ago by
qudrat60
NATIONAL INSTITUTE OF IMMUNOLOGY, INDIA
qudrat60 wrote:

Hello everyone,
I have a SAM file from TopHat, and I want to extract the groups of reads that share the same intron boundary, based on the CIGAR string.

rna-seq alignment • 443 views

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the advisable tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment with Salmon.

written 13 months ago by WouterDeCoster

I am not running TopHat. I already have a SAM file produced by TopHat, and I want to extract the groups of reads that share the same intron boundary, based on the CIGAR string.

written 13 months ago by qudrat60
13 months ago by
Istvan Albert ♦♦ 79k
University Park, USA
Istvan Albert ♦♦ 79k wrote:

If I understood the question correctly, you'd like to access reads that span the exon/intron boundary and contain the exon, which makes it a bit trickier than a simple intersect.

You can't quite use the CIGAR string alone since that does not contain the coordinate. Working that out from the position would take some custom programming effort and would duplicate existing functionality in other libraries.
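To give an idea of what that custom effort might look like, here is a minimal sketch in plain Python. It assumes a standard tab-delimited SAM file in which the aligner marks introns with the `N` CIGAR operation (as TopHat does), and it combines the POS column with the CIGAR string to get the intron coordinates. The function names `junctions` and `group_by_junction` are made up for illustration and do not come from any library.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def junctions(pos, cigar):
    """Return 1-based inclusive (intron_start, intron_end) for each N operation.

    pos is the 1-based leftmost mapping position (SAM column 4).
    """
    ref = pos  # reference coordinate of the next reference-consuming base
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            found.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return found


def group_by_junction(sam_lines):
    """Map (chrom, intron_start, intron_end) to the names of reads spanning it."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":  # unmapped read or no CIGAR available
            continue
        for start, end in junctions(pos, cigar):
            groups[(chrom, start, end)].append(name)
    return groups
```

With a file opened as `handle`, something like `{k: v for k, v in group_by_junction(handle).items() if len(v) > 1}` would keep only the junctions supported by more than one read. Note that this groups on both ends of the intron; to group on a single boundary, key on `(chrom, start)` or `(chrom, end)` instead.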

If you are able to use pysam, the pileup method on the last coordinate of the exon might work. The documentation states:

An alternative way of accessing the data in a SAM file is by iterating over each base of a specified region using the pileup() method. Each iteration returns a PileupColumn which represents all the reads in the SAM file that map to a single base in the reference sequence.

http://pysam.readthedocs.io/en/latest/api.html

You will still need to check that the end of the alignment is past the coordinate.
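A rough sketch of how the pileup approach and that final check could fit together is below. It assumes the SAM file has been converted to a coordinate-sorted, indexed BAM (which `pileup()` needs) and that `boundary` is the 0-based position of the last exon base. The helper names `spans_boundary` and `reads_past_boundary` are hypothetical. The default pileup filters and depth limits in pysam may also hide some reads, so treat this as a starting point, not a finished solution.

```python
def spans_boundary(ref_start, ref_end, boundary):
    """True if the 0-based half-open alignment [ref_start, ref_end)
    covers `boundary` and also extends past it."""
    return ref_start <= boundary and ref_end > boundary + 1


def reads_past_boundary(bam_path, contig, boundary):
    """Names of reads with an aligned base at `boundary` that continue beyond it."""
    import pysam  # third-party dependency, imported only when this is called

    names = []
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # truncate=True restricts the columns to the requested position.
        for column in bam.pileup(contig, boundary, boundary + 1, truncate=True):
            for pileup_read in column.pileups:
                # Skip reads that only cross this base via a deletion or intron gap.
                if pileup_read.is_del or pileup_read.is_refskip:
                    continue
                aln = pileup_read.alignment
                if spans_boundary(aln.reference_start, aln.reference_end, boundary):
                    names.append(aln.query_name)
    return names
```

Here `reference_end` is the position one past the last aligned base, so a read that stops exactly on the last exon base is rejected by `spans_boundary`.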
