Question: SAM file and alignment
qudrat70 (National Institute of Immunology, India) wrote, 20 months ago:

Hello everyone,
I have a SAM file produced by TopHat, and I want to use the CIGAR strings to extract the reads that share the same intron boundary.

Tags: rna-seq, alignment

You should know that the old "Tuxedo" pipeline of TopHat(2) and Cufflinks is no longer the advisable toolset for RNA-seq analysis. The software is deprecated or in low-maintenance mode and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment with Salmon.

(Comment by WouterDeCoster, 20 months ago)

I am not running TopHat myself; I already have a SAM file from TopHat, and I want to use the CIGAR strings to extract the reads that share the same intron boundary.

(Reply by qudrat70, 20 months ago)
Answer by Istvan Albert (University Park, USA), 20 months ago:

If I understood the question correctly, you'd like to access reads that span the exon/intron boundary and contain the exon, which makes it a bit trickier than a simple intersect.

You can't use the CIGAR string alone, since it does not contain the coordinate. Working the coordinates out from the alignment position (the POS field) plus the CIGAR would take some custom programming and would duplicate functionality that already exists in other libraries.
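For the narrower goal of grouping reads by the exact intron they skip, the custom part is fairly small. The sketch below is plain Python (no pysam) and rests on several assumptions: the input is uncompressed SAM text, every `N` CIGAR operation is treated as an intron, strand and TopHat-specific tags are ignored, and `group_by_junction` is a hypothetical helper name. Intron coordinates are reported 1-based and inclusive, matching the SAM POS field. Read names may repeat for paired mates or multi-mapping hits.

```python
import re
from collections import defaultdict

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_cigar(pos, cigar):
    """Return (start, end) of every N operation, 1-based inclusive.

    pos is the 1-based leftmost mapping position (SAM column 4).
    """
    introns = []
    ref = pos  # next reference base that will be consumed
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


def group_by_junction(sam_lines, min_reads=2):
    """Hypothetical helper: map (chrom, intron_start, intron_end) -> read names,
    keeping only junctions supported by at least min_reads alignments."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # skip unmapped reads
        for start, end in introns_from_cigar(pos, cigar):
            groups[(rname, start, end)].append(qname)
    return {k: v for k, v in groups.items() if len(v) >= min_reads}


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:chr1\tLN:10000",
        "r1\t0\tchr1\t100\t60\t50M200N50M\t*\t0\t0\t*\t*",
        "r2\t0\tchr1\t120\t60\t30M200N70M\t*\t0\t0\t*\t*",
        "r3\t0\tchr1\t500\t60\t100M\t*\t0\t0\t*\t*",
    ]
    print(group_by_junction(sam))  # {('chr1', 150, 349): ['r1', 'r2']}
```

Reads `r1` and `r2` start at different positions with different CIGAR strings, yet skip the same intron, which is why POS is needed alongside the CIGAR. To read a file instead, an open file handle can be passed in place of the list.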

If you are able to use pysam, the pileup() method applied at the last coordinate of the exon might work. The documentation states:

An alternative way of accessing the data in a SAM file is by iterating over each base of a specified region using the pileup() method. Each iteration returns a PileupColumn which represents all the reads in the SAM file that map to a single base in the reference sequence.

http://pysam.readthedocs.io/en/latest/api.html

You will still need to check that the end of the alignment is past the coordinate.
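The coverage-plus-end check can be illustrated without pysam. A minimal sketch, assuming reads have already been reduced to hypothetical `(name, POS, CIGAR)` tuples on a single chromosome and that `coord` is the 1-based last base of the exon: a read qualifies if one of its aligned blocks covers `coord` and its last aligned block ends after it. This only approximates a pileup column plus the end check; it is not pysam's API.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """1-based inclusive reference ranges covered by M, = or X operations."""
    blocks, ref = [], pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            blocks.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that advance along the reference
            ref += length
    return blocks


def reads_crossing(reads, coord):
    """Names of reads with an aligned base at coord whose alignment ends past coord.

    reads: iterable of (name, pos, cigar) tuples on one chromosome.
    """
    hits = []
    for name, pos, cigar in reads:
        blocks = aligned_blocks(pos, cigar)
        if not blocks:
            continue
        covers = any(start <= coord <= end for start, end in blocks)
        ends_past = blocks[-1][1] > coord
        if covers and ends_past:
            hits.append(name)
    return hits


if __name__ == "__main__":
    reads = [
        ("r1", 100, "50M200N50M"),  # spliced, exon ends at 149
        ("r2", 120, "30M200N70M"),  # spliced, same junction
        ("r3", 100, "50M"),         # stops exactly at 149
        ("r4", 140, "100M"),        # unspliced, runs into the intron
    ]
    print(reads_crossing(reads, 149))  # ['r1', 'r2', 'r4']
```

Note that `r4`, an unspliced read running into the intron, also qualifies; additionally requiring an `N` gap right after `coord` would keep only spliced reads. In pysam, `AlignedSegment.get_blocks()` and `AlignedSegment.reference_end` carry equivalent information, but in 0-based, half-open coordinates.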

Powered by Biostar version 2.3.0