Clustering based on Alignment
9.2 years ago
ilyco ▴ 60

Hi,

What is the fastest way to get the distances between reads (irrespective of the strand they were aligned to) from a SAM/BAM alignment file?

Context: I aligned short RNA-seq reads using Bowtie and I would like to store the distances between reads on each chromosome so I can try some clustering method that would group the reads.

Thank you!

RNA-Seq Genomic Coordinates Distance Bowtie • 2.1k views

Are you trying to perform feature discovery (i.e., find unannotated transcripts)? There are programs already written that will do this (e.g. cufflinks), so you don't need to reinvent the wheel.

Regarding your actual question, you first have to define what you mean by distance. Are we just using the minimum distance between any two of their mapped bases (this is likely the case), or do you only want the distance between a single given end of each alignment? Should a distinction be made between a complete and a partial overlap? Do you really want the distance between all alignments, or only those within a given window?
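To make the first definition concrete, here is a minimal sketch of the "minimum distance between any two mapped bases" for two alignments on the same chromosome. It assumes 0-based, half-open coordinates (BED-style) and treats any overlap or direct adjacency as distance 0; the function name `interval_gap` is just illustrative.

```python
def interval_gap(start_a, end_a, start_b, end_b):
    """Number of bases strictly between two half-open intervals [start, end).

    Assumption: both intervals are on the same chromosome and use
    0-based, half-open coordinates. Overlapping or directly adjacent
    intervals give 0.
    """
    # The gap runs from the end of the leftmost interval to the start
    # of the rightmost one; a negative value means they overlap.
    return max(0, max(start_a, start_b) - min(end_a, end_b))


if __name__ == "__main__":
    print(interval_gap(10, 20, 25, 30))  # bases 20..24 lie in between
    print(interval_gap(10, 20, 15, 30))  # partial overlap
```

If you need to tell a complete overlap from a partial one, you would have to return more than a single number, for example the signed value before clamping to zero.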


I chose the best alignment for each read, so I now have a single putative position for each read. By distance, I mean the number of bp between a read and the next one on the same chromosome, based on the genomic coordinates reported in the alignment. In the meantime I figured out that it is easier to convert the BAM file to BED format and just use the coordinates there.
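For anyone finding this later, a rough sketch of the BED-based approach in Python. It assumes a tab-separated BED file (for example one produced by `bedtools bamtobed`, which is only one way to do the conversion), ignores the strand column, and reports the number of bp between each read and the next one on the same chromosome, with overlapping reads counted as 0. The function name `next_read_distances` and the file name `reads.bed` are placeholders.

```python
from collections import defaultdict


def next_read_distances(bed_lines):
    """Map each chromosome to the gaps (in bp) between consecutive reads.

    Assumptions: tab-separated BED lines with chrom, start, end in the
    first three columns; coordinates are 0-based, half-open; strand is
    ignored; overlapping reads get a distance of 0.
    """
    reads = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        reads[chrom].append((int(start), int(end)))

    distances = {}
    for chrom, intervals in reads.items():
        intervals.sort()  # order reads by start coordinate
        distances[chrom] = [
            max(0, next_start - cur_end)
            for (_, cur_end), (next_start, _) in zip(intervals, intervals[1:])
        ]
    return distances


if __name__ == "__main__":
    import os

    if os.path.exists("reads.bed"):  # placeholder file name
        with open("reads.bed") as handle:
            for chrom, gaps in next_read_distances(handle).items():
                print(chrom, gaps)
```

Note that a chromosome with a single read gets an empty list, and that the result only holds distances between neighbouring reads, not all pairs, which keeps memory linear in the number of reads.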
