I have a bam files produced from mapping by LifeScope Tool (mapping of SOLiD reads). I would like to know the length of the reads that were mapped to the reference genome based on the information present in bam files. Is there a tool which can give me the stats of such things from bam files?
This command outputs the length distribution of both mapped and unmapped reads right? I thought the question was to ask the length distribution output of only mapped reads.
thanks a lot! so if I only want unmapped reads length distribution, I just need to make the "F" case lower "f" right? for example samtools view -f 4 file.bam. I tried "samtools view file.bam", "samtools view -F 4 file.bam" and "samtools view -f file.bam", I found that the reads count in a certain length in -f output and -F output did not add up to the same length counts in "samtools view file.bam", I am confused, I thought the mapped and unmapped read counts should added up to the total counts of a certain length in "samtools view file.bam" right?
Hi, thank you for your answer. Do you know how to modify your code if I want to specify the reads in a certain region, for example, want to know the reads length distribution that are only located in chr2?
if you need to know alignment length of the read on the reference after it aligned: Using bioperl modules
##mapped_read_Length.pl
#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::DB::Sam;
my $usage = "Usage $0 <input.bam>\n";
my $infile = shift or die $usage;
my $sam = Bio::DB::Sam->new(-bam => $infile);
my @chromosomes = $sam->seq_ids;
print "Read_ID\tChr\tStart\tEnd\tMapped_Length\tCIGAR_String\n";
foreach my $chr (@chromosomes){
my @alignments = $sam->get_features_by_location(-seq_id => $chr);
for my $a (@alignments) {
my $id = $a->name;
my $cigar = $a->cigar_str;
my $start = $a->start;
my $end = $a->end;
my $mapped_length = $end-$start;
print "$id\t$chr\t$start\t$end\t$mapped_length\t$cigar\n";
}
}
__END__
There are many ways to skin a cat. Below is a c++ implementation of BamTools that will produce what your want.
You can get much more information out the the al BamTools::Alignment object.
I don't have experience with C++. Can it be implemented in a different language? I have used samtools.1 view to get the basic info. I think the output is in sam format. Probably I can write a code to parse the column 6 - CIGAR and output the length. But I was wondering if there is a tool already to do it. I didn't want to mistakes while computing length from CIGAR.
Looked into bamtools, but like I said I'm not good at C++.
BEDOPS includes a bam2bed conversion script, which filters unmapped reads in the course of converting to BED. You can therefore get mapped read lengths by piping the sequence field of the converted data to awk, e.g.:
If you use samtools directly with alternatives suggested in other answers, you might first want to prefilter your data to mapped reads with GNU awk, before piping to downstream utilities:
If you are given bam files and want to know the READ length that was used in sequencing, this will work.
replace bamfile by your bam file (and make sure it was indexed .. e.g. a bamfile.bam.bai exists)
Do you want the length after the read is aligned, including insertions and deletions?
I don't need to include insertions and deletions.
Hi Zev, what would the syntax be if I include indels and gaps in the reads?