How to generate a .sam/.bam file of a particular chromosome or even a particular region?
3
2
Entering edit mode
8.1 years ago
Chenglin ▴ 230

I wanted to see a particular sequence in our re-sequencing data using IGV. However the .bam file is too big (10Gb) to handle. Since I am interested in only a small region in the genome, I wondering if there is any way to generate a .sam/.bam file only for that region. Thank you very much in advance!

Chenglin

alignment genome next-gen • 10k views
5
Entering edit mode
8.1 years ago

This has been answered many times before. You can specify a small region of your interest and create a small bam file as shown.

samtools view -bh chr1:100-200 > small.bam

2
Entering edit mode

FYI, -b will always write the header if it's present (this isn't documented, but see here), so samtools view -b chr1:100-200 > small.bam will do the same thing.

0
Entering edit mode

Thanks Devon. Didn't know that.

0
Entering edit mode

Thank you very much for your answer. Do I need to put the reference genome index file (My_genome.fasta.fai) in the command? Actually I typed something like this samtools view -bh chr1:100-200 My_genome.fasta.fai small.sam > small.bam, but I got error information [main_samview] fail to open "My_genome.fasta.fai" for reading. My index file is there, why can't it be read? Thanks!

0
Entering edit mode

You won't need the fai file at all.

The syntax is:

samtools view [OPTIONS] sorted_and_indexed.bam region

Edit: Just to make things clearer, in your case the command would be samtools view -b small.bam chr1:100-200 > small.subset.bam. The input must be a sorted and indexed BAM or CRAM file.

0
Entering edit mode

Thank you very much for your help. I tried and made it.

0
Entering edit mode

Thank you, Ashutosh. I just made it.

1
Entering edit mode
4.7 years ago
gsr9999 ▴ 230

You can also use sambamba tool to slice the existing bam to specific regions. It is much faster than samtools.

Example : $sambamba slice -o my_output_bam_file.bam Input_bam_file.bam chr1:100-200 Or you can extract reads for an entire chromosome by just specifying the chromosome name in option Example :$sambamba slice -o chr1_bam_file.bam Input_bam_file.bam chr1

1
Entering edit mode
16 months ago
etiennedanis ▴ 10
samtools view -b in.bam chr1:100000-200000 > out.bam


did not work for me.

But

samtools view -b in.bam 1:100000-200000 > out.bam


worked. It depends on the genome used for the alignment. Using

samtools idxstats in.bam


can be useful to figure out how to name the chromosomes.