Filtering a BAM file down to specific genes using samtools
7.3 years ago
Sara ▴ 240

I have some BAM files and would like to use them for further analysis, but only for some of the genes, not everything in the BAM file. I have a list of the genes I want to use for the next step. My list contains both gene symbols and gene IDs, like:

AAAS ENSG00000094914
ACO2 ENSG00000100412

I thought samtools could produce such a BAM file (with something like the following command), but I don't know what should go in place of the ? marks, or whether this is possible at all.

samtools view -h in.bam | ??????  > out.bam

I searched for this but did not find anything useful. Do you know how I can get a new BAM file containing only the genes in my list?

7.3 years ago

If you mapped the reads to the genome, you will need the genomic coordinates of the genes you are interested in. Once you have them, simply pass them as "regions" to samtools view:

samtools view --help
Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]
[...]
 A region should be presented in one of the following formats:
 `chr1', `chr2:1,000' and `chr3:1000-2,000'. When a region is
 specified, the input alignment file must be a sorted and indexed
 alignment (BAM/CRAM) file.

samtools view -h -b in.bam region1 region2 ... regionX > out.bam
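
As the help text above says, region queries only work on a sorted and indexed file. A minimal sketch of the full sequence, assuming in.bam is not yet coordinate-sorted (skip the sort step if it already is). The wrapper name extract_regions, its argument order, and the .sorted.bam naming are my own illustration, not part of samtools:

```shell
# Hypothetical wrapper (name and argument order are illustrative only).
# Usage: extract_regions in.bam out.bam region1 [region2 ...]
extract_regions() {
    in_bam=$1
    out_bam=$2
    shift 2
    # Assumed naming convention for the sorted copy: in.bam -> in.sorted.bam
    sorted_bam="${in_bam%.bam}.sorted.bam"
    # Region queries need a coordinate-sorted, indexed file.
    samtools sort -o "$sorted_bam" "$in_bam" &&
    samtools index "$sorted_bam" &&
    # Same flags as in the command above; remaining arguments are the regions.
    samtools view -h -b "$sorted_bam" "$@" > "$out_bam"
}
```

For example, `extract_regions in.bam out.bam chr1 chr3:1000-2000` would keep reads overlapping either region. The region names have to match the reference names in the BAM header (for instance chr1 versus 1).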

If you have many genes of interest, it can be more convenient to first convert their genomic coordinates to BED format (which uses 0-based start positions, FYI), then use samtools view with the -L option:

-L FILE  only include reads overlapping this BED FILE [null]

samtools view -h -b -L my_genes_coordinates.bed in.bam > out.bam
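
One possible way to build that BED file from a two-column gene list like the one in the question is to look the gene IDs up in a GTF annotation. This is only a sketch under assumptions: the GTF is uncompressed and Ensembl-style (tab-separated, with "gene" records carrying a gene_id "..." attribute), the gene list is non-empty, and the function name genes_to_bed and the file names genes.txt and annotation.gtf are made up for illustration.

```shell
# Hypothetical helper: print BED lines for the genes listed in a two-column
# "symbol gene_id" file, using the "gene" records of an Ensembl-style GTF.
genes_to_bed() {
    gene_list=$1   # one "SYMBOL GENE_ID" pair per line
    gtf=$2         # uncompressed GTF annotation (assumed Ensembl-style)
    awk -F '\t' '
        # First file: remember the gene IDs (second whitespace-separated column).
        NR == FNR {
            split($0, f, /[ \t]+/)
            if (f[2] != "") wanted[f[2]] = f[1]
            next
        }
        # Second file: keep "gene" features whose gene_id is in the list.
        $3 == "gene" && match($9, /gene_id "[^"]+"/) {
            id = substr($9, RSTART + 9, RLENGTH - 10)
            sub(/\.[0-9]+$/, "", id)   # drop a version suffix such as ".12"
            if (id in wanted)
                # GTF is 1-based inclusive; BED is 0-based, end-exclusive.
                printf "%s\t%d\t%d\t%s\n", $1, $4 - 1, $5, wanted[id]
        }
    ' "$gene_list" "$gtf"
}
```

Usage would then look like `genes_to_bed genes.txt annotation.gtf > my_genes_coordinates.bed`, followed by the samtools view -L command above. The annotation has to come from the same genome build, and use the same chromosome naming, as the reference the reads were aligned to.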

The command should be:

samtools view -h -b -L my_genes_coordinates.bed in.bam > out.bam

Nice catch, I edited the answer, thank you!
