extracting reads from BAM files for gene coordinates Tx start Txs end
3
0
Entering edit mode
6.0 years ago
amitpande74 ▴ 20

Hi,

      I have a list of Refseq annotated gene coordinates and I would like to extract the reads for the same in one go. I have read many posts here and at Seqanswers, but those are restricted to a particular region. for example samtools view input.bam "Chr10:18000-45500" > output.bam.

How can one do this for a list of genes with the Transcription start and end positions?

I tried doing this by extracting theit extr Refseq gene names and the associated coordinates from the genome browser at UCSC. Then using samtolls view input.BAM "mygenes.bed" > output.bam. But it extracts the reads and the format is not BAM.

I am new to NGS can someone help.

RNA-Seq samtools • 3.1k views
ADD COMMENT
1
Entering edit mode
6.0 years ago
samtools view -b -L regions.bed input.bam > selected.bam

You should be careful with 0-based and 1-based coordinates

ADD COMMENT
1
Entering edit mode
6.0 years ago

Given a sorted file called genes.bed with your Tx starts and ends, you can use BEDOPS bedops and bam2bed running in a bash environment:

$ bedops --element-of 1 <(bam2bed < input.bam) genes.bed > answer.bed

Fast and memory efficient, and useful if you want BED as output for downstream ops.

ADD COMMENT
0
Entering edit mode
6.0 years ago
swbarnes2 9.7k

Another possibility; use IntersectBed from BEDTools.

ADD COMMENT

Login before adding your answer.

Traffic: 2384 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6