Bam File: Extract Chromosome Number And Start Position Reads
3
1
12.9 years ago
Lisanne ▴ 10

Dear All,

I am working with BAM files for ChIP-seq analysis. For each read in the BAM file, I want to extract the chromosome number, the start position and the stop position and place them in a new file. Is that possible?

Many thanks!

bam samtools • 16k views
10
12.9 years ago
Farhat ★ 2.9k

samtools view bamfile.bam|awk '{print $3 "\t" $4 "\t" $4+length($10)-1}' > newfile.tab

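will do the job. The stop position here is the start plus the read length minus one, which is the last matching position only when the read aligns without indels or clipping.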

4

bedtools' bamToBed inspects the CIGAR string when computing the end coordinate, so deletions are properly handled. The example here assumes that only substitutions can occur.

1

This approach has a minor issue: the length of the sequence does not necessarily agree with the span of the alignment, e.g. when the read contains indels.

0

Thanks to Farhat, that works fine for me! :)

0

True. bamToBed would be the proper tool to handle something like this.

2
12.9 years ago
Michael 55k

Yes, it is possible, using e.g. SAMtools (samtools view) or Rsamtools.

The reference sequence name is in column 3 of a SAM file and the (leftmost) start is in column 4; the end position needs to be calculated using the CIGAR string (e.g. start + aligned length on the reference - 1).
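
A minimal sketch of that calculation in awk, assuming SAM text on stdin (e.g. piped from samtools view). The function name sam_intervals is made up for illustration. It adds up the CIGAR operations that consume the reference (M, D, N, = and X) and prints 1-based, inclusive coordinates; reads without an alignment (CIGAR `*`) are skipped.

```shell
# Hypothetical helper: read SAM records on stdin and print
# chromosome, start, end (1-based, inclusive), with the end
# derived from the CIGAR string rather than the read length.
sam_intervals() {
  awk 'BEGIN { OFS = "\t" }
  /^@/ { next }                # skip header lines, if any
  $6 == "*" { next }           # skip reads without an alignment
  {
    cigar = $6; span = 0
    # Walk the CIGAR one operation at a time, e.g. 5M2D5M -> 5M, 2D, 5M
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      n  = substr(cigar, 1, RLENGTH - 1) + 0
      op = substr(cigar, RLENGTH, 1)
      # Only M, D, N, = and X advance along the reference
      if (op ~ /[MDN=X]/) span += n
      cigar = substr(cigar, RLENGTH + 1)
    }
    print $3, $4, $4 + span - 1
  }'
}

# Usage (file names as in the one-liner above):
#   samtools view bamfile.bam | sam_intervals > newfile.tab
```

Insertions and soft or hard clips do not move along the reference, so they are parsed but not counted.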

0

In this case, how is strand information treated? Is it always start + alignment length, or does it depend on the strand where the read mapped?
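
On the strand question: per the SAM specification, column 4 is always the leftmost reference coordinate of the alignment, whichever strand the read mapped to, so the end is computed the same way in both cases. The strand itself is recorded as bit 0x10 of the FLAG in column 2. A small sketch, assuming SAM text on stdin; the function name sam_strand is made up for illustration:

```shell
# Hypothetical helper: print read name, chromosome, leftmost start
# and strand for each SAM record on stdin.
sam_strand() {
  awk 'BEGIN { OFS = "\t" }
  /^@/ { next }                # skip header lines, if any
  {
    # Bit 0x10 (decimal 16) of FLAG is set for reverse-strand alignments
    strand = (int($2 / 16) % 2 == 1) ? "-" : "+"
    print $1, $3, $4, strand
  }'
}

# Usage: samtools view bamfile.bam | sam_strand
```

For unmapped reads the strand value carries no meaning, so they may need filtering out first.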

1
12.9 years ago
Swbarnes2 ★ 1.6k

You could also try BEDTools, which can convert your .bam to a .bed; a BED record is pretty much a chromosome, a start and a stop position, plus an optional name.
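
One thing to watch when mixing the two approaches: BED uses 0-based, half-open coordinates, while column 4 of a SAM record is 1-based. With BEDTools installed, the conversion is typically `bedtools bamtobed -i bamfile.bam > newfile.bed`. The sketch below only illustrates the coordinate shift, turning tab-separated, 1-based inclusive chromosome/start/end lines (like the output of the awk one-liner in the first answer) into BED-style intervals; the function name to_bed is made up for illustration.

```shell
# Hypothetical helper: convert 1-based inclusive intervals on stdin
# (chromosome, start, end) to 0-based half-open BED intervals.
to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
  # Start shifts down by one; the end is unchanged because an
  # inclusive 1-based end equals an exclusive 0-based end.
  { print $1, $2 - 1, $3 }'
}

# Usage: to_bed < newfile.tab > newfile.bed
```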
