Question: Bam File: Extract Chromosome Number And Start Position Reads
Lisanne10 wrote:

Dear All,

I am working with BAM files for ChIP-seq analysis. For each read in the BAM file, I want to extract the chromosome number, the start position and the stop position, and write them to a new file. Is that possible?

Many thanks!

bam samtools • 8.6k views
ADD COMMENTlink written 7.4 years ago by Lisanne10
Farhat2.9k wrote:

samtools view bamfile.bam | awk '{print $3 "\t" $4 "\t" $4+length($10)-1}' > newfile.tab

will do the job. The stop position here is the last matching position.

ADD COMMENTlink written 7.4 years ago by Farhat2.9k

bedtools' bamToBed inspects the CIGAR string when computing the end coordinate, so deletions are handled properly. The example here assumes that only substitutions can occur.

ADD REPLYlink written 7.4 years ago by Aaronquinlan10k

This approach has a minor issue: the length of the read sequence does not necessarily agree with the span of the alignment on the reference, e.g. when the read contains indels.

ADD REPLYlink written 7.4 years ago by Wen.Huang1.1k

Thanks to Farhat, that works fine for me! :)

ADD REPLYlink written 7.4 years ago by Lisanne10

True. bamToBed would be the proper tool to handle something like this.

ADD REPLYlink written 7.3 years ago by Farhat2.9k
Michael Dondrup45k wrote:

Yes, it is possible, e.g. using SAMtools (samtools view) or Rsamtools.

The reference sequence name is in column 3 of a SAM file and the (leftmost, 1-based) start is in column 4. The end position needs to be calculated using the CIGAR string in column 6 (e.g. start + aligned length on the reference - 1).
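A minimal sketch of that calculation in awk, in case it helps. The function name sam_to_span is made up for illustration, and the code assumes the standard SAM column layout (reference name in column 3, start in column 4, CIGAR in column 6). Only the CIGAR operations that consume the reference (M, D, N, = and X) are added to the span; insertions and clipping are ignored.

```shell
# Hypothetical helper (name is illustrative): reads SAM records on stdin and
# prints chromosome, start and end (1-based, inclusive), tab-separated.
sam_to_span() {
  awk 'BEGIN { OFS = "\t" }
  /^@/ { next }                      # skip header lines
  $3 == "*" || $6 == "*" { next }    # skip unmapped reads and missing CIGARs
  {
    cigar = $6
    span = 0
    # walk the CIGAR string one operation at a time
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(cigar, 1, RLENGTH - 1) + 0
      op = substr(cigar, RLENGTH, 1)
      if (op ~ /[MDN=X]/) span += len   # only these operations consume the reference
      cigar = substr(cigar, RLENGTH + 1)
    }
    print $3, $4, $4 + span - 1
  }'
}

# Usage (assumes samtools is on PATH and bamfile.bam exists):
# samtools view bamfile.bam | sam_to_span > newfile.tab
```

For a read starting at position 100 with CIGAR 5M2D5M, the span on the reference is 12 bases, so the end is 111 rather than the 109 that the read length alone would give.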

ADD COMMENTlink written 7.4 years ago by Michael Dondrup45k

In this case, how is strand information treated? Is it always start + alignment length, or does it depend on the strand to which the read mapped?
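As far as the SAM format goes, column 4 is always the leftmost position on the reference regardless of strand, and the strand itself is recorded in the FLAG field (column 2, bit 0x10 set for reverse-strand reads). A rough sketch for pulling the strand out next to the coordinates; the function name sam_strand is made up here, and the arithmetic is used because plain POSIX awk has no bitwise AND:

```shell
# Hypothetical helper (name is illustrative): reads SAM records on stdin and
# prints chromosome, leftmost start and strand, tab-separated.
sam_strand() {
  awk 'BEGIN { OFS = "\t" }
  /^@/ { next }                       # skip header lines
  {
    # bit 0x10 (decimal 16) of the FLAG marks a reverse-strand alignment
    strand = (int($2 / 16) % 2 == 1) ? "-" : "+"
    print $3, $4, strand
  }'
}

# Usage (assumes samtools is on PATH and bamfile.bam exists):
# samtools view bamfile.bam | sam_strand > strands.tab
```

Unmapped reads are not filtered in this sketch, so you may want to drop them first (e.g. with samtools view -F 4).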

 

ADD REPLYlink written 4.0 years ago by mjg20
Swbarnes21.4k wrote:

You could also try BEDTools, which can convert your .bam to a .bed. A BED record is essentially a chromosome, a start and a stop position, optionally followed by a name and a few other fields.
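One thing to watch with this route: BED starts are 0-based and the end is exclusive, whereas SAM positions are 1-based. A small sketch of converting bamtobed output back to 1-based, inclusive coordinates; the function name bed_to_one_based is made up for illustration, and the input is assumed to have chromosome, start and end in its first three columns:

```shell
# Hypothetical helper (name is illustrative): converts BED intervals
# (0-based start, exclusive end) to chromosome, 1-based start, inclusive end.
bed_to_one_based() {
  awk 'BEGIN { OFS = "\t" } { print $1, $2 + 1, $3 }'
}

# Usage (assumes bedtools is installed and bamfile.bam exists):
# bedtools bamtobed -i bamfile.bam | bed_to_one_based > newfile.tab
```

If 0-based BED coordinates are fine for the downstream tool, cut -f1-3 on the bamtobed output is enough.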

ADD COMMENTlink written 7.4 years ago by Swbarnes21.4k