Question: Reordering the bam file
2
gravatar for Dataminer
4.7 years ago by
Dataminer2.6k
Netherlands
Dataminer2.6k wrote:

Hi ,

I have my bam file in this order:

samtools view -H my.bam

 

@CO     Duplicates marked (bamUtil Version: 1.0.2)
@SQ     SN:chr1 LN:249250621    AS:hg19
@SQ     SN:chr10        LN:135534747    AS:hg19
@SQ     SN:chr11        LN:135006516    AS:hg19
@SQ     SN:chr12        LN:133851895    AS:hg19
@SQ     SN:chr13        LN:115169878    AS:hg19
@SQ     SN:chr14        LN:107349540    AS:hg19
@SQ     SN:chr15        LN:102531392    AS:hg19
@SQ     SN:chr16        LN:90354753     AS:hg19
@SQ     SN:chr17        LN:81195210     AS:hg19
@SQ     SN:chr18        LN:78077248     AS:hg19
@SQ     SN:chr19        LN:59128983     AS:hg19
@SQ     SN:chr2 LN:243199373    AS:hg19
@SQ     SN:chr20        LN:63025520     AS:hg19
@SQ     SN:chr21        LN:48129895     AS:hg19
@SQ     SN:chr22        LN:51304566     AS:hg19
@SQ     SN:chr3 LN:198022430    AS:hg19
@SQ     SN:chr4 LN:191154276    AS:hg19
@SQ     SN:chr5 LN:180915260    AS:hg19
@SQ     SN:chr6 LN:171115067    AS:hg19
@SQ     SN:chr7 LN:159138663    AS:hg19
@SQ     SN:chr8 LN:146364022    AS:hg19
@SQ     SN:chr9 LN:141213431    AS:hg19
@SQ     SN:chrM LN:16571        AS:hg19
@SQ     SN:chrX LN:155270560    AS:hg19
@SQ     SN:chrY LN:59373566     AS:hg19
@PG     ID:bwa  PN:bwa  VN:0.6.1-r104


And I wanna order it as, this:

@CO     Duplicates marked (bamUtil Version: 1.0.2)
@SQ     SN:chr1 LN:249250621    AS:hg19
@SQ     SN:chr2 LN:243199373    AS:hg19
@SQ     SN:chr3 LN:198022430    AS:hg19
@SQ     SN:chr4 LN:191154276    AS:hg19
@SQ     SN:chr5 LN:180915260    AS:hg19
@SQ     SN:chr6 LN:171115067    AS:hg19
@SQ     SN:chr7 LN:159138663    AS:hg19
@SQ     SN:chr8 LN:146364022    AS:hg19
@SQ     SN:chr9 LN:141213431    AS:hg19
@SQ     SN:chr10        LN:135534747    AS:hg19
@SQ     SN:chr11        LN:135006516    AS:hg19
@SQ     SN:chr12        LN:133851895    AS:hg19
@SQ     SN:chr13        LN:115169878    AS:hg19
@SQ     SN:chr14        LN:107349540    AS:hg19
@SQ     SN:chr15        LN:102531392    AS:hg19
@SQ     SN:chr16        LN:90354753     AS:hg19
@SQ     SN:chr17        LN:81195210     AS:hg19
@SQ     SN:chr18        LN:78077248     AS:hg19
@SQ     SN:chr19        LN:59128983     AS:hg19
@SQ     SN:chr20        LN:63025520     AS:hg19
@SQ     SN:chr21        LN:48129895     AS:hg19
@SQ     SN:chr22        LN:51304566     AS:hg19
@SQ     SN:chrX LN:155270560    AS:hg19
@SQ     SN:chrY LN:59373566     AS:hg19
@SQ     SN:chrM LN:16571        AS:hg19
@PG     ID:bwa  PN:bwa  VN:0.6.1-r104

How can I sort it in this order and arrange it?

Thank you

 

 

bamtools chip-seq samtools • 2.7k views
ADD COMMENTlink modified 4.7 years ago by Ming Tang2.5k • written 4.7 years ago by Dataminer2.6k
4
gravatar for Pierre Lindenbaum
4.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum120k wrote:

use picard ReorderSam http://picard.sourceforge.net/command-line-overview.shtml

<cite> ReorderSam reorders reads in a SAM/BAM file to match the contig ordering in a provided reference file, as determined by exact name matching of contigs</cite>
ADD COMMENTlink written 4.7 years ago by Pierre Lindenbaum120k
0
gravatar for Ming Tang
4.7 years ago by
Ming Tang2.5k
Houston/MD Anderson Cancer Center
Ming Tang2.5k wrote:

you can use samtools reheader to do it.

for example :

edit a bam header(insert a line)samtools view -H my.bam | sed '24 a @SQ SN:chrY LN:59373566' | samtools reheader - my.bam > my.reheaded.bam

ADD COMMENTlink written 4.7 years ago by Ming Tang2.5k
2

wrong: reheader won't change the order of the reads in the bam file . The reference index of the reads will be wrongly-assigned.

ADD REPLYlink written 4.7 years ago by Pierre Lindenbaum120k
1

Please don't ever do that. You'll end up changing the chromosome to which all of your reads align. The reheader command should only ever be used to change the name (but never the order) of contigs/chromosomes. Picard handles this better (see Pierre's answer).

ADD REPLYlink written 4.7 years ago by Devon Ryan90k

Thank you for your comment. I had a bam file (downloaded from the ENCODE) sequenced from a female, and it does not have chrY in the header. one of the program I used complained about missing the chrY info in the header, and I have to manually add it by samtools reheader. Does  it affect the alignment?

ADD REPLYlink written 4.7 years ago by Ming Tang2.5k
1

As long as you appended chrY at the end then no, it didn't. If you put it anywhere else, then yes, the alignments were affected.

ADD REPLYlink written 4.7 years ago by Devon Ryan90k

I read this discussion with interest and wished you could clarify a point. Say that you generate an unsorted SAM file. When you invoke picard/samtools to sort by coordinate, the SQ header lines will be used to set how you reorder your reads. If you want a different order, it would make sense to me to change the order of the SQ lines as long as that is done before invoking picard/samtools to reorder. Every record in a SAM/BAM file includes the name of the chromosome, so I don't see how going through the workflow above may mess things up. Indexing the BAM file of course would be performed after reordering. So reheading > sorting > indexing. By contrast, looks to me that picard/samtools won't help reordering reads as you wanted if the order in the SQ header lines is not what you want. What am I missing?

 

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by roberto.spreafico10
2

Every record in SAM files contains the name of the chromosome, but that's not true in BAM files. In BAM files, every record contains a tid field, which is an index into a structure held in the header (equivalent to the @SQ lines in a SAM file). If you reheader a BAM file, then this structure is replaced, but the alignment records themselves and their tid fields are unchanged. So, if you swap the order of chromosomes with samtools reheader of a BAM file then you'll do an in-place alteration of the mappings. This is typically not what people want to do.

ADD REPLYlink written 4.0 years ago by Devon Ryan90k

Thanks Devon, very clear. In my workflow I rehead a SAM file (with a bash script) to sort SQ header lines as I want. The product is still a SAM file. I then use picard to sort by coordinate (which now forces my SQ header line order) the SAM file, convert to BAM, and index, all together. This seems not to produce any unwanted behavior as reheading is done at the SAM stage, correct? I can consider also less elaborate ways to enforce the sorting I want, but I looked into samtools/picard and all they offer is sorting by read name or by coordinate (that is, the SQ header line).

ADD REPLYlink written 4.0 years ago by roberto.spreafico10
1

Replacing the header of SAM files should always be a safe operation (unless you drop a chromosome, which will either make alignments unmapped or cause an error...I don't recall which).

ADD REPLYlink written 4.0 years ago by Devon Ryan90k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1082 users visited in the last hour