Samtools Help needed: index seg fault after reheader
3
1
Entering edit mode
10.0 years ago
akhst7 ▴ 40

Hi,

I have this SAM file which has two MT entries (bolded) as follows;

@SQ    SN:MT    LN:16569
@SQ    SN:1    LN:249250621
@SQ    SN:10    LN:135534747
@SQ    SN:11    LN:135006516
@SQ    SN:12    LN:133851895
@SQ    SN:13    LN:115169878
@SQ    SN:14    LN:107349540
@SQ    SN:15    LN:102531392
@SQ    SN:16    LN:90354753
@SQ    SN:17    LN:81195210
@SQ    SN:18    LN:78077248
@SQ    SN:19    LN:59128983
@SQ    SN:2    LN:243199373
@SQ    SN:20    LN:63025520
@SQ    SN:21    LN:48129895
@SQ    SN:22    LN:51304566
@SQ    SN:3    LN:198022430
@SQ    SN:4    LN:191154276
@SQ    SN:5    LN:180915260
@SQ    SN:6    LN:171115067
@SQ    SN:7    LN:159138663
@SQ    SN:8    LN:146364022
@SQ    SN:9    LN:141213431
@SQ    SN:MT    LN:16569
@SQ    SN:X    LN:155270560
@SQ    SN:Y    LN:59373566

I got rid of the first MT entry since after BAM conversion, the first MT entry gives 0 mapping;

MT    16569    0    0
1    249250621    469558    0
10    135534747    241345    0
11    135006516    280732    0
12    133851895    256092    0
13    115169878    153564    0
14    107349540    168501    0
15    102531392    160025    0
16    90354753    154228    0
17    81195210    174856    0
18    78077248    118007    0
19    59128983    177415    0
2    243199373    399350    0
20    63025520    151498    0
21    48129895    93210    0
22    51304566    67376    0
3    198022430    332309    0
4    191154276    271651    0
5    180915260    295771    0
6    171115067    299632    0
7    159138663    259500    0
8    146364022    235990    0
9    141213431    230866    0
MT    16569    1608344    0
X    155270560    226806    0
Y    59373566    11557    0

So I got rid of the first MT entry by extracting a header by

samtools view -HS input.sam > header.sam 

and deleted it by using an text editor.

Then

samtools reheader header.sam input.bam > new.bam

I made sure the first MT entry was gone in the header;

SQ     SN:1    LN:249250621
@SQ     SN:10   LN:135534747
@SQ     SN:11   LN:135006516
@SQ     SN:12   LN:133851895
@SQ     SN:13   LN:115169878
@SQ     SN:14   LN:107349540
@SQ     SN:15   LN:102531392
@SQ     SN:16   LN:90354753
@SQ     SN:17   LN:81195210
@SQ     SN:18   LN:78077248
@SQ     SN:19   LN:59128983
@SQ     SN:2    LN:243199373
@SQ     SN:20   LN:63025520
@SQ     SN:21   LN:48129895
@SQ     SN:22   LN:51304566
@SQ     SN:3    LN:198022430
@SQ     SN:4    LN:191154276
@SQ     SN:5    LN:180915260
@SQ     SN:6    LN:171115067
@SQ     SN:7    LN:159138663
@SQ     SN:8    LN:146364022
@SQ     SN:9    LN:141213431
@SQ     SN:MT   LN:16569
@SQ     SN:X    LN:155270560
@SQ     SN:Y    LN:59373566

Ok Looking great but then when I further processed the new.BAM file by;

samtools sort new.bam
samtools index new.bam

Then I got an error message;

51501 segmentation fault samtools index Rsorted.bam

The samtools flagstat output looked fine;

7837322 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
7318383 + 0 mapped (93.38%:nan%)
7837322 + 0 paired in sequencing
3918661 + 0 read1
3918661 + 0 read2
6190627 + 0 properly paired (78.99%:nan%)
7119872 + 0 with itself and mate mapped
198511 + 0 singletons (2.53%:nan%)
89782 + 0 with mate mapped to a different chr
89781 + 0 with mate mapped to a different chr (mapQ>=5)

I am using samtools0.1.19-44428cd

I do really appreciate any help on deleting the extra MT entry in the sam header without having this seg-fault.

Thanks in advance.

RNA-Seq • 4.5k views
ADD COMMENT
2
Entering edit mode
10.0 years ago

You've unfortunately succeeded in changing every alignment in your BAM file such that they're now incorrect (go ahead and look at a few from the before and after and note that they all map to the same position, but on different chromosomes as before). One must be very careful when using the reheader command from samtools such that this doesn't happen. They simplest way that I know of to avoid this is to pipe the file in SAM to a small script which removes the first MT line and then pipes it back to samtools to remake the BAM file.

Edit: An example script would be something of the form:

samtools view -h foo.bam | awk '{if(found==0) {if($2=="SN:MT") {found=1; getline;}} print $0}' | samtools view -bSo foo.reformated.bam -
ADD COMMENT
0
Entering edit mode

Hi Devon,

Your script worked great !! It was my understanding from a samtools man page that reheader appeared to be used to manipulate sam/bam headers without messing read counts and also when I looked at some discussion on the web which recommended for the use of reheader.

Anyway thanks!!

ADD REPLY
3
Entering edit mode

The problem is that the chromosome that a read aligns to isn't stored in the read itself. Instead, an integer index (tid) is stored in the alignment and samtools (or any other program) looks in the header for that chromosome/contig associated with it. So, if you swapped the order of chr1 and chr2 in the header then all of the reads previously reported to be aligned to chr1 would then be aligned to chr2 (and vice versa). Similarly, by just replacing the header with one omitting a line, everything that previously aligned to the second listed chromosome (1 in your case) would then be aligned to what was previously the third chromosome (10 in your case). That's why there are usually warnings that the chromosomes/contigs should be in the same order when replacing a header.

This, BTW, is the benefit to making the change in SAM, since then the chromosome is stored with the alignment.

ADD REPLY
0
Entering edit mode

So, what is a proper way(s) and instance(s) to use samtools reheader?

Thanks.

ADD REPLY
0
Entering edit mode

If you want to change UCSC to Ensembl chromosome names then it's a convenient way to do so. Also, if you have a BAM file with alignments but no header (this is normally a programming error) then this is also useful.

ADD REPLY
0
Entering edit mode

I should add that it's also handy if you want to add optional header fields, like the @PG or @CO lines. Otherwise, a better name for that command is probably "samtools renameContigs".

ADD REPLY
1
Entering edit mode
10.0 years ago

why no just 'awk'-ing out the original sam?

awk '/^@SQ\tSN\:MT/ {i++; if(i==1) next;} {print;}' < in.sam > out.sam
ADD COMMENT
0
Entering edit mode
7.4 years ago
Shicheng Guo ★ 9.4k

Remove the formal header and then add a new header (only need fa and fa.fai) with:

samtools view -H SCNT.wnt5a_mouse.sort.bam > header.txt
samtools view SCNT.wnt5a_mouse.sort.bam | tail -n 300 > test.sam
samtools view -b -T ~/db/mm9/mm9.fa test.sam -o test.bam
samtools sort test.bam -o test.sort.bam
samtools index test.sort.bam
ADD COMMENT

Login before adding your answer.

Traffic: 2745 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6