Question: Samtools Help needed: index seg fault after reheader
1
gravatar for akhst7
4.0 years ago by
akhst740
United States
akhst740 wrote:

Hi, 

 

I have this SAM file which has two MT entries (bolded) as follows;

  1. @SQ    SN:MT    LN:16569
  2. @SQ    SN:1    LN:249250621
  3. @SQ    SN:10    LN:135534747
  4. @SQ    SN:11    LN:135006516
  5. @SQ    SN:12    LN:133851895
  6. @SQ    SN:13    LN:115169878
  7. @SQ    SN:14    LN:107349540
  8. @SQ    SN:15    LN:102531392
  9. @SQ    SN:16    LN:90354753
  10. @SQ    SN:17    LN:81195210
  11. @SQ    SN:18    LN:78077248
  12. @SQ    SN:19    LN:59128983
  13. @SQ    SN:2    LN:243199373
  14. @SQ    SN:20    LN:63025520
  15. @SQ    SN:21    LN:48129895
  16. @SQ    SN:22    LN:51304566
  17. @SQ    SN:3    LN:198022430
  18. @SQ    SN:4    LN:191154276
  19. @SQ    SN:5    LN:180915260
  20. @SQ    SN:6    LN:171115067
  21. @SQ    SN:7    LN:159138663
  22. @SQ    SN:8    LN:146364022
  23. @SQ    SN:9    LN:141213431
  24. @SQ    SN:MT    LN:16569
  25. @SQ    SN:X    LN:155270560
  26. @SQ    SN:Y    LN:59373566

I got rid of the first MT entry since after BAM conversion, the first MT entry gives 0 mapping;

  1. MT    16569    0    0
  2. 1    249250621    469558    0
  3. 10    135534747    241345    0
  4. 11    135006516    280732    0
  5. 12    133851895    256092    0
  6. 13    115169878    153564    0
  7. 14    107349540    168501    0
  8. 15    102531392    160025    0
  9. 16    90354753    154228    0
  10. 17    81195210    174856    0
  11. 18    78077248    118007    0
  12. 19    59128983    177415    0
  13. 2    243199373    399350    0
  14. 20    63025520    151498    0
  15. 21    48129895    93210    0
  16. 22    51304566    67376    0
  17. 3    198022430    332309    0
  18. 4    191154276    271651    0
  19. 5    180915260    295771    0
  20. 6    171115067    299632    0
  21. 7    159138663    259500    0
  22. 8    146364022    235990    0
  23. 9    141213431    230866    0
  24. MT    16569    1608344    0
  25. X    155270560    226806    0
  26. Y    59373566    11557    0

So I got rid of the first MT entry by extracting  a header by 

  • samtools view -HS input.sam > header.sam  

and deleted it by using an text editor.

Then

  • samtools reheader header.sam input.bam > new.bam

I made sure the first MT entry was gone in the header;

  1. SQ     SN:1    LN:249250621
  2. @SQ     SN:10   LN:135534747
  3. @SQ     SN:11   LN:135006516
  4. @SQ     SN:12   LN:133851895
  5. @SQ     SN:13   LN:115169878
  6. @SQ     SN:14   LN:107349540
  7. @SQ     SN:15   LN:102531392
  8. @SQ     SN:16   LN:90354753
  9. @SQ     SN:17   LN:81195210
  10. @SQ     SN:18   LN:78077248
  11. @SQ     SN:19   LN:59128983
  12. @SQ     SN:2    LN:243199373
  13. @SQ     SN:20   LN:63025520
  14. @SQ     SN:21   LN:48129895
  15. @SQ     SN:22   LN:51304566
  16. @SQ     SN:3    LN:198022430
  17. @SQ     SN:4    LN:191154276
  18. @SQ     SN:5    LN:180915260
  19. @SQ     SN:6    LN:171115067
  20. @SQ     SN:7    LN:159138663
  21. @SQ     SN:8    LN:146364022
  22. @SQ     SN:9    LN:141213431
  23. @SQ     SN:MT   LN:16569
  24. @SQ     SN:X    LN:155270560
  25. @SQ     SN:Y    LN:59373566

Ok Looking great but then when I further processed the new.BAM file by;

  • samtools sort new.bam
  • samtools index new.bam

Then I got an error message;

  • 51501 segmentation fault  samtools index Rsorted.bam

The samtools flagstat output looked fine;

  • 7837322 + 0 in total (QC-passed reads + QC-failed reads)
  • 0 + 0 duplicates
  • 7318383 + 0 mapped (93.38%:nan%)
  • 7837322 + 0 paired in sequencing
  • 3918661 + 0 read1
  • 3918661 + 0 read2
  • 6190627 + 0 properly paired (78.99%:nan%)
  • 7119872 + 0 with itself and mate mapped
  • 198511 + 0 singletons (2.53%:nan%)
  • 89782 + 0 with mate mapped to a different chr
  • 89781 + 0 with mate mapped to a different chr (mapQ>=5)

I am using samtools 0.1.19-44428cd

I do really appreciate any help on deleting the extra MT entry in the sam header without having this seg-fault.

Thanks in advance. 

 

 

 

 

rna-seq • 2.5k views
ADD COMMENTlink modified 17 months ago by Shicheng Guo4.7k • written 4.0 years ago by akhst740
1
gravatar for Devon Ryan
4.0 years ago by
Devon Ryan79k
Freiburg, Germany
Devon Ryan79k wrote:

You've unfortunately succeeded in changing every alignment in your BAM file such that they're now incorrect (go ahead and look at a few from the before and after and note that they all map to the same position, but on different chromosomes as before). One must be very careful when using the reheader command from samtools such that this doesn't happen. They simplest way that I know of to avoid this is to pipe the file in SAM to a small script which removes the first MT line and then pipes it back to samtools to remake the BAM file.

 

Edit: An example script would be something of the form: samtools view -h foo.bam | awk '{if(found==0) {if($2=="SN:MT") {found=1; getline;}} print $0}' | samtools view -bSo foo.reformated.bam -

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Devon Ryan79k

Hi Devon, 

Your script worked great !!  It was my understanding from a samtools man page that reheader appeared to be used to manipulate sam/bam headers without messing read counts and also when I looked at some discussion on the web which recommended for the use of reheader. 

Anyway thanks !!

ADD REPLYlink written 4.0 years ago by akhst740
2

The problem is that the chromosome that a read aligns to isn't stored in the read itself. Instead, an integer index (tid) is stored in the alignment and samtools (or any other program) looks in the header for that chromosome/contig associated with it. So, if you swapped the order of chr1 and chr2 in the header then all of the reads previously reported to be aligned to chr1 would then be aligned to chr2 (and vice versa). Similarly, by just replacing the header with one omitting a line, everything that previously aligned to the second listed chromosome (1 in your case) would then be aligned to what was previously the third chromosome (10 in your case). That's why there are usually warnings that the chromosomes/contigs should be in the same order when replacing a header.

This, BTW, is the benefit to making the change in SAM, since then the chromosome is stored with the alignment.

ADD REPLYlink written 4.0 years ago by Devon Ryan79k

So, what is a proper way(s) and instance(s) to use samtools reheader ?    

Thanks. 

ADD REPLYlink written 4.0 years ago by akhst740

If you want to change UCSC to Ensembl chromosome names then it's a convenient way to do so. Also, if you have a BAM file with alignments but no header (this is normally a programming error) then this is also useful.

ADD REPLYlink written 4.0 years ago by Devon Ryan79k

I should add that it's also handy if you want to add optional header fields, like the @PG or @CO lines. Otherwise, a better name for that command is probably "samtools renameContigs".

ADD REPLYlink written 4.0 years ago by Devon Ryan79k
1
gravatar for Pierre Lindenbaum
4.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum107k wrote:

why no just 'awk'-ing out the original sam ?

awk '/^@SQ\tSN\:MT/ {i++; if(i==1) next;} {print;}' < in.sam > out.sam

 

ADD COMMENTlink written 4.0 years ago by Pierre Lindenbaum107k
0
gravatar for Shicheng Guo
17 months ago by
Shicheng Guo4.7k
Shicheng Guo4.7k wrote:

Remove the formal header and then add a new header (only need fa and fa.fai) with:

samtools view -H SCNT.wnt5a_mouse.sort.bam > header.txt
samtools view SCNT.wnt5a_mouse.sort.bam | tail -n 300 > test.sam
samtools view -b -T ~/db/mm9/mm9.fa test.sam -o test.bam
samtools sort test.bam -o test.sort.bam
samtools index test.sort.bam
ADD COMMENTlink modified 17 months ago • written 17 months ago by Shicheng Guo4.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1168 users visited in the last hour