Hello everyone,
I have a bed-filtered bam excluding some positions and some scaffolds of my original bams, but the header is not modified, and the length reported is the original one and not the one after the filtering:
How it looks:
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:lp23.s00001 LN:13188378
@SQ SN:lp23.s00002 LN:6085556
@SQ SN:lp23.s00003 LN:5556527 ...
How it should look:
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:lp23.s00001 LN:6188147
@SQ SN:lp23.s00002 LN:3292600
@SQ SN:lp23.s00003 LN:2810139 ...
I am using a downstream program that checks length values against header so it throws an error and stops (ANGSD). Thus I need it to be corrected. I have tried to create a new header: 1. Copying the @HD row, 2. replacing all the @SQ lines by the proper ones and, 3. adding the last part of information as it is; and reheader my bam using:
samtools reheader NewHeader.sam BamOldHeader.bam > BamNewHeader.bam
This run but produced a truncated file, that only contain the first scaffold.
samtools view BamNewHeader.bam | grep "lp23.s00002"
[main_samview] truncated file.
If I try to manually modify just the length of the first scaffold using sed and then pipe to reheader, it works fine. Also it works when I remove just one scaffold, save it and then do reheader. I cannot do that manually for every single scaffold as they are > 30000, and modifying everything using sed or awk and then pipe, although possible, is not that straightforward.
Is there any way of modifying the header after filtering removing the scaffolds that are not present and changing the lenght automatically? Is there anything that I am not taking into account while creating the new header?
Thanks,