Question: Problems while reheadering BAM with Samtools 1.3.1
gravatar for ognjen011
14 months ago by
ognjen011150 wrote:

We want to remove the alternative contigs from both reference fasta and aligned BAM. The procedure we chose for BAM is to use Samtools View to keep only alignments chr1-22 and X,Y; we then: * extracted the header with samtools view -H, * deleted all the unused contigs * renamed the .txt file to .sam * used reheader BAM according to instructions.

The contents of the sam used for reheader is given below:

@HD VN:1.5  SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr12    LN:133275309
@SQ SN:chr13    LN:114364328
@SQ SN:chr14    LN:107043718
@SQ SN:chr15    LN:101991189
@SQ SN:chr16    LN:90338345
@SQ SN:chr17    LN:83257441
@SQ SN:chr18    LN:80373285
@SQ SN:chr19    LN:58617616
@SQ SN:chr20    LN:64444167
@SQ SN:chr21    LN:46709983
@SQ SN:chr22    LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@RG ID:1    PL:Illumina HiSeq   SM:Plasma
@PG ID:bwa  PN:bwa  CL:/opt/bwa-0.7.13/bwa mem -M -R @RG\tID:1\tPL:Illumina HiSeq\tSM:Plasma -t 8 Homo_sapiens_assembly38.fasta /sbgenomics/Projects/8e3d501e-ff08-4ae4-b3ec-cc73e54a9b9a/ERR855951_SarcomacfDNA_1.fastq.gz /sbgenomics/Projects/8e3d501e-ff08-4ae4-b3ec-cc73e54a9b9a/ERR855951_SarcomacfDNA_2.fastq.gz VN:0.7.13-r1126
@PG ID:SAMBLASTER   CL:samblaster -i /dev/stdin -o /dev/stdout -M   VN:0.1.22
@PG ID:GATK ApplyBQSR   VN:4.beta.2 CL:ApplyBQSR  --output ERR855951_SarcomacfDNA.recalibrated.bam --bqsr_recal_file /sbgenomics/Projects/8e3d501e-ff08-4ae4-b3ec-cc73e54a9b9a/workspace/5ef73ca8-e545-4e5c-9b91-beb69b529d41/ctdna-align_WES___BWA___GATK_4_0__PLASMA_GATK_BaseRecalibrator/ERR855951_SarcomacfDNA.recal_data.grp --input /sbgenomics/Projects/8e3d501e-ff08-4ae4-b3ec-cc73e54a9b9a/workspace/5ef73ca8-e545-4e5c-9b91-beb69b529d41/ctdna-align_WES___BWA___GATK_4_0__PLASMA_BWA_MEM_Bundle_0_7_13/ERR855951_SarcomacfDNA.bam --createOutputBamIndex true  --preserve_qscores_less_than 6 --useOriginalQualities false --quantize_quals 0 --round_down_quantized false --emit_original_quals false --globalQScorePrior -1.0 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --disableToolDefaultReadFilters false PN:GATK ApplyBQSR

After this any tool using the BAM fails, for example Samtools reports a truncated file. The only way I can access it if I used Samtools in the header-only mode, where again I get the same header. What could be the problem here?

bam samtools header • 774 views
ADD COMMENTlink modified 14 months ago • written 14 months ago by ognjen011150

Thanks for the answer. Is there a use-case where samtools reheader works correctly?

ADD REPLYlink written 14 months ago by ognjen011150

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment should have gone under @Devon's answer.

ADD REPLYlink written 14 months ago by genomax62k

Sure, changing things like "chr1" to "1", or removing chromosomes from the end (or adding chromosomes there). I consider samtools reheader to be a very advanced function that largely requires the user be familiar with how BAM files are internally structured.

ADD REPLYlink written 14 months ago by Devon Ryan88k

I am asking because we have tried removing chromosomes from the end, as far as I can tell. Your advice works, but I am trying to understand what went wrong.

ADD REPLYlink written 14 months ago by ognjen011150
gravatar for Devon Ryan
14 months ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

Don't use samtools reheader to remove chromosomes/contigs that aren't at the very end of the header, doing so will just bork the file. What you want is instead something like:

samtools view -H foo.bam > header
# edit header...
samtools view foo.bam chr1 chr2 chr3 ... | samtools view -bo reheadered.bam -t header -

This will be slower than reheader, but actually work.

ADD COMMENTlink written 14 months ago by Devon Ryan88k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1700 users visited in the last hour