Hi there, I recently started working with some archaic samples (aDNA), specifically two high-coverage samples of Neanderthal and Denisova.
The latter has a single BAM file, aligned to GRCh37 and already sorted; unfortunately, the Neanderthal has five BAMs, which were not sorted and needed to be merged.
So, I first sorted them with -n, to have the files ordered by read name, and then attempted to merge them with samtools; however, I got the following warnings:
[bam_translate] RG tag "L9302" on read "SN7001204_0130_AC0M6HACXX_PEdi_SS_L9302_L9303_1:1:1101:1050:3313" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "L9105" on read "NIOBE_0139_A_D0B5GACXX:6:1101:1227:3642" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "L9198" on read "SN928_0068_BB022WACXX:1:1101:1094:3181" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "L9199" on read "SN928_0073_BD0J78ACXX:1:1101:1249:195244" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "L9303" on read "SN7001204_0130_AC0M6HACXX_PEdi_SS_L9302_L9303_1:1:1101:1050:9131" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
After a bit of research, I thought that sorting and merging alone might have caused this; hence, following one post, I tried to re-header the five files with the following:
samtools view -H nea_<#>.bam | grep -v "^@RG" | samtools reheader - nea_<#>.bam > nea_<#>_reheaded.bam
Despite doing so, the problem persists, and as this is not an operation I do routinely (let alone on aDNA), I really cannot pinpoint its source. Also, is it something to be concerned about, since the next step will be extracting R1 & R2 from this BAM? Let me know, thanks in advance!
RG tag "L9302" is missing from the BAM header. The header should contain an @RG line whose ID field is L9302 (and likewise for the other tags reported).
Maybe you can add it with samtools reheader.
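A minimal sketch of that approach, assuming the reads in a given file carry RG tags that its header never declares. The helper name add_rg_lines, the sample name Neanderthal, and the output file names are hypothetical, and which IDs belong to which of the five BAMs has to be checked per file. The idea is to dump the header, append the missing @RG lines, and hand the patched header to samtools reheader.

```shell
# Hypothetical helper (not part of samtools): reads a SAM header on stdin and
# writes it back with an @RG line appended for every listed ID that the
# header does not already declare.
# Usage: add_rg_lines SAMPLE ID [ID...]
add_rg_lines() {
    sample="$1"
    shift
    header=$(cat)
    if [ -n "$header" ]; then
        printf '%s\n' "$header"
    fi
    for id in "$@"; do
        # Look for an @RG record whose ID field matches exactly.
        if ! printf '%s\n' "$header" |
            awk -F'\t' -v want="ID:$id" '
                $1 == "@RG" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
                END { exit !found }'; then
            # SM is filled with the placeholder sample name passed in.
            printf '@RG\tID:%s\tSM:%s\n' "$id" "$sample"
        fi
    done
}

# Intended use (file names and the ID-to-file assignment are assumptions):
#   samtools view -H nea_1.bam | add_rg_lines Neanderthal L9302 L9303 > nea_1.header.sam
#   samtools reheader nea_1.header.sam nea_1.bam > nea_1_rg.bam
```

Note that this keeps any @RG lines already present instead of stripping them, so the reads and the header stay consistent when the files are merged afterwards.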
@Pierre Lindenbaum I see. So, can I actually manipulate just one RG tag at a time with samtools? Which flag should I use, or should I just get the header from the BAM file, add the tag, and set it as the new header? Let me know, thanks!