BAM merging for archaic samples
5 weeks ago
Matteo Ungaro ▴ 100

Hi there, I recently started working with some archaic samples (aDNA), specifically two high-coverage samples, one Neanderthal and one Denisova.

The latter has a single BAM file, aligned to GRCh37 and already sorted; unfortunately, the Neanderthal has five BAMs, which were not sorted and needed to be merged...

So, I first sorted them (with -n, to order the reads by name) and then attempted to merge them with samtools; however, I got the following messages:

[bam_translate] RG tag "L9302" on read "SN7001204_0130_AC0M6HACXX_PEdi_SS_L9302_L9303_1:1:1101:1050:3313" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "L9105" on read "NIOBE_0139_A_D0B5GACXX:6:1101:1227:3642" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "L9198" on read "SN928_0068_BB022WACXX:1:1101:1094:3181" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "L9199" on read "SN928_0073_BD0J78ACXX:1:1101:1249:195244" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "L9303" on read "SN7001204_0130_AC0M6HACXX_PEdi_SS_L9302_L9303_1:1:1101:1050:9131" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.

After a bit of research, I thought that sorting and merging alone might have caused this; hence, following one post, I tried to re-header the five files with the following:

samtools view -H nea_<#>.bam | grep -v "^@RG" | samtools reheader - nea_<#>.bam > nea_<#>_reheaded.bam

Despite doing so, the problem persists, and as this is not an operation I do routinely (let alone on aDNA), I really cannot pinpoint its source... Also, is it something to be concerned about, since the next step will be extracting R1 & R2 from this BAM? Let me know, thanks in advance!


RG tag "L9302" is missing from the BAM header. You should have something like this in the BAM header:

@RG ID:L9302    SM:L9302

Maybe you can add this with samtools reheader.
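As a rough sketch of that idea: the warning appears when reads carry an RG tag that the header does not declare, so a filter such as grep -v "^@RG" removes exactly the lines that are needed. The snippet below goes the other way and appends an @RG line for each missing ID. The helper name add_missing_rg, the sample name Neanderthal, and the file name nea_1.bam are assumptions made for illustration; the IDs are taken from the warnings above.

```shell
# Sketch: append @RG lines for read-group IDs that a SAM header lacks.
# add_missing_rg is a hypothetical helper, not part of samtools.
# Usage: <header on stdin> | add_missing_rg SAMPLE_NAME ID [ID ...]
add_missing_rg() {
    sample="$1"
    shift
    awk -v sample="$sample" -v ids="$*" '
        BEGIN { FS = "\t"; n = split(ids, want, " ") }
        { print }                       # pass every header line through
        $1 == "@RG" {                   # remember IDs already declared
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ID:/) seen[substr($i, 4)] = 1
        }
        END {                           # add one @RG line per missing ID
            for (j = 1; j <= n; j++)
                if (!(want[j] in seen))
                    printf "@RG\tID:%s\tSM:%s\n", want[j], sample
        }
    '
}

# Demo on a two-line header that declares L9302 but not L9303.
printf '@HD\tVN:1.4\tSO:queryname\n@RG\tID:L9302\tSM:Neanderthal\n' |
    add_missing_rg Neanderthal L9302 L9303

# With samtools installed, the same helper could be applied per file
# (nea_1.bam and the sample name are placeholders):
#   samtools view -H nea_1.bam | add_missing_rg Neanderthal L9302 L9303 > nea_1.header.sam
#   samtools reheader nea_1.header.sam nea_1.bam > nea_1_reheaded.bam
```

Only the header text is edited here, so the reads themselves are left untouched; it is worth checking the result with samtools view -H before merging.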


@Pierre Lindenbaum I see. So, can I actually manipulate just a single RG tag at a time with samtools? Which flag should I use, or should I just get the header from the BAM file, add the tag, and set it as the new header? Let me know, thanks!
