Bam file MC tag as integer
5.1 years ago
chariko ▴ 60

Hi,
I downloaded a BAM file, and when I ran Picard ValidateSamFile on it I got the following error:

ERROR ... ValidateSamFile Value for tag MC is not a String: class java.lang.Integer

Looking through the SAM records, I noticed that the MC tag looks like this:

MC:i:1479433

The SAM specification says the following about the MC tag:

| Tag | Type | Description |
| --- | --- | --- |
| MC | Z | CIGAR string for mate/next segment |

where type Z denotes a string.

So in my bam file, the problem is that there is an integer for the MC value instead of a string.
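To see how widespread the problem is, one could tally the MC tags by type code. The following is only a rough sketch: the function name `count_mc_types` and the file name `in.bam` are made up for illustration, and it assumes the records are fed in as SAM text (for example from `samtools view`).

```shell
# Count how many alignment records carry an MC tag of each type code (i, Z, ...).
# Reads SAM text on stdin; header lines (starting with @) are skipped.
# count_mc_types is a hypothetical helper name, not part of any tool.
count_mc_types() {
  awk -F '\t' '
    /^@/ { next }
    {
      # Optional tags start at column 12 of a SAM record.
      for (i = 12; i <= NF; i++)
        if ($i ~ /^MC:[A-Za-z]:/) count[substr($i, 4, 1)]++
    }
    END { for (t in count) print "MC:" t, count[t] }
  ' | LC_ALL=C sort
}

# Typical use (assumes samtools is installed and in.bam is your file):
#   samtools view in.bam | count_mc_types
```

If every MC tag in the file shows up as `MC:i`, the integers were most likely written systematically by one tool rather than by a one-off corruption.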

My question is: since I only have the BAM file, is there a way to fix this "bad" value for the MC tag?

In case it helps, these are the programs used to generate the BAM file (extracted from the header):

ID:1#72
ID:SCS
ID:basecalling
ID:Illumina2bam
ID:bamadapterfind
ID:BamIndexDecoder
ID:spf
ID:bwa
ID:BamMerger
ID:SplitBamByReadGroup
ID:bamcollate2
ID:bamreset
ID:bamadapterclip
ID:bwa
ID:scramble
ID:bam12split
ID:bamsort
ID:AlignmentFilter
ID:bamsort
ID:bamstreamingmarkduplicates
ID:scramble.1

I wonder whether one of the programs used in the analysis produced the "bad" value for the MC tag.
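The @PG header lines usually record a program version (VN) next to each ID, which can help narrow down the culprit. A minimal sketch for listing them, assuming the header is available as SAM text; `list_pg_versions` and `in.bam` are hypothetical names:

```shell
# Print ID, PN and VN for every @PG header line, tab-separated, one program per line.
# Reads SAM header text on stdin; fields missing from a @PG line are shown as ".".
# list_pg_versions is a hypothetical helper name.
list_pg_versions() {
  awk -F '\t' '
    /^@PG/ {
      id = pn = vn = "."
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)
        else if ($i ~ /^PN:/) pn = substr($i, 4)
        else if ($i ~ /^VN:/) vn = substr($i, 4)
      }
      print id "\t" pn "\t" vn
    }
  '
}

# Typical use (assumes samtools is installed and in.bam is your file):
#   samtools view -H in.bam | list_pg_versions
```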

Thanks in advance for your advice.

ValidateSamFile bam sam MC • 3.2k views
samtools view -h in.bam | sed '/^[^@]/s/\tMC:i:[-0-9]*//' | samtools view -b -o out.bam -

would do it, but I can't predict the consequences for your downstream analysis.
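For what it's worth, a field-aware variant is sketched below. It only looks at the optional fields (column 12 onward), so it cannot touch the mandatory columns, and it does not matter whether the tag sits in the middle or at the end of the record. The function name `strip_integer_mc` is made up here, and like the sed version it simply drops the tag rather than rebuilding a proper MC:Z value.

```shell
# Remove integer-typed MC tags (MC:i:...) from SAM text read on stdin.
# Header lines and MC:Z tags pass through unchanged.
# strip_integer_mc is a hypothetical helper name.
strip_integer_mc() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }
    {
      out = $1
      # Columns 1-11 are mandatory SAM fields and are always kept;
      # from column 12 on, skip any field that is an integer MC tag.
      for (i = 2; i <= NF; i++)
        if (i < 12 || $i !~ /^MC:i:/) out = out OFS $i
      print out
    }'
}

# Typical use (assumes samtools is installed; in.bam and out.bam are placeholders):
#   samtools view -h in.bam | strip_integer_mc | samtools view -b -o out.bam -
```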


Thank you very much for your answer.

This command makes the MC tag disappear, so Picard no longer reports any errors, but as you said, I am not sure what effect this could have. I will check.

I will also try to find out which step of the analysis produced this MC value.

While searching the web for a possible reason for this strange MC value, I found https://www.drive5.com/usearch/manual/cigar.html, which says:

A CIGAR standard was originally defined by the Exonerate alignment program, but this is not the same as the CIGARs found in SAM files. Several incompatible types of CIGAR string are used by different programs that support SAM files, and unfortunately CIGARs are not fully described by the SAM specification.

So this may be what is happening in my case.

5.1 years ago

ID:bamstreamingmarkduplicates

Older versions of the biobambam tools used an integer MC:i tag to communicate duplicate marking statistics from one part of biobambam to another.

More recent versions use a more appropriate mc:i tag and strip out an integer MC if there is one, likely replacing it with a standard string MC:Z Mate CIGAR tag.


Thank you very much for your answer, @John.

You are right, this could explain the MC:i values. In my case, the biobambam2 version that was run was 2.0.8, which was released in April 2015.

According to the biobambam2 issue tracker, this was fixed in version 2.0.58:
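Note that 2.0.8 is older than 2.0.58 only when the components are compared as numbers; a plain string comparison would order them the other way round. A small sketch of a numeric comparison, in case someone wants to script the check (`version_lt` is a made-up helper name and it assumes purely numeric dotted versions):

```shell
# version_lt A B: exit status 0 when dotted version A is older than B, 1 otherwise.
# Components are compared numerically, left to right; missing components count as 0.
# version_lt is a hypothetical helper name.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, ".")
    nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi < yi) exit 0
      if (xi > yi) exit 1
    }
    exit 1   # versions are equal, so A is not older than B
  }'
}

# Example: is the version found in the header older than the fixed release?
if version_lt 2.0.8 2.0.58; then
  echo "older than 2.0.58: integer MC tags are plausible"
fi
```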

https://github.com/gt1/biobambam2/issues/34

https://github.com/gt1/biobambam2/issues/24
