Question: Bam file MC tag as integer
Asked 4 months ago by chariko (Spain):

Hi,
I downloaded a BAM file, and when running Picard ValidateSamFile on it I got the following error:

ERROR ... ValidateSamFile Value for tag MC is not a String: class java.lang.Integer

Looking through the SAM records, I realized that the MC tag looked like this:

MC:i:1479433

The SAM specification defines the MC tag as follows:

| Tag | Type | Description |
|-----|------|-------------|
| MC | Z | CIGAR string for mate/next segment |

where type Z means the value must be a string.

So the problem in my BAM file is that the MC value is an integer instead of a string.

My question is: given that I only have the BAM file, is there a way to fix this "bad" MC value?

Just in case it helps, these are the programs used to generate the BAM file (extracted from the header):

ID:1#72
ID:SCS
ID:basecalling
ID:Illumina2bam
ID:bamadapterfind
ID:BamIndexDecoder
ID:spf
ID:bwa
ID:BamMerger
ID:SplitBamByReadGroup
ID:bamcollate2
ID:bamreset
ID:bamadapterclip
ID:bwa
ID:scramble
ID:bam12split
ID:bamsort
ID:AlignmentFilter
ID:bamsort
ID:bamstreamingmarkduplicates
ID:scramble.1

I wonder whether one of the programs used in the analysis produced the "bad" MC value.

Thanks in advance for your advice.

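# remove an integer MC tag wherever it sits on the line (not only when followed by another tag); @ header lines are left alone
samtools view -h in.bam | sed -E '/^@/!s/\tMC:i:-?[0-9]+//' | samtools view -b -o out.bam -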

would do it, but I can't know what the consequences will be for your downstream analysis.
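To see how many alignment records carry an integer MC:i versus a string MC:Z tag (for instance on in.bam and again on out.bam after running the command above), a small awk filter over `samtools view` output is enough. This is a minimal sketch: `count_mc_types` is a hypothetical helper name, and it assumes optional tags start at column 12, as the SAM specification lays out.

```shell
# Hypothetical helper: count integer (MC:i) and string (MC:Z) MC tags in SAM text on stdin.
# Assumes the 11 mandatory SAM columns come first, so tags are searched from column 12 on.
count_mc_types() {
  awk 'BEGIN { FS = "\t"; n_int = 0; n_str = 0 }
    /^@/ { next }                       # skip header lines if present
    {
      for (f = 12; f <= NF; f++) {
        if ($f ~ /^MC:i:/) n_int++
        else if ($f ~ /^MC:Z:/) n_str++
      }
    }
    END { printf "MC:i %d\nMC:Z %d\n", n_int, n_str }'
}
```

Usage would be `samtools view in.bam | count_mc_types`; after the fix you would expect the MC:i count to be 0.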

Written 4 months ago by Pierre Lindenbaum

Thank you very much for your answer.

This command removes the MC tag, so Picard no longer reports any errors, but as you said, I am not sure what effect this could have. I will check it...

I will also try to find out at which step of the analysis this MC value was produced.

While searching the web for possible reasons for this strange MC value, I found https://www.drive5.com/usearch/manual/cigar.html, which says:

A CIGAR standard was originally defined by the Exonerate alignment program, but this is not the same as the CIGARs found in SAM files. Several incompatible types of CIGAR string are used by different programs that support SAM files, and unfortunately CIGARs are not fully described by the SAM specification.

So this may be what is happening in my case.

Written 4 months ago by chariko
Answer from John Marshall (Glasgow, Scotland), 4 months ago:

ID:bamstreamingmarkduplicates

Older versions of the biobambam tools used an integer MC:i tag to communicate duplicate marking statistics from one part of biobambam to another.

More recent versions use a more appropriate mc:i tag and strip out an integer MC if there is one, likely replacing it with a standard string MC:Z Mate CIGAR tag.
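If you would rather keep a valid mate CIGAR than drop MC altogether, the MC:Z value can in principle be rebuilt from the mate's own CIGAR column. The following is a rough sketch of that idea, not what biobambam does internally. `rebuild_mc` is a hypothetical name; it assumes an uncompressed SAM file (the file is read twice, so it cannot come from a pipe), it keeps every primary alignment's CIGAR in memory, and it drops any existing MC field before adding MC:Z to paired records whose mate is mapped.

```shell
# Hypothetical helper: rebuild_mc FILE.sam  -> SAM on stdout with MC rewritten as MC:Z.
# Pass 1 stores the CIGAR of each primary alignment keyed by read name + segment;
# pass 2 strips any MC tag and adds MC:Z:<mate CIGAR> when the mate is mapped.
rebuild_mc() {
  awk 'BEGIN { FS = OFS = "\t" }
    # bit(f, b): 1 if flag bit b (a power of two) is set in flag f (portable, no and())
    function bit(f, b) { return int(f / b) % 2 }
    NR == FNR {                                   # pass 1
      if ($0 ~ /^@/) next
      if (bit($2, 256) || bit($2, 2048)) next     # skip secondary/supplementary
      seg = bit($2, 64) ? 1 : 2                   # first or last segment
      cigar[$1, seg] = $6
      next
    }
    /^@/ { print; next }                          # pass 2: header unchanged
    {
      out = $1
      # keep the 11 mandatory columns; drop any MC optional tag, whatever its type
      for (i = 2; i <= NF; i++) if (i <= 11 || $i !~ /^MC:/) out = out OFS $i
      if (bit($2, 1) && !bit($2, 8)) {            # paired and mate mapped
        mate = bit($2, 64) ? 2 : 1
        if ((($1, mate) in cigar) && cigar[$1, mate] != "*")
          out = out OFS "MC:Z:" cigar[$1, mate]
      }
      print out
    }' "$1" "$1"
}
```

A possible workflow is `samtools view -h in.bam > in.sam`, then `rebuild_mc in.sam | samtools view -b -o out.bam -`, and then running ValidateSamFile again on out.bam.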

Written 4 months ago by John Marshall

Thank you very much for your answer, @John.

You are right, this could explain these MC:i values. In my case, the biobambam2 version that was run was 2.0.8, which was released in April 2015.

Looking through the biobambam2 issues, I found that this was fixed in version 2.0.58:

https://github.com/gt1/biobambam2/issues/34

https://github.com/gt1/biobambam2/issues/24

Written 4 months ago by chariko
Powered by Biostar version 2.3.0