Question: error while using " samtools mpileup " function
0
gravatar for ricfoz
19 months ago by
ricfoz30
National School of Antropology and History, Mexico city, Mexico
ricfoz30 wrote:

Hello everyone,

I am trying to run a workflow from a complete genome .bam file all the way to the consensus of a fasta sequence from the .vcf file constructed through the process.

After sorting, indexing, and retreiving a .bam file with the genic strand i wish to work with, i need to run

samtools mpileup

As i run the following command:

samtools mpileup -g -f MyFile.bam > MyFile.bcf

I get the following error:

[group_smpl] Read group SRC-5-alex4__MA167snp used in file MyFile.bam but absent from the header or an alignment missing read group.

Anyone who could help me figure out how to output a .bcf file successfuly?, I have run this command on docens of other genomes, and it's the first time i get an error. Any help would be much appreciated.

ADD COMMENTlink modified 19 months ago by Devon Ryan91k • written 19 months ago by ricfoz30
0
gravatar for Devon Ryan
19 months ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

This is a bit hard to explain without first noting how BAM files work internally. In a BAM file, the header holds all of the read groups in a list. Each read, then, can have an RG auxiliary tag with a number assigned to it. This number is then an index into the read group list. In this case, you have an alignment with an RG auxiliary tag, but the number assigned to it is outside the bounds of what's in the header. Presumably the file was preprocessed at some point and a read group line got stripped from the header. What might work would be to reheader the file:

samtools view -H MyFile.bam > header

Then add a read group to the header and:

samtools reheader header MyFile.bam > MyFile.reheadered.bam

As long as there's only one missing read group then this will solve the immediate problem. Of course this really begs the question of how this situation arose in the first place. I don't have an answer to that, though I presume that some preprocessing step went awry.

ADD COMMENTlink written 19 months ago by Devon Ryan91k

Hey there Devon.

That makes sense (of course, since you know what you are talking about), i visually checked my used bam files, and the problem is after the alignment, say, all of the ones who work well have 10 information columns after the alignment section, these columns are the next ones in a troubleless file:

x0:i:1 x1:i:0 MD:z:XXXX RG:z:XXXX XG:i:0 NM:i:0 XM:i:0 X0:i0 XP:i:1 XT:A:U

As i checked the file with the trouble, i realized it is lacking the colum with the "XP:i:1" tag. As you said, it is strange, since i downloaded the file directly from the source, and i just sorted, indexed, and retrieved a strand, i really don't think the problem was with my processing protocol, but it is still a good question to have in mind, on which preprocessing step this happened.

I'll run the code you suggest, i just have one question, do the reheader option fixes automatically the lacking information in the bam file?, or do i somehow have to add the column manually? .. if so, i am thinking something with

awk

or maybe

sed

, which would be a bit more challenging for me.

Thank you for the feedback

ADD REPLYlink modified 19 months ago • written 19 months ago by ricfoz30

If my guess is correct then the error is just a missing line in the header. You'll have to manually add a line into the header in a text editor, so no awk or sed needed. This is just my expectation as to the fix though. If this is correct, then the reheadering process should resolve all issues.

ADD REPLYlink written 19 months ago by Devon Ryan91k

Hey, i have tried the code you provide, and i see what you mean .. in the middle of those commands, i need to add in the header the information "SRC-5-alex4__MA167snp" into the header, before reheadering the file, so all the read groups have the tag and can be piled up by the "mpileup" command.

Any suggestion on a command to add that information into the header? .. i have been browsing an some people suggest using " picard ", with the "AddOrReplaceReadGroups" option. Is that correct?, it is the first time i will be formating a .bam header and i wish to be sure that my analysis will be correct after adding info to the file's header.

cheers,

Ricardo

ADD REPLYlink modified 19 months ago • written 19 months ago by ricfoz30

Open it with your preferred text editor and type the line in by hand. That's the quickest and easiest method.

ADD REPLYlink written 19 months ago by Devon Ryan91k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1719 users visited in the last hour