Question: Failed to populate reference for id 1?
cmdcolin wrote:

I have a small snippet of BAM that I want to convert to CRAM.

% samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.

% samtools view -T chr1.fa -C hg19_chr1.bam -o hg19_chr1.cram
[E::cram_get_ref] Failed to populate reference for id 1
[main_samview] failed to write the SAM header
%

I have looked at other errors like this on Google but couldn't see why this simple scenario would fail. I also tried a different version of chr1, and that gives an error about a length mismatch, so this seems to be a different error.

written 4 months ago by cmdcolin

Hello cmdcolin,

I guess that in your reference sequence file the chromosome name is prefixed with chr, but in your BAM file it isn't.

samtools view -H hg19_chr1.bam | grep "SN" will give you the sequence names used in your BAM file.

grep "^>" chr1.fa will give you the sequence names in your reference.

fin swimmer

written 4 months ago by finswimmer

I think they are actually the same: the BAM header has @SQ SN:chr1 LN:249250621 and the FASTA has >chr1.

written 4 months ago by cmdcolin

finswimmer wrote:

Ok, I took a closer look at this.

Your reference file must contain every sequence name listed in the header of the BAM file, regardless of whether you have reads mapped to all of these regions. That leaves you two options:

  1. provide a reference file that contains all sequence names given in the header of the BAM file
  2. remove the unused sequence names from the header
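
To see which header names are the problem, one rough way is to compare the @SQ names in the BAM header against the record names in the FASTA. This is only a sketch: the function name missing_in_reference is made up, and header.txt is assumed to hold the output of samtools view -H. It uses nothing but awk.

```shell
# Hypothetical helper: print every @SQ name from a SAM header file ($1)
# that has no matching record name in a FASTA file ($2).
missing_in_reference() {
    awk -F'\t' '
        # First file (the FASTA): remember each record name, i.e. the text
        # after ">" up to the first space or tab.
        NR == FNR {
            if (/^>/) {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)
                ref[name] = 1
            }
            next
        }
        # Second file (the header): report SN: values not seen in the FASTA.
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) {
                    n = substr($i, 4)
                    if (!(n in ref)) print n
                }
            }
        }
    ' "$2" "$1"
}
```

For the files in the question this would be run as samtools view -H hg19_chr1.bam > header.txt followed by missing_in_reference header.txt chr1.fa. Any name it prints is in the BAM header but not in the reference.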

fin swimmer

written 4 months ago by finswimmer

Thank you very much for this information, that worked! Do you happen to have a source for this info? I just didn't find anything in my searches.

written 4 months ago by cmdcolin

Hello again,

No, I haven't found any source for this information. It was more an interpretation of the error message. Once it was clear that the chr prefix wasn't the cause, I read the message as "I cannot find a given sequence name in the reference". I could then confirm this by creating a subset of a BAM file for just one chromosome and providing a reference with just that one chromosome.

fin swimmer

written 4 months ago by finswimmer

cmdcolin wrote:

Following the advice from @finswimmer, I modified the BAM header and deleted every @SQ line except the one for chr1, using something like this:

samtools view -H hg19_chr1.bam > header.txt
# manually edit header.txt to remove every @SQ line except the one for chr1
samtools reheader header.txt hg19_chr1.bam > hg19_chr1_mod.bam
samtools view -T chr1.fa -C hg19_chr1_mod.bam -o hg19_chr1_mod.cram
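
The manual edit could probably be scripted. Below is a minimal sketch, assuming the header is tab-separated (as in the SAM format) and that only one sequence, here chr1, should be kept. keep_sq is a hypothetical helper name, not a samtools command.

```shell
# Hypothetical helper: read a SAM header on stdin and drop every @SQ line
# whose SN: field is not exactly the name given as $1. All other header
# lines (@HD, @RG, @PG, ...) pass through unchanged.
keep_sq() {
    awk -F'\t' -v keep="$1" '
        $1 != "@SQ" { print; next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i == "SN:" keep) { print; next }
            }
        }
    '
}
```

It would replace the first two steps above with samtools view -H hg19_chr1.bam | keep_sq chr1 > header.txt. The match is on the whole SN: field, so chr1 does not also keep chr10.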
written 4 months ago by cmdcolin