Failed to populate reference for id 1?
5.4 years ago
cmdcolin ★ 3.8k

I have a small snippet of BAM that I want to convert to CRAM.

% samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.

% samtools view -T chr1.fa -C hg19_chr1.bam -o hg19_chr1.cram
[E::cram_get_ref] Failed to populate reference for id 1
[main_samview] failed to write the SAM header
%

I have looked at similar errors on Google but couldn't see why this simple scenario would fail. I also tried a different version of chr1, which gives an error about mismatched lengths, so this seems to be a different error.

samtools • 5.5k views

Hello cmdcolin,

I guess that in your reference sequence file the chromosome name is prefixed with chr, but in your BAM file it isn't.

samtools view -H hg19_chr1.bam | grep "SN" will give you the sequence names used in your BAM file.

grep "^>" chr1.fa will give you the sequence names in your reference.

fin swimmer


I think they are actually the same: the BAM header has @SQ SN:chr1 LN:249250621 and the FASTA has >chr1.

5.4 years ago

Ok, I took a closer look at this.

The reference file must contain every sequence name listed in the header of the BAM file, regardless of whether reads are mapped to all of those sequences. That leaves you two options:

  1. provide a reference file that contains all sequence names given in the header of the BAM file
  2. remove the unused sequence names from the header
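
To see which header sequences the reference is missing, something like the following could help. It is only a sketch: the function name missing_refs is made up for illustration, and it assumes the header has already been dumped to a text file (for example with samtools view -H hg19_chr1.bam > header.txt) and that the FASTA is non-empty.

```shell
# missing_refs HEADER FASTA   (hypothetical helper, not part of samtools)
# Prints every @SQ sequence name in HEADER that has no ">name" record in FASTA.
missing_refs() {
    awk '
        # First file (FASTA): remember each record name, i.e. the text
        # after ">" up to the first whitespace.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                seen[substr($1, 2)] = 1
            }
            next
        }
        # Second file (header): on each @SQ line, find the SN: field
        # and print its value if the FASTA did not contain it.
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if (substr($i, 1, 3) == "SN:") {
                    sn = substr($i, 4)
                    if (!(sn in seen)) print sn
                }
            }
        }
    ' "$2" "$1"
}
```

Running missing_refs header.txt chr1.fa should print nothing when the reference covers the whole header. Any name it prints has to be added to the reference (option 1) or dropped from the header (option 2).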

fin swimmer


Thank you very much for this information, that worked! Do you happen to have a source for this info? I just didn't find anything in my searches


Hello again,

No, I haven't found any source for this information. It was more an interpretation of the error message. Once it was clear that the chr prefix wasn't the cause, I translated the message to "I cannot find a given sequence name in the reference". I could then confirm this by creating a subset of a BAM file for just one chromosome and providing a reference with just that one chromosome.

fin swimmer

5.4 years ago
cmdcolin ★ 3.8k

Following advice from @finswimmer, I modified the BAM header and deleted all @SQ lines except the one for chr1, using something like this:

samtools view -H hg19_chr1.bam > header.txt
# manually edit header.txt to remove all @SQ lines except the one for chr1
samtools reheader header.txt hg19_chr1.bam > hg19_chr1_mod.bam
samtools view -T chr1.fa -C hg19_chr1_mod.bam -o hg19_chr1_mod.cram
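
The hand-editing step could probably be scripted. Below is a minimal sketch using awk. The function name keep_sq and the output file name header_chr1.txt are made up for illustration, and it assumes a tab-separated SAM header as written by samtools view -H.

```shell
# keep_sq HEADER NAME   (hypothetical helper, not part of samtools)
# Prints HEADER with every @SQ line removed except the one whose SN: field
# equals NAME. Other header lines (@HD, @RG, @PG, @CO) pass through unchanged.
keep_sq() {
    awk -F '\t' -v keep="$2" '
        # Non-@SQ lines are copied as they are.
        $1 != "@SQ" { print; next }
        # @SQ lines are printed only on an exact SN: match,
        # so "chr1" does not also keep "chr10".
        {
            for (i = 2; i <= NF; i++) {
                if ($i == ("SN:" keep)) { print; next }
            }
        }
    ' "$1"
}

# Intended use, replacing the manual edit:
#   samtools view -H hg19_chr1.bam > header.txt
#   keep_sq header.txt chr1 > header_chr1.txt
#   samtools reheader header_chr1.txt hg19_chr1.bam > hg19_chr1_mod.bam
```

One caveat: BAM records refer to their reference sequence by its position in the @SQ list, so this is only safe when the sequence you keep stays at the same position (here chr1 is the first @SQ entry) and no read or mate points to a removed sequence.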