[E::faidx_fetch_seq] error with Samtools mpileup?
1
0
Entering edit mode
4.4 years ago
mwerseb1 • 0

Hello All,

I am trying to call SNPs using samtools mpileup command and I continue to get this error:

[E::faidx_fetch_seq] The sequence "NC_000913.3 Escherichia coli str. K-12 substr. MG1655" not found

I have tried several of the suggested solutions elsewhere on the internet to no avail. Any help would be greatly appreciated.

Some background:

I generated sam files with bbmap then coverted to bam files with samtools view. There were then sorted using samtools sort.

The fasta file was indexed with samtools faidx.

current syntax is:

path-to-program/samtools mpileup -v -f ASM584v2.fna -I Sorted.ERR435575.bam > ERR435575.vcf
SNP • 2.2k views
ADD COMMENT
0
Entering edit mode

Please show output of:

SEQ="NC_000913.3 Escherichia coli str. K-12 substr. MG1655"
grep $SEQ <(samtools view -H Sorted.ERR435575.bam)
grep $SEQ ASM584v2.fna

meaning to check if this sequene name is present in both the fasta reference and the SAM header.

ADD REPLY
0
Entering edit mode

grep "NC_000913.3 Escherichia coli str. K-12 substr. MG1655" ASM584v2.fna

>NC_000913.3 Escherichia coli str. K-12 substr. MG1655

/path-to-/samtools view -H Sorted.ERR435575.bam

@HD     VN:1.4  SO:coordinate

@SQ     SN:NC_000913.3 Escherichia coli str. K-12 substr. MG1655        LN:4641652

@PG     ID:BBMap        PN:BBMap        VN:38.73
ADD REPLY
1
Entering edit mode
4.4 years ago

Also look for 913.3 in the fai file:

grep 913.3 ASM584v2.fna.fai

As described in the documentation, samtools faidx considers the first word on the > lines to be the names of the sequences.

Most aligners do the same thing, and it is surprising that bbmap has not and has instead put the whole NC_000913.3 Escherichia coli str. K-12 substr. MG1655 name and description into the @SQ SN BAM header.

If bbmap does not have a way to set it to only use the first word for the sequence name, you should edit ASM584v2.fna and either change the spaces to (say) underscores or more conveniently hack off the Escherichia coli str. K-12 substr. MG1655 from the > line so that bbmap and faidx will agree on what the reference sequence names are — simply NC_000913.3. Either way, you will have to regenerate the bbmap index (if any) and remap.

(Or you could do a search and replace on the SAM file to avoid remapping.)

ADD COMMENT
0
Entering edit mode

BBMap is flexible in allowing headers to go through as is. It has settings to truncate the read description at first space by using trd=t option when one aligns. Danger with that is if the first word in fasta headers is not unique then one would end up with an unusable alignment.

It is possible to truncate the descriptions in aligned SAM/BAM files without a need to realign.

reformat.sh in=aligned.bam out=new.bam trd=t
ADD REPLY

Login before adding your answer.

Traffic: 2721 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6