Errors when using samtools faidx and bgzip for indexing human reference genome
2.2 years ago
amy__ ▴ 160

Hello,

I know there are previous questions on this topic, but they have not solved my problem.

I am trying to use samtools faidx to index the human reference genome. I have downloaded the GRCh38 fasta file from here: https://www.ncbi.nlm.nih.gov/genome/guide/human/. I saved it as an .fa file.

When running

samtools faidx GRCh38_latest_genomic.fa 

I first get this error:

[E::fai_build3_core] Cannot index files compressed with gzip, please use bgzip
[faidx] Could not build fai index GRCh38_latest_genomic.fa.fai

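When I instead compress the file first with

bgzip GRCh38_latest_genomic.fa

and then run samtools faidx on the resulting GRCh38_latest_genomic.fa.gz, I get this error: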

[E::fai_build_core] Format error, unexpected character at line 1
[faidx] Could not build fai index GRCh38_latest_genomic.fa.gz.fai

I need to get a .fai file so that I can use with deepvariant.

Thanks, Amy

bgzip samtools • 2.9k views

GRCh38_latest_genomic.fa is already compressed.

Just test this with

file GRCh38_latest_genomic.fa
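
If file is not available, the same check can be sketched by hand: gzip data starts with the two magic bytes 1f 8b, regardless of the file extension. The helper name is_gzip below is made up for illustration and is not part of samtools or htslib.

```shell
# Hypothetical helper: succeeds if the file starts with the gzip magic bytes.
is_gzip() {
    # Read the first two bytes and print them as hex without separators.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example use (file name taken from the question):
# is_gzip GRCh38_latest_genomic.fa && echo "gzip data despite the .fa extension"
```

Note that bgzip output is gzip-compatible and begins with the same two bytes, so this check only tells compressed from plain text. It cannot tell ordinary gzip from bgzip.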

2.2 years ago
ATpoint 82k

The file you downloaded is already gzip-compressed, so save it as *.fa.gz. Then decompress it with gzip -d and recompress it with bgzip.
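
Put together, those steps might look like the sketch below. It assumes samtools and bgzip (from htslib) are on your PATH and that the download was saved under a .fa name as in the question. The function name fix_and_index is only for illustration.

```shell
# Hypothetical wrapper around the steps above; $1 is the mislabelled file,
# e.g. GRCh38_latest_genomic.fa (really gzip data).
fix_and_index() {
    mv "$1" "$1.gz" &&          # give the file the extension matching its content
    gzip -d "$1.gz" &&          # decompress; leaves the plain-text FASTA at "$1"
    bgzip "$1" &&               # recompress as BGZF; writes "$1.gz"
    samtools faidx "$1.gz"      # build the index; writes "$1.gz.fai"
}

# Example use:
# fix_and_index GRCh38_latest_genomic.fa
```

For a bgzip-compressed FASTA, samtools faidx should also write a .gzi file next to the .fai. If you do not need the reference compressed, running samtools faidx on the decompressed .fa directly should also produce a .fai.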


That worked great!! Thank you so much
