Question: bcftools fixref error
Posted 7 days ago by kstafford320:

Hello, I am trying to run the bcftools fixref plugin to correct the strand-flip errors reported by snpflip in the data I am preparing for TOPMed imputation. I found this code here: https://samtools.github.io/bcftools/howtos/plugin.fixref.html

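# Check each autosome VCF for REF mismatches against the reference (--check-ref e exits with an error on mismatch)
for i in {1..22}
do
    bcftools norm --check-ref e -f "$OUTDIR/DAC14_send_to_topmed/Homo_sapiens_assembly38_withchrfa.fa" "$OUTDIR/DAC14_send_to_topmed/DAC14_chr${i}_hg38_nonduplicates.vcf.gz" -Ou -o /dev/null
done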
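Before running the loop, it may help to confirm that all 22 per-chromosome files actually exist. This is a minimal sketch, assuming the naming pattern DAC14_chr<N>_hg38_nonduplicates.vcf.gz used in the loop above; the function name missing_vcfs is hypothetical and not part of bcftools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected per-chromosome VCF that is missing.
# Assumes files are named DAC14_chr<N>_hg38_nonduplicates.vcf.gz, as in the loop above.
missing_vcfs() {
    local dir="$1" i f
    for i in {1..22}; do
        # ${i} (rather than $i\_) keeps the underscore out of the variable name
        f="$dir/DAC14_chr${i}_hg38_nonduplicates.vcf.gz"
        [ -e "$f" ] || echo "$f"
    done
}
```

Called as missing_vcfs "$OUTDIR/DAC14_send_to_topmed", it prints nothing when every chromosome is present, so an empty result is the go-ahead for the bcftools loop.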

Because my data are on build hg38 and I need to keep the chr prefix in the reference, I decided to use the GATK hg38 reference, Homo_sapiens_assembly38.fasta, found here: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0;tab=objects?prefix=&forceOnObjectsSortingFiltering=false

I keep receiving this error message:

Failed to load the fai index: /sc/arion/projects/psychgen2/MAP2_dac/data/imputation/DAC14_send_to_topmed/Homo_sapiens_assembly38_withchr.fasta
[E::fai_build_core] Format error, unexpected "<" at line 2

I cannot seem to find the solution to this error.

Answer posted 7 days ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):

The message is quite clear to me: your FASTA file is NOT a FASTA file.

Format error, unexpected "<" at line 2

Show us the output of

head -n 3 /sc/arion/projects/psychgen2/MAP2_dac/data/imputation/DAC14_send_to_topmed/Homo_sapiens_assembly38_withchr.fasta 
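A FASTA file must begin with a '>' header line, so an unexpected "<" from fai_build_core already hints that the file is markup such as a saved web page. This is a minimal sketch for classifying the first non-empty line of a supposed reference, assuming an uncompressed file (a .fa.gz would need zcat first); the function name fasta_kind is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what the first non-empty line of a file looks like.
# Assumes an uncompressed text file.
fasta_kind() {
    local first
    # -v selects non-blank lines; -m 1 stops after the first one
    first=$(grep -m 1 -v '^[[:space:]]*$' "$1")
    case "$first" in
        '>'*) echo "fasta" ;;    # FASTA header line
        '<'*) echo "html" ;;     # markup, e.g. a saved web page
        *)    echo "unknown" ;;
    esac
}
```

Running it on the downloaded reference before calling bcftools catches a bad download without waiting for the index build to fail.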

Interesting... perhaps I downloaded it incorrectly?

 head -n 3 /sc/arion/projects/psychgen2/MAP2_dac/data/imputation/DAC14_send_to_topmed/Homo_sapiens_assembly38_withchr.fasta

<!DOCTYPE html>
<html lang="en">

I used this command to download it. Perhaps you cannot download from the cloud console this way?

wget https://storage.cloud.google.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta?_ga=2.192584233.-289538108.1613068749

Also, I chose this file because I simply need an hg38 reference genome with the chr prefix to run the fixref plugin. Supposedly this is the main one provided by GATK? In this bucket it was the only FASTA file; the others were .vcf.gz files, which cannot serve as the reference for fixref.

(Reply by kstafford320, 7 days ago)

You downloaded the web page...

From cloud.google.com, I think you need to download it from the browser.

(Reply by Pierre Lindenbaum, 7 days ago)

Yes, you're right.

So that other rookies don't make the same mistake, use:

wget https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta

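^ This one worked, presumably because storage.googleapis.com serves publicly readable objects directly, whereas storage.cloud.google.com links are meant for authenticated downloads in a browser, so wget saved an HTML page instead of the FASTA file.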

(Reply by kstafford320, 7 days ago)
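The working command differs from the failing one only in its host (storage.googleapis.com instead of storage.cloud.google.com) and in dropping the ?_ga= query string. This is a minimal sketch that rewrites a console-style link into that direct form, assuming the object is publicly readable; the function name gcs_direct_url is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a storage.cloud.google.com link into the direct
# storage.googleapis.com link for the same object, dropping any ?query string.
# Assumes the object is public, so no authentication is needed.
gcs_direct_url() {
    local url=$1
    url=${url%%'?'*}    # strip everything from the first literal '?'
    printf '%s\n' "${url/storage.cloud.google.com/storage.googleapis.com}"
}
```

For example, wget "$(gcs_direct_url 'https://storage.cloud.google.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta?_ga=2.192584233.-289538108.1613068749')" fetches the same URL as the working command. Quoting the link also keeps the shell from treating its ? as a glob character.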