2.0 years ago
j.lunger18 ▴ 30

Hi all, I'm new to bioinformatics and have what I think is a pretty basic question. I'm trying to use Hail to import a VCF file from the gnomAD website (https://gnomad.broadinstitute.org/downloads). I've successfully started a Python session in the terminal and loaded Hail; now I'm just trying to use import_vcf(), which should only need one argument, the path to the file.

Here is my warning:

2019-09-20 15:24:25 Hail: WARN: '/gnomad-public/release/2.1/vcf/exomes/gnomad.exomes.r2.1.sites.chr1.vcf.bgz' refers to no files

I imagine my problem is just that I don't have the correct path, but I wasn't quite sure what that would be...


You should also know that the gnomAD team already provides files in Hail's native format, from that same link. These will be much more usable, since VCF is a far less flexible format. For instance, it's much harder to get useful data out of the VEP consequence field in VCF form.
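To make the VEP point concrete: in a VCF, per-transcript consequences are typically packed into a single INFO string, with annotations separated by commas and the fields inside each annotation separated by pipes, in the order listed in the header's "Format:" description. Here's a minimal standard-library sketch of unpacking that by hand. The helper name parse_vep_field is hypothetical, and the field names and example value are made up and abbreviated (real gnomAD headers list many more fields); this is not Hail's API.

```python
from typing import Dict, List

# Hypothetical helper for illustration; not part of Hail.

def parse_vep_field(value: str, field_names: List[str]) -> List[Dict[str, str]]:
    """Split a VEP-style INFO value into one dict per transcript annotation.

    Annotations are comma-separated; fields within an annotation are
    pipe-separated and follow the order of `field_names`.
    """
    records: List[Dict[str, str]] = []
    for annotation in value.split(","):
        parts = annotation.split("|")
        if len(parts) != len(field_names):
            raise ValueError(
                f"expected {len(field_names)} fields, got {len(parts)}: {annotation!r}"
            )
        records.append(dict(zip(field_names, parts)))
    return records


if __name__ == "__main__":
    # Hypothetical, abbreviated header format and INFO value (not from a real file).
    header_format = "Allele|Consequence|IMPACT|SYMBOL"
    value = "A|missense_variant|MODERATE|GENE1,A|intron_variant|MODIFIER|GENE2"
    for rec in parse_vep_field(value, header_format.split("|")):
        print(rec["SYMBOL"], rec["Consequence"])
```

Every downstream question (which transcript, which gene, which impact) then needs this kind of string handling before you can filter on it.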


Also, if you're running this on Google Cloud Dataproc, the path should start with the gs:// bucket prefix, i.e. gs://gnomad-public/...
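A minimal sketch of that fix, assuming you're on Dataproc (or somewhere else with Google Cloud Storage access set up) and using the path from the question: a path with no scheme isn't treated as a bucket, so the gs:// prefix has to be added. The helpers bucket_of and to_gs_url are hypothetical names written for illustration, using only the standard library.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical helpers for illustration; not part of Hail.

def bucket_of(path: str) -> Optional[str]:
    """Return the bucket name if `path` is a gs:// URL, else None."""
    parsed = urlparse(path)
    return parsed.netloc if parsed.scheme == "gs" else None


def to_gs_url(path: str) -> str:
    """Rewrite '/bucket/key' (or 'bucket/key') as 'gs://bucket/key'.

    Paths that already use the gs:// scheme are returned unchanged.
    """
    if bucket_of(path) is not None:
        return path
    return "gs://" + path.lstrip("/")


if __name__ == "__main__":
    local_style = "/gnomad-public/release/2.1/vcf/exomes/gnomad.exomes.r2.1.sites.chr1.vcf.bgz"
    print(bucket_of(local_style))  # None: no scheme, so not a bucket path
    print(to_gs_url(local_style))  # same path with the gs:// prefix
    # On a cluster with Hail installed, something like:
    #   import hail as hl
    #   mt = hl.import_vcf(to_gs_url(local_style))
```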


Hi, I'm brand new to gnomAD, VCFs, and cloud buckets. If I want to use the Hail files you mentioned, I would need a Google Cloud Storage account, right? I'm trying to download the files directly and use Hail afterwards because I don't have a Google Cloud Storage subscription.

2.0 years ago
Brice Sarver ★ 3.7k

A file path starting with / is an absolute path, resolved from the root directory, and it probably isn't what you want. A . refers to the current working directory, so if you downloaded the files into that same directory structure under your working directory, there's a good chance ./gnomad-public/... would work.
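Here's a small standard-library sketch of the difference, assuming POSIX-style paths; resolve_path and require_file are hypothetical helper names, and /home/user/data is a made-up working directory. os.path.join discards the base directory when the second argument is absolute, which is the same reason /gnomad-public/... ignores your working directory entirely. Checking that the file exists before handing it to hl.import_vcf gives a clearer error than the "refers to no files" warning.

```python
import os
from typing import Optional

# Hypothetical helpers for illustration; not part of Hail.

def resolve_path(path: str, base_dir: Optional[str] = None) -> str:
    """Return the absolute, normalized form of `path`.

    Relative paths (including ones starting with ./) are resolved against
    base_dir, defaulting to the current working directory. Absolute paths
    (starting with /) come back unchanged, because os.path.join discards
    base_dir when the second argument is absolute.
    """
    base = os.getcwd() if base_dir is None else base_dir
    return os.path.normpath(os.path.join(base, path))


def require_file(path: str, base_dir: Optional[str] = None) -> str:
    """Resolve `path` and raise FileNotFoundError if no file is there."""
    resolved = resolve_path(path, base_dir)
    if not os.path.isfile(resolved):
        raise FileNotFoundError(f"no file at {resolved}")
    return resolved


if __name__ == "__main__":
    rel = "./gnomad-public/release/2.1/vcf/exomes/gnomad.exomes.r2.1.sites.chr1.vcf.bgz"
    absolute = "/gnomad-public/release/2.1/vcf/exomes/gnomad.exomes.r2.1.sites.chr1.vcf.bgz"
    base = "/home/user/data"  # hypothetical working directory
    print(resolve_path(absolute, base))  # stays under the root directory
    print(resolve_path(rel, base))       # lands under /home/user/data
    # With Hail installed, something like:
    #   import hail as hl
    #   mt = hl.import_vcf(require_file(rel))
```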

This is bioinformatics-adjacent as opposed to being a pure bioinformatics question, so you may have better luck with these questions on StackOverflow in the future.