Question

How can I simply view a .vcf.gz file to get some basic information?

4.9 years ago

nreid ▴ 10

I'm trying to extract SNPs (a list of rsIDs and positions from gnomAD) from a series of .vcf.gz files for analysis, but I'm not entirely sure where to begin. The README for the files states that the .vcf.gz files do not contain rsIDs, so I suspect adding them is the first step I need to complete. Included with the .vcf.gz files are .snpinfo files, which I am only 50% certain contain relevant information. I am aware of the vcftools annotation feature, but I first need to explore the datasets a bit. Is Hail good for this? I am quite new to this space, so pardon the simplicity of my questions. Also, if this has been explained elsewhere, please point me to the right place or the proper search terms; I've searched a fair bit already but couldn't find much at this level.

Thank you.

SNP genome • 43k views

score 4 · Answer 1 · 2020-08-13

In addition to the other answers, you can use basic Unix commands such as zcat, cut, and grep to browse the first few lines of the file.

zcat samplefile.vcf.gz | more

You can also pipe the output to cut if you want to look at only a few columns (for example, just the first 8 columns):

zcat samplefile.vcf.gz | cut -f-8
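The same quick look can be sketched in Python using only the standard library, which may be handy inside a notebook. This is a minimal sketch and not part of the original answer: the function name `peek_vcf` and its parameters are hypothetical, and it assumes the file is gzip- or bgzip-compressed, tab-delimited VCF text. It skips the `##` meta-information lines and keeps the first 8 columns, mirroring the `cut -f-8` command above.

```python
import gzip


def peek_vcf(path, n_records=5, n_columns=8):
    """Return (header, rows): the #CHROM header line and the first
    n_records data rows, each trimmed to the first n_columns fields."""
    header = None
    rows = []
    # "rt" decompresses on the fly and yields text lines;
    # no uncompressed copy is written to disk.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines (file format, INFO definitions, ...)
            fields = line.rstrip("\n").split("\t")[:n_columns]
            if line.startswith("#"):
                header = fields  # the "#CHROM POS ID ..." column header
                continue
            if len(rows) >= n_records:
                break  # stop early; the rest of the file is never read
            rows.append(fields)
    return header, rows
```

Calling `peek_vcf("samplefile.vcf.gz")` returns the column names and the first five records as lists of strings, which can then be printed or inspected.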

score 2 · Answer 2 · 2020-08-11

I'm not sure what you mean by "simply view" the VCF file.

The VCF file is in text format (once you unzip it) and it may be read by eye - though this exercise needs a little training as the format is quite complicated.

https://samtools.github.io/hts-specs/VCFv4.2.pdf

If you would like to use a graphical interface you may use IGV (and use the same genome that the VCF file was created against) to graphically visualize the file.

If you wish to transform the VCF file into simpler data, such as tab-delimited columns that contain only the information that interests you, then bcftools is the way to go:

http://www.htslib.org/doc/bcftools.html
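To illustrate what "tab-delimited columns that contain only the information that interests you" could look like, here is a rough Python sketch using only the standard library. It is not bcftools and is not meant to replace it; the function name `vcf_to_table` and the default column selection are hypothetical choices for illustration. It assumes the file has the standard `#CHROM` header line and looks up the requested columns by name.

```python
import gzip


def vcf_to_table(path, columns=("CHROM", "POS", "ID", "REF", "ALT")):
    """Yield one tab-delimited string per variant record,
    keeping only the named columns (in the order given)."""
    indices = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # skip meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#"):
                # Column header: drop the leading "#" from "#CHROM"
                names = [fields[0].lstrip("#")] + fields[1:]
                indices = [names.index(c) for c in columns]
                continue
            if indices is None:
                raise ValueError("no #CHROM header line found before the first record")
            yield "\t".join(fields[i] for i in indices)
```

For example, `for row in vcf_to_table("samplefile.vcf.gz"): print(row)` would print one line per variant with chromosome, position, ID, reference allele, and alternate allele. Fields are copied verbatim, so a missing ID stays as `.` and multi-allelic ALT values stay comma-separated.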

score 1 · Answer 3 · 2020-08-11

4.9 years ago

JC 13k

To view the file you can simply use the zmore command; a VCF is just a text table compressed with gzip/bgzip. tabix can also help you extract specific positions.

To annotate your VCFs, check out the Variant Effect Predictor (VEP).

Hi, JC! I'm working with VCF.gz files, trying to extract the variants associated with a certain genomic region. I was wondering whether I can extract them row by row in a human-readable format that can be filtered further (without actually unzipping the VCF.gz), or just extract the rows I want by genomic region and then filter them further by ID on the Linux command line.

ADD REPLY • link 4.2 years ago by Sammy ▴ 30
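One way to approach the region-then-ID filtering asked about here, without writing an uncompressed copy to disk, is to stream the file and keep only matching rows. The following is a minimal Python sketch under stated assumptions: the function name `filter_region` and its parameters are hypothetical, positions are treated as 1-based and inclusive, and the ID filter is an exact match on the whole ID column. It scans the file linearly, so for large files an indexed lookup with tabix (mentioned above) is likely to be much faster.

```python
import gzip


def filter_region(path, chrom, start, end, ids=None):
    """Yield data lines whose CHROM equals chrom and whose POS lies in
    [start, end] (1-based, inclusive). If ids is given, also require the
    ID column to be one of them."""
    wanted = set(ids) if ids is not None else None
    with gzip.open(path, "rt") as handle:  # decompress on the fly
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            # Only CHROM, POS and ID are needed, so split just the first 3 tabs
            fields = line.split("\t", 3)
            if fields[0] != chrom:
                continue
            if not start <= int(fields[1]) <= end:
                continue
            if wanted is not None and fields[2] not in wanted:
                continue
            yield line.rstrip("\n")
```

For example, `filter_region("samplefile.vcf.gz", "1", 1000000, 2000000, ids={"rs123"})` would yield the full, human-readable rows in that window with that ID (the file name, coordinates, and ID here are placeholders). The chromosome name must match the file's own convention, for example `1` versus `chr1`.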

score 0 · Answer 4 · 2024-07-11

12 months ago

venkat ▴ 10

You can check out SCI-VCF. It accepts both .vcf and .vcf.gz file formats. It is user-friendly and can be used to summarise, inspect, explore and visualise VCF files without any programming.

Here are some helpful links: SCI-VCF | Documentation
