How can I simply view a .vcf.gz file to get some basic information?
2.1 years ago
nreid • 0

I'm trying to extract SNPs (a list of rsIDs and positions from gnomAD) from a series of .vcf.gz files for analysis, but I'm not entirely sure where to begin. The README for the files states that the .vcf.gz files do not contain rsIDs, so I suspect getting them is the first step I need to complete. Included with the .vcf.gz files are .snpinfo files, which I am only 50% certain contain relevant information. I am aware of the vcftools annotation feature, but I first need to explore the datasets a bit. Is Hail good for this? I am quite new to this space, so pardon the simplicity of my questions. Also, if this has been explained elsewhere, please point me to the right spot or the proper search terms; I've searched a fair bit already but couldn't find much at this level.

Thank you.

SNP genome • 13k views
2.1 years ago
Smandape ▴ 110

Adding to the answers above, another way is to use basic Unix commands such as zcat, cut, and grep to browse through the first few lines of the file.

zcat samplefile.vcf.gz | more


You can also pipe the output to cut if you only want to look at a few columns (for example, just the first 8 columns):

zcat samplefile.vcf.gz | cut -f-8
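Building on the zcat and cut commands above, here is a small sketch for a first look at a file: the header, the number of records, and the CHROM/POS/ID columns. The file demo.vcf.gz and its three records are made up for illustration, and zcat is assumed to be the GNU gzip one (gzip -dc does the same job where zcat does not accept .gz files).

```shell
# Build a tiny gzip-compressed VCF to experiment on (made-up demo data).
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\t.\tA\tG\t50\tPASS\t.\n'
  printf '1\t2000\trs123\tC\tT\t60\tPASS\t.\n'
  printf '2\t1500\t.\tG\tA\t40\tPASS\t.\n'
} | gzip -c > demo.vcf.gz

# Header only: meta-information lines start with '##', the column line with '#'.
zcat demo.vcf.gz | grep '^#'

# Number of variant records (every line that is not a header line).
n_records=$(zcat demo.vcf.gz | grep -vc '^#')
echo "records: $n_records"

# CHROM, POS and ID of each record; '.' in the ID column means no identifier.
zcat demo.vcf.gz | grep -v '^#' | cut -f1-3 > demo_sites.tsv
cat demo_sites.tsv
```

On a real file, replace demo.vcf.gz with your own file name and pipe long output through head or less instead of printing it all.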

2.1 years ago

Not sure what you mean by "simply view" the VCF file.

A VCF file is plain text once you unzip it, so it can be read by eye, though that takes a little practice because the format is quite complicated. The specification is here:

https://samtools.github.io/hts-specs/VCFv4.2.pdf

If you would like a graphical interface, you can load the file in IGV to visualize it (make sure to select the same reference genome that the VCF file was created against).

If you wish to transform the VCF file into simpler data, perhaps tab-delimited columns that contain only the information that interests you, then bcftools is the way to go:

http://www.htslib.org/doc/bcftools.html
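As a rough sketch of what the bcftools route can look like: bcftools query with a format string writes one tab-delimited line per record. The demo file bcf_demo.vcf.gz, the output name bcf_demo_sites.tsv and the chosen columns are assumptions for illustration, and the bcftools part is skipped if the tool is not installed.

```shell
# Made-up demo file; a real gnomAD file would simply replace bcf_demo.vcf.gz.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n##contig=<ID=2>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\t.\tA\tG\t50\tPASS\t.\n'
  printf '1\t2000\trs123\tC\tT\t60\tPASS\t.\n'
  printf '2\t1500\t.\tG\tA\t40\tPASS\t.\n'
} | gzip -c > bcf_demo.vcf.gz

# Columns to keep, written as a bcftools query format string.
fmt='%CHROM\t%POS\t%ID\t%REF\t%ALT\n'

if command -v bcftools >/dev/null 2>&1; then
  # Header only (meta-information lines plus the #CHROM line).
  bcftools view -h bcf_demo.vcf.gz
  # Sample names, one per line (prints nothing for a sites-only file).
  bcftools query -l bcf_demo.vcf.gz
  # The selected columns as a plain tab-delimited table.
  bcftools query -f "$fmt" bcf_demo.vcf.gz > bcf_demo_sites.tsv
  cat bcf_demo_sites.tsv
else
  echo "bcftools not found; skipping the bcftools steps" >&2
fi
```

The resulting table can be opened in a spreadsheet or filtered further with grep and awk; records without an identifier show '.' in the ID column.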

2.1 years ago
JC 13k

To view the file you can simply use the zmore command: a VCF is just a text table compressed with gzip/bgzip. tabix can also help you extract specific positions, provided the file is bgzip-compressed and indexed.
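To illustrate pulling out a region without unzipping the file to disk, here is a hedged sketch with two routes: a plain streaming scan with awk, which works on any gzip file, and an indexed lookup with tabix, which needs a bgzip-compressed file. The file names, the region 1:1500-2500 and the demo records are made up, and the tabix part only runs if bgzip and tabix are installed.

```shell
# Made-up demo data, sorted by chromosome and position as tabix requires.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=1>\n##contig=<ID=2>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\t.\tA\tG\t50\tPASS\t.\n'
  printf '1\t2000\trs123\tC\tT\t60\tPASS\t.\n'
  printf '2\t1500\t.\tG\tA\t40\tPASS\t.\n'
} > region_demo.vcf

# Route 1, no index: stream the file and keep records inside 1:1500-2500.
gzip -c region_demo.vcf > region_demo.plain.vcf.gz
zcat region_demo.plain.vcf.gz |
  awk -F'\t' '!/^#/ && $1 == "1" && $2 >= 1500 && $2 <= 2500' > region_scan.tsv
cat region_scan.tsv

# Route 2, indexed: tabix jumps straight to the region in a bgzip file.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
  bgzip -c region_demo.vcf > region_demo.vcf.gz
  tabix -f -p vcf region_demo.vcf.gz        # writes region_demo.vcf.gz.tbi
  tabix region_demo.vcf.gz 1:1500-2500 > region_tabix.tsv
  # Filtering the extracted rows by ID is then an ordinary pipe.
  awk -F'\t' '$3 == "rs123"' region_tabix.tsv
fi
```

The awk scan reads the whole file each time, so for large files and repeated lookups the indexed route is usually the better fit.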

To annotate your VCFs, check the Variant Effect Predictor (VEP).


Hi, JC! I'm working with .vcf.gz files and trying to extract the variants associated with a certain genomic region. I was wondering whether I can extract rows in a human-readable format that can be filtered further, without actually unzipping the .vcf.gz. Alternatively, can I extract just the rows I want by genomic region and then filter them further by ID on the Linux command line?