readVCF in PopGenome can't open VCF file
3.1 years ago

Hi there! I am trying to assess genetic diversity in a set of VCF files via the PopGenome R package. My VCF files are not annotated.

I have bgzipped and indexed (tabix) my VCF files with the iTabixit app, which compresses then indexes VCF files. I have the .gz and .tbi files in the same folder. But when I run the command that imports the VCF to R:

genome.class <- readVCF("SRR5341585/.vcf.gz",numcols=100, tid="?", from=1, to= 1000000, approx=TRUE, out="", parallel=FALSE, gffpath=FALSE)

I get the following error:

open: No such file or directory
Caught exception inside whop_tabix::open('SRR5341585/.vcf.gz'):
    'whop_tabix::open : Failed to open tabix index file'
return FALSE from whoptabix_open
vcff::open : could not open tabix-index!
VCF_open : Could not open file 'SRR5341585/.vcf.gz' as tabix-indexed!

I can confirm that I am in the correct directory, so I am not sure why this error is happening.

Any ideas on what the issue might be? Thanks!

RNA-Seq R vcf • 1.5k views

readVCF cannot find a file named '.vcf.gz' (or its .tbi index) in your SRR5341585/ directory. Can you try providing the complete name of the file?

genome.class <- readVCF("SRR5341585/filename.vcf.gz",numcols=100, tid="?", from=1, to= 1000000, approx=TRUE, out="", parallel=FALSE, gffpath=FALSE)
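
Since the error comes from the tabix layer failing to open the index, it may also help to check from R that both the compressed VCF and its index resolve from the current working directory. A minimal base-R sketch: check_vcf_files is a hypothetical helper (not part of PopGenome), and it assumes the index sits next to the VCF with ".tbi" appended to the full file name.

```r
# Hypothetical helper (not a PopGenome function): report whether a bgzipped
# VCF and its tabix index can be found at the given path.
# Assumption: the index is named "<vcf file name>.tbi" in the same folder.
check_vcf_files <- function(vcf_path) {
  index_path <- paste0(vcf_path, ".tbi")
  c(vcf = file.exists(vcf_path), index = file.exists(index_path))
}

# Relative paths are resolved against this directory.
print(getwd())

# "filename" is the placeholder from the reply above, not a real file name.
print(check_vcf_files("SRR5341585/filename.vcf.gz"))
```

If either value is FALSE, the path or the index name is the likely problem rather than readVCF itself.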

Hello! Thanks for the reply. So, that was the name of the file, but I realize now that the "/" in the file name may have been throwing the program off. So I renamed the file SRR5341585.vcf.gz, and the command was executed. But now I am getting this error:

Error in readVCF("SRR5341585.vcf.gz", numcols = 100, tid = "?", from = 1,  :
  Parameter <tid> invalid : not a chromosome/contig Identifier found in the VCF file!

The tid parameter is supposed to be a chromosome/contig identifier, but since my VCF files are not annotated I do not have this information. The PopGenome manual said to use a random character string, which is why I used "?". That did not work the way I expected.

Thanks again.


I figured out the solution to this. The "tid" parameter has to be an identifier that actually occurs in the VCF file, and since I have multiple VCF files from different populations, the only identifier I can use when looking at one VCF file is the gene identifier.
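
For anyone hitting the same error: the message suggests that tid has to match a value in the first (CHROM) column of the VCF. One way to see which values are available is to list them directly. This is a base-R sketch under that assumption; list_vcf_contigs is a hypothetical helper, and it only scans the first max_lines lines, so it can miss contigs that appear later in a large file.

```r
# Hypothetical helper (not a PopGenome function): list the distinct values of
# the CHROM column in a (b)gzipped VCF, skipping header lines starting with "#".
list_vcf_contigs <- function(vcf_path, max_lines = 100000L) {
  con <- gzfile(vcf_path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con, n = max_lines)
  records <- lines[!startsWith(lines, "#")]
  # Keep only the text before the first tab, i.e. the CHROM field.
  unique(sub("\t.*$", "", records))
}

vcf_path <- "SRR5341585.vcf.gz"
if (file.exists(vcf_path)) {
  print(list_vcf_contigs(vcf_path))
}
```

Any value printed here should be a candidate for tid. If the reads were aligned to transcripts rather than a genome, these values may well be gene or transcript IDs.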

After putting a gene ID into the tid parameter I was able to load the VCF in. However, I wish there was a way to compare diversity stats across different populations (stored in multiple VCF files). Is there a way to do this?
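
One possible approach, sketched here rather than tested on this data: merge the per-sample VCFs into a single multi-sample, tabix-indexed VCF outside R (for example with bcftools merge), load that file once, and assign samples to populations with PopGenome's set.populations before computing statistics. Everything named below is an assumption for illustration: merged.vcf.gz, the sample names other than SRR5341585, the population labels, the "GENE_ID" value for tid, and diploid = TRUE. build_population_list is a hypothetical helper.

```r
# Hypothetical helper: turn a sample-to-population assignment into the named
# list of sample-name vectors that set.populations() expects.
build_population_list <- function(samples, populations) {
  split(samples, populations)
}

# Illustrative sample sheet; only SRR5341585 comes from this thread.
samples     <- c("SRR5341585", "SAMPLE_B", "SAMPLE_C", "SAMPLE_D")
populations <- c("pop1", "pop1", "pop2", "pop2")
pop_list <- build_population_list(samples, populations)
print(pop_list)

# Assumed file name for the merged, bgzipped and tabix-indexed VCF.
merged_vcf <- "merged.vcf.gz"

if (requireNamespace("PopGenome", quietly = TRUE) && file.exists(merged_vcf)) {
  library(PopGenome)
  # "GENE_ID" is a placeholder; use an identifier present in the CHROM column.
  genome <- readVCF(merged_vcf, numcols = 100, tid = "GENE_ID",
                    frompos = 1, topos = 1000000)
  # Sample names must match those in the VCF header.
  genome <- set.populations(genome, pop_list, diploid = TRUE)
  genome <- F_ST.stats(genome)
  print(genome@nuc.diversity.within)  # nucleotide diversity per population
  print(genome@nucleotide.F_ST)       # F_ST between the populations
}
```

Note that merging single-sample VCFs leaves genotypes missing at sites that were not called in a given sample, which can bias diversity estimates, so the merged file is worth inspecting first.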
