Question: Error using Tabix
0
gravatar for AS
10 months ago by
AS0
AS0 wrote:

Hi, I am trying to use vcf files downloaded from public database (NCBI SRA Bioproject:PRJNA486011). I am using only R to do that because I don’t know any other language. I have put the gz and the gz.tbi in the same folder.

>vcf.fl <- list.files("path/to/vcfs", full.names=TRUE, pattern=".vcf.gz$")
> tabx <- TabixFile("chr4_eva.vcf.gz", index=paste("chr4_eva.vcf.gz", "tbi", sep="."),yieldSize=10000)
> tabx
class: TabixFile 
path: chr4_eva.vcf.gz
index: chr4_eva.vcf.gz.tbi
isOpen: FALSE 
yieldSize: 10000 

> names(tabx)

 [1] ".self"        ".->path"      ".extptr"      ".->yieldSize" ".refClassDef" ".->index"     ".->.extptr"   "index"       
 [9] "path"         "yieldSize"   

> vcf.rng <- readVcf(tabx)

Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
tabix R vcf • 455 views
ADD COMMENTlink modified 10 months ago by dariober10k • written 10 months ago by AS0
1

Probably one file has notations ike 1,2,3...X,Y and your has chr1,chr2... Check that.

ADD REPLYlink written 10 months ago by ATpoint26k

I succeed to check the data of the .vcf.gz but I haven't found a way to look inside the tbi. Could you please indicate me a way to do it ?

ADD REPLYlink written 10 months ago by AS0

You have to compare your VCF against the VCF you downloaded, not the index. Please show a subset of your VCF and of the downloaded VCFs.

ADD REPLYlink modified 10 months ago • written 10 months ago by ATpoint26k

Why don't you index them quickly by yourself - this is very simple:

tabix -p vcf <file.name>
ADD REPLYlink written 10 months ago by JJ470

Thanks for the answers.

I have downloaded the vcf from : ftp://ftp.sra.ebi.ac.uk/vol1/ERZ696/ERZ696780/chr8_eva.vcf.gz. The size of the download file is ok compare to the size indicated on the website.

I have tried to look at the file and I found strange that the ID seem full of NA and no samples are detected.

VCFA <- read.vcfR("chr8_eva.vcf.gz")
VCFA
***** Object of Class vcfR *****
0 samples
1 CHROMs
1,294,347 variants
Object size: 294.3 Mb
NaN percent missing data
*****        *****         *****

head(VCFA)

[1] "***** Object of class 'vcfR' *****"
[1] "***** Meta section *****"
[1] "##fileformat=VCFv4.2"
[1] "##GATKCommandLine.SelectVariants=<ID=SelectVariants,CommandLineOptio [Truncated]"
[1] "##source=SelectVariants"
[1] "##FILTER=<ID=LowQual,Description=\"Low quality\">"
[1] "##FILTER=<ID=my_snp_filter,Description=\"QD < 2.0 || FS > 60.0 || MQ < 30.0\">"
[1] "##FORMAT=<ID=AD,Number=R,Type=Integer,Description=\"Allelic depths fo [Truncated]"
[1] "First 6 rows."
[1] 
[1] "***** Fixed section *****"
     CHROM POS  ID REF ALT QUAL    FILTER         
[1,] "8"   "6"  NA "A" "G" "46.53" NA             
[2,] "8"   "13" NA "A" "G" "92.17" NA             
[3,] "8"   "14" NA "A" "C" "29.71" "my_snp_filter"
[4,] "8"   "18" NA "T" "C" "71.66" NA             
[5,] "8"   "20" NA "A" "G" "55.56" NA             
[6,] "8"   "27" NA "A" "G" "55.15" NA             
[1] 
[1] "***** Genotype section *****"
<0 x 0 matrix>
[1] 
[1] "Unique GT formats:"
[1] "No gt slot present"
[1]
ADD REPLYlink modified 10 months ago by finswimmer13k • written 10 months ago by AS0
0
gravatar for dariober
10 months ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

The vcf file you got from ftp://ftp.sra.ebi.ac.uk/vol1/ERZ696/ERZ696780/chr8_eva.vcf.gz seems defective since its header does not contain the contig "8". As a quick and dirty fix you can add the missing contig with something like:

gunzip -c chr8_eva.vcf.gz | grep '^##' > chr8_eva.fix.vcf
echo "##contig=<ID=8>" >> chr8_eva.fix.vcf 
gunzip -c chr8_eva.vcf.gz | grep -v '^##' >> chr8_eva.fix.vcf
bgzip chr8_eva.fix.vcf

Now it should work:

VCFA <- readVcf("~/Downloads/chr8_eva.fix.vcf.gz") ## Seems ok

# Original vcf throws warning:
VCFA <- readVcf("~/Downloads/chr8_eva.vcf.gz")
Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
ADD COMMENTlink written 10 months ago by dariober10k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1948 users visited in the last hour