Question: VCF input for KaryotypeR
0
gravatar for Rubal
8 months ago by
Rubal220
Germany
Rubal220 wrote:

Hello Everyone,

I am trying to use the R package KaryotypeR to produce rainfall plots in R. I am having trouble reading my VCF data into R and getting it to work in KaryotypeR. I've got their tutorial to work (link here): https://bernatgel.github.io/karyoploter_tutorial//Examples/Rainfall/Rainfall.html

I have a VCF with example rows like this (excluding the many ##header lines for readability:

##FORMAT=<ID=FAZ,Number=1,Type=Integer,Description="Reads presenting a A for this position, forward strand">
##FORMAT=<ID=FCZ,Number=1,Type=Integer,Description="Reads presenting a C for this position, forward strand">
##FORMAT=<ID=FGZ,Number=1,Type=Integer,Description="Reads presenting a G for this position, forward strand">
##FORMAT=<ID=FTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, forward strand">
##FORMAT=<ID=RAZ,Number=1,Type=Integer,Description="Reads presenting a A for this position, reverse strand">
##FORMAT=<ID=RCZ,Number=1,Type=Integer,Description="Reads presenting a C for this position, reverse strand">
##FORMAT=<ID=RGZ,Number=1,Type=Integer,Description="Reads presenting a G for this position, reverse strand">
##FORMAT=<ID=RTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, reverse strand">
##FORMAT=<ID=PM,Number=1,Type=Float,Description="Proportion of mut allele">
##SAMPLE=<ID=NORMAL,Description="Normal",Accession=.,Platform=ILLUMINA,Protocol=WGS,SampleName=AD0001c,Source=.>
##SAMPLE=<ID=TUMOUR,Description="Tumour",Accession=.,Platform=ILLUMINA,Protocol=WGS,SampleName=AD0001b_lo0019,Source=.>
##FILTER=<ID=DTH,Description="Less than 1/3 mutant alleles were >= 25 base quality">
##INFO=<ID=CLPM,Number=1,Type=Float,Description="A soft flag median number of soft clipped bases in variant supporting reads">
##INFO=<ID=ASMD,Number=1,Type=Float,Description="A soft flag median alignement score of reads showing the variant allele">
##vcfProcessLog_20180918.1=<InputVCF=<AD0001b_lo0019_vs_AD0001c.muts.ids.vcf.gz>,InputVCFSource=<FlagCaVEManVCF.pl>,InputVCFVer=<1.7.3>,InputVCFParam=<sp=.,umv=.,h=.,g=.,f=AD0001b_lo0019_vs_AD0001c.muts.ids.vcf.gz,t=WGS,loud=.,n=AD0001c.bam,ref=genome.fa.fai,m=AD0001b_lo0019.bam,v=flag.to.vcf.convert.ini,s=TIGER,l=2000,ab=.,p=.,c=Tiger_flag.vcf.config.ini,b=.,idx=.,o=D0001b_lo0019_vs_AD0001c_flagged.vcf>>
    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOUR
    contig01000014.1    1128    165247d6-b4df-11e8-bec4-89a28c540d42    T   C   .   PASS    DP=89;MP=8.8e-01;GP=5.0e-11;TG=TT/CTTTT;TP=8.8e-01;SG=TT/TTTTT;SP=1.2e-01;CLPM=0.00;ASMD=136.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:0:11:0:0:1:27:0.0e+00   0|1:0:0:0:19:0:5:0:26:1.0e-01
    contig01000543.1    276 16524d8a-b4df-11e8-bec4-89a28c540d42    T   C   .   PASS    DP=67;MP=8.7e-01;GP=1.3e-01;TG=TT/CTTTT;TP=8.7e-01;SG=CT/CTTTT;SP=1.2e-01;CLPM=0.00;ASMD=57.50  GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:0:3:0:1:0:12:6.2e-02    0|1:0:3:1:21:0:3:0:23:1.2e-01
    contig01000692.1    2433    16524ee8-b4df-11e8-bec4-89a28c540d42    A   T   .   PASS    DP=74;MP=9.8e-01;GP=2.0e-02;TG=AA/AAAAT;TP=8.8e-01;SG=AA/AAATT;SP=9.6e-02;CLPM=9.00;ASMD=101.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:19:0:0:3:25:0:0:0:6.4e-02   0|1:11:0:0:5:11:0:0:0:1.9e-01
    contig01000694.1    830 16525032-b4df-11e8-bec4-89a28c540d42    C   T   .   PASS    DP=114;MP=1.0e+00;GP=5.6e-14;TG=CC/CCCCT;TP=8.6e-01;SG=CC/CCCTT;SP=1.4e-01;CLPM=0.00;ASMD=111.00    GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:26:0:0:0:42:0:2:2.9e-02   0|1:0:8:0:1:0:26:0:9:2.3e-01
    contig01000694.1    872 16525172-b4df-11e8-bec4-89a28c540d42    G   T   .   PASS    DP=89;MP=1.0e+00;GP=2.9e-07;TG=GG/GGGTT;TP=7.4e-01;SG=GG/GGGGT;SP=2.4e-01;CLPM=0.00;ASMD=115.50 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:23:0:0:0:34:3:5.0e-02   0|1:0:0:9:0:0:0:11:9:3.1e-01

I would like to convert this into a GRanges format such that I can subset the 'TUMOUR' data (excluding the 'NORMAL') and plot the variants as a rainfall plot with the KaryotypeR package using these commands from the tutorial (where sm.gr is a GRanges object of the data):

library(karyoploteR)
kp <- plotKaryotype(plot.type=4)
kpPlotRainfall(kp, data = sm.gr)

Could someone please point me towards a good way to read in and convert a VCF to a GRanges format suitable for this task?

Thanks for your help.

R • 499 views
ADD COMMENTlink modified 8 months ago by Bastien Hervé4.2k • written 8 months ago by Rubal220

however when I try to convert this to a GRanges object (which I believe is needed for karyotypeR, it does not work.

What is the exact error you're seeing? Can you give an example row from your vcf? That example data you have there doesn't look like a vcf

ADD REPLYlink written 8 months ago by Philipp Bayer6.1k

Thanks for the reply. The data I give is an alternative format encase that is easier to work with for karyotypeR. I've amended the answer with an example vcf row and example error

ADD REPLYlink written 8 months ago by Rubal220

Tips : When you get an error on a command line, don't go further in your analysis, you will fail

Tips2 : Always put command lines and error messages in your post

You got :

Error in .normargSEW0(start, "start") : 'start' must be a numeric vector (or NULL)

It speaks for itself, your start column is not numeric

Try to :

data$start = as.numeric(as.character(data$start))

And same for the end column I guess.

Could you had the command to generate the variable data please

ADD REPLYlink modified 8 months ago • written 8 months ago by Bastien Hervé4.2k

Thanks for the help but 'start' is a column of integer values in my table. I think I am inappropriately using the toGranges() function. I feel there must be a straightforward way to convert a vcf to a granges object suitable for karyotypeR and was just wondering if someone could point me to that.

ADD REPLYlink written 8 months ago by Rubal220

Please include the result of :

head -30 your_file.vcf

in your post

ADD REPLYlink modified 8 months ago • written 8 months ago by Bastien Hervé4.2k

substantially modified the post for readability and will do this thanks

ADD REPLYlink written 8 months ago by Rubal220

Please include the ##header too. A lot of tools to read vcf file use these header marks to check the vcf file. Upload your file somewhere (dropbox....) and add the link in your post is the best way to do it

ADD REPLYlink modified 8 months ago • written 8 months ago by Bastien Hervé4.2k

OK I just added some of it as there are >1000 lines for header due to a large number of contigs

ADD REPLYlink written 8 months ago by Rubal220

You deleted the interesting stuff with the error message and your command lines. Please add them

ADD REPLYlink written 8 months ago by Bastien Hervé4.2k
3
gravatar for Bastien Hervé
8 months ago by
Bastien Hervé4.2k
Limoges, CBRS, France
Bastien Hervé4.2k wrote:
library(karyoploteR)
tmp.vcf<-readLines("your_file.vcf")
tmp.vcf.data<-read.table("your_file.vcf")
tmp.vcf<-tmp.vcf[-(grep("#CHROM",tmp.vcf)+1):-(length(tmp.vcf))]
vcf.names<-unlist(strsplit(tmp.vcf[length(tmp.vcf)],"\t"))
names(tmp.vcf.data)<-vcf.names
colnames(tmp.vcf.data)[1] <- "Chr"
colnames(tmp.vcf.data)[2] <- "Start"
tmp.vcf.data = cbind(tmp.vcf.data,End=rep(tmp.vcf.data$Start+1))
tmp.vcf.data <- tmp.vcf.data[, c(1,2,12, 3:11)]
toGRanges(tmp.vcf.data)

Edit : Good luck to generate a custum genome based on your contigs in KaryoploteR !

ADD COMMENTlink modified 8 months ago • written 8 months ago by Bastien Hervé4.2k

Thanks the code. When I try with this commandL kpPlotRainfall(kp, data = tmp.vcf.data , col=variant.colors) I get the error: Error in attr(x, "tsp") <- c(1, NROW(x), 1) : attempt to set an attribute on NULL I suspect this is related to needing to create a correct contig base plot for adding this data to, so will explore that.

ADD REPLYlink written 8 months ago by Rubal220
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 667 users visited in the last hour