Question: VCF input for KaryotypeR
0
gravatar for Rubal
6 days ago by
Rubal200
Germany
Rubal200 wrote:

Hello Everyone,

I am trying to use the R package KaryotypeR to produce rainfall plots in R. I am having trouble reading my VCF data into R and getting it to work in KaryotypeR. I've got their tutorial to work (link here): https://bernatgel.github.io/karyoploter_tutorial//Examples/Rainfall/Rainfall.html

I have a VCF with example rows like this (excluding the many ##header lines for readability:

##FORMAT=<ID=FAZ,Number=1,Type=Integer,Description="Reads presenting a A for this position, forward strand">
##FORMAT=<ID=FCZ,Number=1,Type=Integer,Description="Reads presenting a C for this position, forward strand">
##FORMAT=<ID=FGZ,Number=1,Type=Integer,Description="Reads presenting a G for this position, forward strand">
##FORMAT=<ID=FTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, forward strand">
##FORMAT=<ID=RAZ,Number=1,Type=Integer,Description="Reads presenting a A for this position, reverse strand">
##FORMAT=<ID=RCZ,Number=1,Type=Integer,Description="Reads presenting a C for this position, reverse strand">
##FORMAT=<ID=RGZ,Number=1,Type=Integer,Description="Reads presenting a G for this position, reverse strand">
##FORMAT=<ID=RTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, reverse strand">
##FORMAT=<ID=PM,Number=1,Type=Float,Description="Proportion of mut allele">
##SAMPLE=<ID=NORMAL,Description="Normal",Accession=.,Platform=ILLUMINA,Protocol=WGS,SampleName=AD0001c,Source=.>
##SAMPLE=<ID=TUMOUR,Description="Tumour",Accession=.,Platform=ILLUMINA,Protocol=WGS,SampleName=AD0001b_lo0019,Source=.>
##FILTER=<ID=DTH,Description="Less than 1/3 mutant alleles were >= 25 base quality">
##INFO=<ID=CLPM,Number=1,Type=Float,Description="A soft flag median number of soft clipped bases in variant supporting reads">
##INFO=<ID=ASMD,Number=1,Type=Float,Description="A soft flag median alignement score of reads showing the variant allele">
##vcfProcessLog_20180918.1=<InputVCF=<AD0001b_lo0019_vs_AD0001c.muts.ids.vcf.gz>,InputVCFSource=<FlagCaVEManVCF.pl>,InputVCFVer=<1.7.3>,InputVCFParam=<sp=.,umv=.,h=.,g=.,f=AD0001b_lo0019_vs_AD0001c.muts.ids.vcf.gz,t=WGS,loud=.,n=AD0001c.bam,ref=genome.fa.fai,m=AD0001b_lo0019.bam,v=flag.to.vcf.convert.ini,s=TIGER,l=2000,ab=.,p=.,c=Tiger_flag.vcf.config.ini,b=.,idx=.,o=D0001b_lo0019_vs_AD0001c_flagged.vcf>>
    #CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOUR
    contig01000014.1    1128    165247d6-b4df-11e8-bec4-89a28c540d42    T   C   .   PASS    DP=89;MP=8.8e-01;GP=5.0e-11;TG=TT/CTTTT;TP=8.8e-01;SG=TT/TTTTT;SP=1.2e-01;CLPM=0.00;ASMD=136.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:0:11:0:0:1:27:0.0e+00   0|1:0:0:0:19:0:5:0:26:1.0e-01
    contig01000543.1    276 16524d8a-b4df-11e8-bec4-89a28c540d42    T   C   .   PASS    DP=67;MP=8.7e-01;GP=1.3e-01;TG=TT/CTTTT;TP=8.7e-01;SG=CT/CTTTT;SP=1.2e-01;CLPM=0.00;ASMD=57.50  GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:0:3:0:1:0:12:6.2e-02    0|1:0:3:1:21:0:3:0:23:1.2e-01
    contig01000692.1    2433    16524ee8-b4df-11e8-bec4-89a28c540d42    A   T   .   PASS    DP=74;MP=9.8e-01;GP=2.0e-02;TG=AA/AAAAT;TP=8.8e-01;SG=AA/AAATT;SP=9.6e-02;CLPM=9.00;ASMD=101.00 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:19:0:0:3:25:0:0:0:6.4e-02   0|1:11:0:0:5:11:0:0:0:1.9e-01
    contig01000694.1    830 16525032-b4df-11e8-bec4-89a28c540d42    C   T   .   PASS    DP=114;MP=1.0e+00;GP=5.6e-14;TG=CC/CCCCT;TP=8.6e-01;SG=CC/CCCTT;SP=1.4e-01;CLPM=0.00;ASMD=111.00    GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:26:0:0:0:42:0:2:2.9e-02   0|1:0:8:0:1:0:26:0:9:2.3e-01
    contig01000694.1    872 16525172-b4df-11e8-bec4-89a28c540d42    G   T   .   PASS    DP=89;MP=1.0e+00;GP=2.9e-07;TG=GG/GGGTT;TP=7.4e-01;SG=GG/GGGGT;SP=2.4e-01;CLPM=0.00;ASMD=115.50 GT:FAZ:FCZ:FGZ:FTZ:RAZ:RCZ:RGZ:RTZ:PM   0|0:0:0:23:0:0:0:34:3:5.0e-02   0|1:0:0:9:0:0:0:11:9:3.1e-01

I would like to convert this into a GRanges format such that I can subset the 'TUMOUR' data (excluding the 'NORMAL') and plot the variants as a rainfall plot with the KaryotypeR package using these commands from the tutorial (where sm.gr is a GRanges object of the data):

library(karyoploteR)
kp <- plotKaryotype(plot.type=4)
kpPlotRainfall(kp, data = sm.gr)

Could someone please point me towards a good way to read in and convert a VCF to a GRanges format suitable for this task?

Thanks for your help.

R • 113 views
ADD COMMENTlink modified 6 days ago by Bastien Hervé2.2k • written 6 days ago by Rubal200

however when I try to convert this to a GRanges object (which I believe is needed for karyotypeR, it does not work.

What is the exact error you're seeing? Can you give an example row from your vcf? That example data you have there doesn't look like a vcf

ADD REPLYlink written 6 days ago by Philipp Bayer5.7k

Thanks for the reply. The data I give is an alternative format encase that is easier to work with for karyotypeR. I've amended the answer with an example vcf row and example error

ADD REPLYlink written 6 days ago by Rubal200

Tips : When you get an error on a command line, don't go further in your analysis, you will fail

Tips2 : Always put command lines and error messages in your post

You got :

Error in .normargSEW0(start, "start") : 'start' must be a numeric vector (or NULL)

It speaks for itself, your start column is not numeric

Try to :

data$start = as.numeric(as.character(data$start))

And same for the end column I guess.

Could you had the command to generate the variable data please

ADD REPLYlink modified 6 days ago • written 6 days ago by Bastien Hervé2.2k

Thanks for the help but 'start' is a column of integer values in my table. I think I am inappropriately using the toGranges() function. I feel there must be a straightforward way to convert a vcf to a granges object suitable for karyotypeR and was just wondering if someone could point me to that.

ADD REPLYlink written 6 days ago by Rubal200

Please include the result of :

head -30 your_file.vcf

in your post

ADD REPLYlink modified 6 days ago • written 6 days ago by Bastien Hervé2.2k

substantially modified the post for readability and will do this thanks

ADD REPLYlink written 6 days ago by Rubal200

Please include the ##header too. A lot of tools to read vcf file use these header marks to check the vcf file. Upload your file somewhere (dropbox....) and add the link in your post is the best way to do it

ADD REPLYlink modified 6 days ago • written 6 days ago by Bastien Hervé2.2k

OK I just added some of it as there are >1000 lines for header due to a large number of contigs

ADD REPLYlink written 6 days ago by Rubal200

You deleted the interesting stuff with the error message and your command lines. Please add them

ADD REPLYlink written 6 days ago by Bastien Hervé2.2k
2
gravatar for Bastien Hervé
6 days ago by
Bastien Hervé2.2k
Limoges, CBRS, France
Bastien Hervé2.2k wrote:
library(karyoploteR)
tmp.vcf<-readLines("your_file.vcf")
tmp.vcf.data<-read.table("your_file.vcf")
tmp.vcf<-tmp.vcf[-(grep("#CHROM",tmp.vcf)+1):-(length(tmp.vcf))]
vcf.names<-unlist(strsplit(tmp.vcf[length(tmp.vcf)],"\t"))
names(tmp.vcf.data)<-vcf.names
colnames(tmp.vcf.data)[1] <- "Chr"
colnames(tmp.vcf.data)[2] <- "Start"
tmp.vcf.data = cbind(tmp.vcf.data,End=rep(tmp.vcf.data$Start+1))
tmp.vcf.data <- tmp.vcf.data[, c(1,2,12, 3:11)]
toGRanges(tmp.vcf.data)

Edit : Good luck to generate a custum genome based on your contigs in KaryoploteR !

ADD COMMENTlink modified 6 days ago • written 6 days ago by Bastien Hervé2.2k

Thanks the code. When I try with this commandL kpPlotRainfall(kp, data = tmp.vcf.data , col=variant.colors) I get the error: Error in attr(x, "tsp") <- c(1, NROW(x), 1) : attempt to set an attribute on NULL I suspect this is related to needing to create a correct contig base plot for adding this data to, so will explore that.

ADD REPLYlink written 6 days ago by Rubal200
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1787 users visited in the last hour