Question: Read genotype information with VariantAnnotation
5 months ago, jeni wrote:

Hi,

Is there a way to read the genotype information (the FORMAT fields GT:AD:AF) using VariantAnnotation? Also, because I used a somatic caller, each variant has two sample columns (one for the normal sample and one for the tumour sample). How can I preserve the sample name in the column names when I convert this VCF to a data frame?

As an example:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Normal_sample   Tumour_sample
1   52052   .   C   T   .   PASS    CONTQ=93;DP=56 GT:AD:AF 0/0:36,0:0.026  0/1:15,4:0.238

I am reading the vcf with this command:

as.data.frame(cbind(vcf@rowRanges@seqnames,vcf@rowRanges@ranges@start, vcf@fixed, info(vcf)))

So I get the following dataframe:

vcf.rowRanges.ranges.start  REF  ALT  QUAL  FILTER  CONTQ  DP
52052                       C    T    .     PASS    93     56

I would like to get something like this:

vcf.rowRanges.ranges.start  REF  ALT  QUAL  FILTER  CONTQ  DP  NGT  NAD   NAF    TGT  TAD   TAF
52052                       C    T    .     PASS    93     56  0/0  36,0  0.026  0/1  15,4  0.238

5 months ago, RamRS commented:

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the code button to format it as a code block. I've done it for you this time.
5 months ago, benformatics (ETH Zurich) wrote:

This is pretty well illustrated in the VariantAnnotation manual. Most standard entries can be accessed with the info() accessor (INFO fields) or the geno() accessor (per-sample FORMAT fields). I don't think you will be able to simply cbind() all this information together without first checking the format of each column you want to append. For instance, AD holds two values per sample (reference and alternate allele depths), so you would likely have to build separate data.frames before merging.

> your.vcf <- readVcf('your.vcf')
> geno(your.vcf)
List of length 5
names(5): GT AD DP GQ PL
> head(geno(your.vcf)$GT)[[1]]
[1] "0/1"
> geno(your.vcf)$AD[[1]]
[1] 10  2
> info(your.vcf)$AF[[1]]
[1] 0.5
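
As a rough sketch of the "separate data.frames before merging" idea: the helper below, `flatten_geno()`, is hypothetical (it is not part of VariantAnnotation). It takes a named list of variants x samples matrices, such as `as.list(geno(vcf)[c("GT", "AD", "AF")])`, and returns one data.frame whose column names carry the sample name. It assumes each field comes back as a two-dimensional matrix (plain or list-valued); a field declared with a fixed integer Number in the header may instead be returned as a 3-D array and would need separate handling. Note that in the question's VCF, AF is a FORMAT field, so it would be read with `geno()` rather than `info()`.

```r
# Hypothetical helper: flatten per-sample FORMAT matrices into one data.frame
# with columns named <sample>.<field>, e.g. Normal_sample.GT, Tumour_sample.AD.
flatten_geno <- function(geno_list) {
  samples <- colnames(geno_list[[1]])
  cols <- list()
  for (s in samples) {
    for (field in names(geno_list)) {
      cell <- geno_list[[field]][, s]
      # Multi-valued cells (e.g. AD) are collapsed to a string such as "36,0".
      cols[[paste(s, field, sep = ".")]] <-
        vapply(cell, function(x) paste(x, collapse = ","),
               character(1), USE.NAMES = FALSE)
    }
  }
  do.call(data.frame,
          c(cols, list(stringsAsFactors = FALSE, check.names = FALSE)))
}

# Toy input mimicking the single variant from the question.
sample_names <- c("Normal_sample", "Tumour_sample")
demo <- list(
  GT = matrix(c("0/0", "0/1"), nrow = 1, dimnames = list(NULL, sample_names)),
  AD = matrix(list(c(36L, 0L), c(15L, 4L)), nrow = 1,
              dimnames = list(NULL, sample_names)),
  AF = matrix(list(0.026, 0.238), nrow = 1, dimnames = list(NULL, sample_names))
)
flat <- flatten_geno(demo)

# With a real file (placeholder path), the same helper could be applied like this.
vcf_path <- "your.vcf"
if (requireNamespace("VariantAnnotation", quietly = TRUE) && file.exists(vcf_path)) {
  library(VariantAnnotation)
  vcf <- readVcf(vcf_path)
  fixed_cols <- data.frame(
    CHROM  = as.character(seqnames(rowRanges(vcf))),
    POS    = start(rowRanges(vcf)),
    REF    = as.character(ref(vcf)),
    FILTER = filt(vcf),
    stringsAsFactors = FALSE
  )
  # Keep only the FORMAT fields that are actually present in this file.
  wanted <- intersect(c("GT", "AD", "AF"), names(geno(vcf)))
  result <- cbind(fixed_cols, flatten_geno(as.list(geno(vcf)[wanted])))
}
```

Collapsing AD to a single string keeps one row per variant; if the reference and alternate depths are needed as numbers, they would have to be split into separate columns instead.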