Question: How to extract mutation signatures from a merged vcf file with multiple groups of samples
sagardesai91 (IBAB, Bengaluru, India) wrote 10 weeks ago:

Hello everyone, I have a merged VCF file that holds the mutation details of three groups of samples. For example, with three groups A, B and C, the file looks like this:

CHR POS ID REF ALT A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5

Some samples have a given variant and some don't. Given such a file, is there a package in R or some other language that can give mutation signatures specific to each group?

2nelly (Geneva, Switzerland) wrote 10 weeks ago:

Hi sagardesai91

Just create a new VCF with the header and layout below:

##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10002   MU43280717  A   T   .   .   MELA-AU
chr1    10026   MU75019506  A   G   .   .   PBCA-US
chr1    10074   MU121369972 A   G   .   .   PBCA-US
chr1    10080   MU121498435 A   G   .   .   PBCA-US
chr1    10085   MU121369537 T   G   .   .   PBCA-US
chr1    10086   MU121375628 A   G   .   .   PBCA-US
chr1    10087   MU121380000 A   G   .   .   PBCA-US
chr1    10091   MU121508239 T   G   .   .   PBCA-US
chr1    10098   MU121433300 A   G   .   .   PBCA-US
chr1    10108   MU15348322  C   T   .   .   LUSC-KR

Put "." (dot) in every ID, QUAL and FILTER field (the example above happens to have specific IDs), and in the INFO field put the name of the sample (A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5) that the mutation belongs to.
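
The reshaping described above (one column per sample turned into one row per variant and sample) could be sketched in base R roughly as follows. This is only an illustration under assumptions: the helper names `merged_to_long` and `write_minimal_vcf` are made up, the column names follow the header shown in the question, and a sample is treated as lacking the variant when its genotype is "0/0", "0|0", "./.", ".|.", "0", "." or empty. Adjust the `absent` values to your own encoding. Multi-allelic ALT fields are not split.

```r
# Hypothetical helper: one row per (variant, sample) from a merged table.
# Assumes columns CHR, POS, REF, ALT plus one genotype column per sample.
merged_to_long <- function(merged, sample_cols,
                           absent = c("", ".", "0", "0/0", "0|0", "./.", ".|.")) {
  rows <- lapply(sample_cols, function(s) {
    # keep only the GT part if the cell looks like "GT:AD:DP"
    gt <- sub(":.*$", "", as.character(merged[[s]]))
    has_variant <- !is.na(gt) & !(gt %in% absent)
    if (!any(has_variant)) return(NULL)
    data.frame(CHROM = merged$CHR[has_variant],
               POS = merged$POS[has_variant],
               ID = ".",
               REF = merged$REF[has_variant],
               ALT = merged$ALT[has_variant],
               QUAL = ".", FILTER = ".",
               INFO = s,                      # sample name goes into INFO
               stringsAsFactors = FALSE)
  })
  do.call(rbind, Filter(Negate(is.null), rows))
}

# Hypothetical helper: write the long table with the minimal header shown above.
write_minimal_vcf <- function(long, path) {
  con <- file(path, "w")
  on.exit(close(con))
  writeLines(c("##fileformat=VCFv4.2",
               "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"), con)
  write.table(long, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Toy input, made up for illustration only
merged <- data.frame(CHR = c("chr1", "chr1"), POS = c(10002L, 10026L), ID = ".",
                     REF = c("A", "A"), ALT = c("T", "G"),
                     A1 = c("0/1", "0/1"), A2 = c("0/0", "1/1"), B1 = c("./.", "0/0"),
                     stringsAsFactors = FALSE)
long <- merged_to_long(merged, c("A1", "A2", "B1"))
out <- tempfile(fileext = ".vcf")
write_minimal_vcf(long, out)
```

The rows come out grouped by sample rather than sorted by position, which should not matter for reading the file without an index.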

Then you can use SomaticSignatures in R to plot the signatures of all samples at once:

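library(VariantAnnotation)   # provides readVcf()
library(SomaticSignatures)
library(BSgenome.Hsapiens.UCSC.hg19)
library(ggplot2)
# read the reformatted VCF; the sample name is in the INFO column
vcf <- readVcf("sample.vcf", "hg19")
vr <- as(vcf, "VRanges")
# look up the sequence context of every SNV in the hg19 reference
sca_motifs <- mutationContext(vr, BSgenome.Hsapiens.UCSC.hg19, unify = TRUE)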

To plot the signatures with the colours commonly used in COSMIC and in publications, use:

plotMutationSpectrum(sca_motifs, "INFO", colorby = "alteration", normalize = TRUE) +
  labs(x = "\nsignature", y = "contribution\n") +
  theme(axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        axis.title.x = element_text(size = 12, face = "bold"),
        axis.title.y = element_text(size = 12, face = "bold")) +
  scale_fill_manual(values = c("lightblue", "black", "red", "grey", "darkolivegreen3", "lightsalmon"))

If you want the values as a matrix, to plot it yourself or use it somewhere else, do:

sca_mm = motifMatrix(sca_motifs, group="INFO", normalize = TRUE)

If you want to change the order of your samples, do this after the mutationContext call:

sca_motifs$INFO<-factor(sca_motifs$INFO, levels = c("A1", "B4", "B1", "C5",.......,etc))

Finally, if you want to plot raw counts (not normalized to %), set normalize = FALSE.
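
Since the question asks about groups rather than single samples, one possible follow-up is to pool the per-sample counts by group. The sketch below is base R only and rests on assumptions: it expects a raw count matrix such as the one from motifMatrix(sca_motifs, group = "INFO", normalize = FALSE), it takes the first character of each sample name as the group (A1 becomes A), and the helper name `pool_by_group` and the toy numbers are made up for illustration.

```r
# Hypothetical helper: sum the per-sample counts within each group, then
# rescale every group column so that it sums to 1.
pool_by_group <- function(counts, group_of = function(s) substr(s, 1, 1)) {
  groups <- group_of(colnames(counts))
  # rowsum() adds up rows by group, so transpose to put samples on rows
  pooled <- t(rowsum(t(counts), group = groups))
  # a group with zero mutations would give NaN here
  sweep(pooled, 2, colSums(pooled), "/")
}

# Toy count matrix (motifs x samples), made up for illustration only
counts <- matrix(c(1, 3,
                   2, 2,
                   0, 4),
                 nrow = 2,
                 dimnames = list(c("CA A.A", "CT A.A"), c("A1", "A2", "B1")))
freq <- pool_by_group(counts)
```

With real data, `freq` has one column per group and can be plotted or compared in the same way as the per-sample matrix.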

modified 10 weeks ago • written 10 weeks ago by 2nelly

This is amazing, I'll try it out and let you know if it works, thank you so much!

written 10 weeks ago by sagardesai91

Hey, one point of confusion remains: what do I do if a particular mutation is present in more than one sample?

written 10 weeks ago by sagardesai91

Report the mutation once per sample in the VCF file, each row with a different INFO value (e.g. A1, A4, B3).

When you plot them, SomaticSignatures will count these variants separately, because the plotMutationSpectrum function groups the mutations by the INFO column.

e.g.

##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10002   .  A   T   .   .   A1
chr1    10026   .  A   G   .   .   A4
chr1    10026   .  A   G   .   .   B3

So no worries!!!!

modified 10 weeks ago • written 10 weeks ago by 2nelly

Okay, so if I've understood this correctly, my input VCF file, with mutations present in multiple samples, will look like this (just as an example):

##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10002   .  A   T   .   .   A1
chr1    10026   .  A   G   .   .   A1,A2,A3,B2
chr1    10074   .  A   G   .   .   B1,B3,B4
chr1    10080   .  A   G   .   .   A4,C1
chr1    10085   .  T   G   .   .   C2,C3
chr1    10086   .  A   G   .   .   C1

Am I right?

modified 10 weeks ago by genomax • written 10 weeks ago by sagardesai91

Oh no no, I got it! Sorry for the trouble, thanks a ton!

written 10 weeks ago by sagardesai91

I don't know, because your post is a disaster... hahahaha. Can you please edit it using the code button?

If I understand it correctly, it is wrong.

The correct version should be:

##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10002   .  A   T   .   .   A1
chr1    10026   .  A   G   .   .   A1
chr1    10026   .  A   G   .   .   A2
chr1    10026   .  A   G   .   .   A3
chr1    10026   .  A   G   .   .   B2
chr1    10074   .  A   G   .   .   B1
chr1    10074   .  A   G   .   .   B3
chr1    10074   .  A   G   .   .   B4
.
.
.
.
.
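
If the table already exists with comma-separated sample names in INFO, as in the post being corrected here, the expansion to one row per sample might be done along these lines in base R. This is a sketch under assumptions: the helper name `expand_info` is made up, and INFO is assumed to be a character column holding nothing but sample names.

```r
# Hypothetical helper: repeat each row once per sample listed in INFO.
expand_info <- function(vcf_rows) {
  samples <- strsplit(vcf_rows$INFO, ",", fixed = TRUE)
  # repeat row i as many times as it has samples
  out <- vcf_rows[rep(seq_len(nrow(vcf_rows)), lengths(samples)), ]
  out$INFO <- unlist(samples)
  rownames(out) <- NULL
  out
}

# The first three rows of the comma-separated example from the thread
wide <- data.frame(CHROM = "chr1", POS = c(10002L, 10026L, 10074L), ID = ".",
                   REF = "A", ALT = c("T", "G", "G"), QUAL = ".", FILTER = ".",
                   INFO = c("A1", "A1,A2,A3,B2", "B1,B3,B4"),
                   stringsAsFactors = FALSE)
long <- expand_info(wide)
```

For this input, `long` has eight rows, one for each of the eight corrected data rows.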
modified 10 weeks ago • written 10 weeks ago by 2nelly

Hahaha, I am extremely sorry about that post, but yes I got it :D

written 10 weeks ago by sagardesai91