How to extract mutation signatures from a merged vcf file with multiple groups of samples
3.2 years ago
sagardesai91 ▴ 50

Hello everyone, I have a merged VCF file that contains the mutation details of three groups of samples. For example, if the three groups are A, B and C, the VCF file looks like this:

CHR POS ID REF ALT A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5

Some samples have certain variants and some don't. Given such a file, is there a package in R or some other language that can give mutation signatures specific to the different groups?

vcf file mutation signature • 1.7k views
3.2 years ago
2nelly ▴ 310

Hi sagardesai91

Just create a new vcf with the header below

##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10002   MU43280717  A   T   .   .   MELA-AU
chr1    10026   MU75019506  A   G   .   .   PBCA-US
chr1    10074   MU121369972 A   G   .   .   PBCA-US
chr1    10080   MU121498435 A   G   .   .   PBCA-US
chr1    10085   MU121369537 T   G   .   .   PBCA-US
chr1    10086   MU121375628 A   G   .   .   PBCA-US
chr1    10087   MU121380000 A   G   .   .   PBCA-US
chr1    10091   MU121508239 T   G   .   .   PBCA-US
chr1    10098   MU121433300 A   G   .   .   PBCA-US
chr1    10108   MU15348322  C   T   .   .   LUSC-KR

Put "." (dot) in the ID, QUAL and FILTER fields (my example happens to have real IDs), and in the INFO field put the name of the sample (A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5) that the mutation belongs to.
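The reshaping described above can be sketched in base R. This is only an illustration under assumptions: the helper name `merged_to_long` is made up, the column names follow the layout in the question, and a sample is assumed to lack a variant when its cell holds one of the codes in `absent` (the real encoding in your merged file may differ).

```r
# Hypothetical helper: turn a merged, one-column-per-sample table into
# one row per (variant, sample), with the sample name in INFO.
# Assumption: a sample lacks the variant when its cell is one of `absent`.
merged_to_long <- function(merged, sample_cols,
                           absent = c(".", "./.", "0/0", "0|0", "0", "")) {
  rows <- lapply(sample_cols, function(s) {
    gt <- as.character(merged[[s]])
    # keep only the variants this sample actually carries
    has_variant <- !is.na(gt) & !(gt %in% absent)
    if (!any(has_variant)) return(NULL)
    data.frame(CHROM = merged$CHR[has_variant],
               POS = merged$POS[has_variant],
               ID = ".",
               REF = merged$REF[has_variant],
               ALT = merged$ALT[has_variant],
               QUAL = ".",
               FILTER = ".",
               INFO = s,
               stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) return(NULL)  # no sample carries any variant
  long <- do.call(rbind, rows)
  # sort by chromosome name (alphabetically), position and sample
  long <- long[order(long$CHROM, long$POS, long$INFO), ]
  rownames(long) <- NULL
  long
}

# Toy input in the layout from the question (genotype strings are made up)
merged <- data.frame(
  CHR = c("chr1", "chr1"),
  POS = c(10002L, 10026L),
  ID  = c(".", "."),
  REF = c("A", "A"),
  ALT = c("T", "G"),
  A1  = c("0/1", "0/1"),
  A2  = c("0/0", "0/1"),
  B2  = c("./.", "1/1"),
  stringsAsFactors = FALSE
)
long <- merged_to_long(merged, sample_cols = c("A1", "A2", "B2"))
print(long)

# Write a minimal VCF: header lines first, then the tab-separated rows
out <- tempfile(fileext = ".vcf")
writeLines(c("##fileformat=VCFv4.2",
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"), out)
write.table(long, out, append = TRUE, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

With the toy input, the variant at position 10026 ends up on three rows (A1, A2, B2) and the one at 10002 on a single row (A1), so a variant shared by several samples is simply repeated once per sample.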

Then you can use SomaticSignatures in R to plot the mutation spectrum of every sample at once:

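library(SomaticSignatures)
library(BSgenome.Hsapiens.UCSC.hg19)
library(ggplot2)
# `vcf` is the VCF described above, already loaded into R (the loading step is not shown here)
vr <- as(vcf, "VRanges")
# attach the sequence context of each variant, using hg19 as the reference
sca_motifs = mutationContext(vr, BSgenome.Hsapiens.UCSC.hg19, unify = TRUE)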

To plot the spectra with the colors commonly used in COSMIC and in publications, use:

plotMutationSpectrum(sca_motifs, "INFO", colorby = c("alteration"), normalize = TRUE) +
  labs(x = "\nsignature", y = "contribution\n") +
  theme(axis.text.x = element_text(size = 10)) +
  theme(axis.text.y = element_text(size = 10)) +
  theme(axis.title.x = element_text(size = 12, face = "bold"),
        axis.title.y = element_text(size = 12, face = "bold")) +
  scale_fill_manual(values = c("lightblue", "black", "red", "grey", "darkolivegreen3", "lightsalmon"))

If you want the values as a matrix, to plot them yourself or use them somewhere else, do:

sca_mm = motifMatrix(sca_motifs, group="INFO", normalize = TRUE)
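Since the question asks about groups rather than single samples, one possible way to get a group-level spectrum is to pool the samples of each group. The sketch below is base R on a made-up matrix; it assumes a raw-count matrix with motifs in rows and samples in columns (which is how I understand the output of `motifMatrix` with `normalize = FALSE`), and it assumes the group is the letter part of the sample name, as in the question. The motif labels and counts are invented for illustration.

```r
# Toy stand-in for a raw-count motif matrix (motifs in rows, samples in columns).
# Labels and counts are made up; a real matrix would have 96 rows.
counts <- matrix(c(4, 1, 0,    # A1
                   2, 3, 1,    # A2
                   0, 2, 6,    # B1
                   1, 1, 5),   # B2
                 nrow = 3,
                 dimnames = list(c("CA A.A", "CG A.A", "CT A.A"),
                                 c("A1", "A2", "B1", "B2")))

# Assumption: the group is the sample name without its trailing digits (A1 -> A)
group <- sub("[0-9]+$", "", colnames(counts))

# Sum the sample columns within each group; rowsum() works on rows, hence t()
group_counts <- t(rowsum(t(counts), group))

# Convert each group's counts to proportions so that every column sums to 1
group_spectrum <- sweep(group_counts, 2, colSums(group_counts), "/")
print(round(group_spectrum, 3))
```

Pooling raw counts before normalizing weights each sample by its number of mutations; averaging the already normalized per-sample columns would instead weight every sample equally. Which one is appropriate depends on your data.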

If you want to change the order of your samples, you can do this right after the mutationContext call:

sca_motifs$INFO<-factor(sca_motifs$INFO, levels = c("A1", "B4", "B1", "C5",.......,etc))
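If the order you want is simply group by group, the levels vector does not have to be typed by hand. A small base R sketch, assuming the A1 to C5 naming from the question; `sample_levels` and the toy `info` vector are illustrative names, with `info` standing in for `sca_motifs$INFO`:

```r
# Sample names from the question: groups A, B, C with five samples each
groups <- c("A", "B", "C")
sample_levels <- paste0(rep(groups, each = 5), 1:5)  # A1..A5, B1..B5, C1..C5

# Toy stand-in for sca_motifs$INFO
info <- c("B2", "A1", "C3", "A1", "B2")
info <- factor(info, levels = sample_levels)

# Levels that actually occur, in the requested order
print(levels(droplevels(info)))
```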

Finally, if you want to plot raw counts (not normalized to %), set normalize = FALSE.


This is amazing, I'll try it out and let you know if it works, thank you so much!


Hey, one confusion remains: what do I do if a particular mutation is present in more than one sample?


Report it in the VCF file once per sample, each line with a different INFO value (e.g. A1, A4, B3).

When you plot them, SomaticSignatures will treat these variants separately, because the plotMutationSpectrum function groups the plot according to the INFO column.

e.g.

##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10002   .  A   T   .   .   A1
chr1    10026   .  A   G   .   .   A4
chr1    10026   .  A   G   .   .   B3

So no worries!!!!


Okay, so if I've understood this correctly, my input VCF file, with mutations present in multiple samples, will look like this (just as an example):

##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10002   .  A   T   .   .   A1
chr1    10026   .  A   G   .   .   A1,A2,A3,B2
chr1    10074   .  A   G   .   .   B1,B3,B4
chr1    10080   .  A   G   .   .   A4,C1
chr1    10085   .  T   G   .   .   C2,C3
chr1    10086   .  A   G   .   .   C1

Am I right?


oh no no, I got it! Sorry for the trouble, thanks a ton!


I don't know, because your post is a disaster... hahahaha. Can you please edit it using the code button?

If I understand it correctly, it is wrong: each line should carry a single sample name in INFO.

The correct version should be:

##fileformat=VCFv4.2
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    10002   .  A   T   .   .   A1
chr1    10026   .  A   G   .   .   A1
chr1    10026   .  A   G   .   .   A2
chr1    10026   .  A   G   .   .   A3
chr1    10026   .  A   G   .   .   B2
chr1    10074   .  A   G   .   .   B1
chr1    10074   .  A   G   .   .   B3
chr1    10074   .  A   G   .   .   B4
.
.
.
.
.

Hahaha, I am extremely sorry about that post, but yes I got it :D