I want to reproduce the operation about Motif identification at positions of SNVs, which is the part of "The chromatin accessibility landscape of primary human cancers"
The detailed method is below:
To identify motifs whose binding affinity is predicted to be most perturbed by a given SNV, we constructed a REF and VAR string (+/- 10 bp from the SNV, 21 bp in length) using the hg38 BSGenome reference sequence. We then used motifmatchr with CIS-BP motifs to calculate the motif positions and motif scores (representing the significance of the motif match) using motifmatchr “matchMotifs(positions = “out”, p.cutoff=0.01)” in both the REF and VAR string. Then, we intersected the motif positions with the SNV in both the REF and VAR string to ensure the motifs overlap the SNV. For each motif present, the motif’s score differential was computed if the motif was present at the same position in both the REF and VAR. If there was no motif present at the same position in both the REF and VAR, the differential was compared to a motif score of “0”.
For now, I just wan to know how to constructed a REF and VAR string (+/- 10 bp from the SNV, 21 bp in length). I try to use
BSgenome, but it need the
SNPlocs-class, which is hard to get for me. So I am wondering whether some one can give me some other advices.
PS: the bioconductor support link I posted about
SNPlocs-class is here
You may be able to use GATK
FastaAlternateReferenceMaker(LINK) for this.
Thanks! this look good :) I will try it.