How to find genomic locations of SNP probes in EPIC chips
3 months ago
Bosberg ▴ 50

I'm using this reference to learn how to analyse data from Illumina EPIC chips.

I can obtain SNP beta values as follows:

> getSnpBeta( RGsetEpic )
           <sentrix>_R0xC0y
rs2468330           0.41281518
rs877309            0.02935337
rs2857639           0.47252599...

However, this doesn't tell me anything about these SNPs (which base is substituted for which, where, etc.). With getSnpInfo, I get the following:

> getSnpInfo( RGsetEpic )
DataFrame with 865859 rows and 6 columns
              Probe_rs Probe_maf      CpG_rs   CpG_maf      SBE_rs   SBE_maf
           <character> <numeric> <character> <numeric> <character> <numeric>
cg18478105          NA        NA          NA        NA          NA        NA
cg09835024          NA        NA          NA        NA          NA        NA

which is... something, I guess, but it's not clear to me where the 59 SNPs are in there.

getLocations seems to give me the locations, but only for the cg probes (none of the rs probes are included in the resulting GRanges object):

> getLocations( RGsetEpic )
GRanges object with 865859 ranges and 0 metadata columns:
             seqnames    ranges strand
                <Rle> <IRanges>  <Rle>
  cg18478105    chr20  61847650      *
  cg09835024     chrX  24072640      *
  cg14361672     chr9 131463936      *
#...[only cg* probes here, no rs :( ]

Does anyone know how I can obtain the locations of the rs probes? With those I could query the reference genome and at least work out the reference base; ideally I'd like to know the alternate base as well.

Thanks!

microarray snp epic • 553 views
3 months ago
LChart 3.9k

The "rs" probe names are dbSNP accessions. You can get information about them in many ways, including rsnps::ncbi_snp_query("rs877309").

Ah, I had thought the rs* labels were just probe names specific to the Illumina array, and didn't even know that dbSNP existed. Your answer has helped me in many ways. Thank you! However, do I understand correctly that this still has to be done manually for each probe name?
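
On whether each accession has to be looked up by hand: as far as I know, rsnps::ncbi_snp_query() accepts a character vector of rs numbers, so the whole set can go into one call. A minimal sketch, assuming the rsnps package is installed and NCBI is reachable. The three IDs are the ones shown in the getSnpBeta() output above; in a live session, rownames(getSnpBeta(RGsetEpic)) should supply all of them.

```r
# rs accessions copied from the getSnpBeta() output earlier in the thread.
# In the original session this could presumably be:
#   rs_ids <- rownames(getSnpBeta(RGsetEpic))
rs_ids <- c("rs2468330", "rs877309", "rs2857639")

# Query dbSNP for all IDs at once. This is guarded so the sketch does nothing
# (instead of failing) when rsnps is not installed or the network call errors.
snp_info <- NULL
if (requireNamespace("rsnps", quietly = TRUE)) {
  snp_info <- tryCatch(
    rsnps::ncbi_snp_query(rs_ids),
    error = function(e) NULL
  )
}

# Inspect whatever came back; the exact columns depend on the rsnps version.
if (!is.null(snp_info)) print(snp_info)
```

The query goes over the network, so for the full set of rs probes it may be slow or rate-limited.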

The easy solution is to use GenoMax's suggestion and download the annotations for the array.

3 months ago
GenoMax 142k

You have an easy answer from LChart. But in case that is not sufficient, Illumina makes the human EPIC array annotation available here.

Example of one SNP

IlmnID,Name,AddressA_ID,AlleleA_ProbeSeq,AddressB_ID,AlleleB_ProbeSeq,Next_Base,Color_Channel,col,Probe_Type,Strand_FR,Strand_TB,Strand_CO,Infinium_Design,Infinium_Design_Type,CHR,MAPINFO,Species,Genome_Build,Source_Seq,Forward_Sequence,Top_Sequence,Rep_Num,UCSC_RefGene_Group,UCSC_RefGene_Name,UCSC_RefGene_Accession,UCSC_CpG_Islands_Name,Relation_to_UCSC_CpG_Island,GencodeV41_Group,GencodeV41_Name,GencodeV41_Accession,Phantom5_Enhancers,HMM_Island,Regulatory_Feature_Name,Regulatory_Feature_Group,450k_Enhancer,DMR,DNase_Hypersensitivity_NAME,Encode_CisReg_Site,Encode_CisReg_Site_Evid,OpenChromatin_NAME,OpenChromatin_Evidence_Count,Methyl450_Loci,Methyl27_Loci,EPICv1_Loci,Manifest_probe_match,SNP_ID,SNP_DISTANCE,SNP_MinorAlleleFrequency


rs2468330_TC21,rs2468330,40681836,ATCACACTTTTCATCACTCCATTTTTTTCCACCCAAAAATAATACTACTC,,,,,,rs,R,T,C,2,II,chr12,42804920,Human,GRCh38,CGAGTAGCACCATTCCTGGGTGGAAAAAAATGGAGTGATGAAAAGTGTGA,CACATGTGCAACTTTGTGGTAGGATACTGGGGGTTGTGTGGTTGACAGGTCCACCAGATG[G/M]AGTAGCACCATTCCTGGGTGGAAAAAAATGGAGTGATGAAAAGTGTGATTTCACATAATC,GATTATGTGAAATCACACTTTTCATCACTCCATTTTTTTCCACCCAGGAATGGTGCTACT[K/C]CATCTGGTGGACCTGTCAACCACACAACCCCCAGTATCCTACCACAAAGTTGCACATGTG,1,,,,,,,,,,,,,FALSE,,,,,Quies;TxWk;EnhWk;ReprPCWk;Het;EnhA2,1476;166;9;7;4;1,rs2468330,,rs2468330,TRUE,rs1048409348;rs1175474923;rs1296321954;rs1297922248;rs1309456635;rs1324247238;rs1390964259;rs1424027415;rs2468330;rs534242663;rs552677109;rs755984911;rs770177418;rs908289935,35;48;40;1;36;2;3;45;0;11;44;22;32;33,0;0;0;0;0;0;0;0;0.501;0;0;0;0;0
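
A minimal sketch of pulling the rs rows out of such a manifest in base R. The table here is a cut-down stand-in built from the row above: only seven of the columns, plus one placeholder cg row that is not real data. For the downloaded file you would presumably call read.csv() on the CSV itself, and I am assuming some heading lines before the column header would need to be skipped. Reading the bracketed part of Forward_Sequence as the allele annotation is also an assumption.

```r
# Forward_Sequence for rs2468330, copied from the manifest row above and
# split into three pieces only to keep the lines short.
fwd_seq <- paste0(
  "CACATGTGCAACTTTGTGGTAGGATACTGGGGGTTGTGTGGTTGACAGGTCCACCAGATG",
  "[G/M]",
  "AGTAGCACCATTCCTGGGTGGAAAAAAATGGAGTGATGAAAAGTGTGATTTCACATAATC"
)

# Cut-down stand-in for the manifest: a subset of the real column names,
# the rs2468330 values quoted above, and one PLACEHOLDER cg row (not real data).
manifest_text <- paste(
  "IlmnID,Name,Probe_Type,CHR,MAPINFO,Genome_Build,Forward_Sequence",
  paste("rs2468330_TC21", "rs2468330", "rs", "chr12", "42804920",
        "GRCh38", fwd_seq, sep = ","),
  "cg_placeholder,cg_placeholder,cg,chr1,1,GRCh38,AAAA[CG]TTTT",
  sep = "\n"
)

# For the real file, something like the following (the skip value is an
# assumption about the number of heading lines; check the file first):
#   manifest <- read.csv("EPIC-8v2-0_A1.csv", skip = 7, stringsAsFactors = FALSE)
manifest <- read.csv(text = manifest_text, stringsAsFactors = FALSE)

# Keep only the SNP probes and the columns that give their location.
snp_rows <- manifest[manifest$Probe_Type == "rs", ]
snp_locs <- data.frame(
  name    = snp_rows$Name,
  chr     = snp_rows$CHR,
  pos     = snp_rows$MAPINFO,
  build   = snp_rows$Genome_Build,
  # Text between the square brackets of Forward_Sequence, kept as a raw string.
  bracket = sub("^.*\\[([^]]*)\\].*$", "\\1", snp_rows$Forward_Sequence),
  stringsAsFactors = FALSE
)
print(snp_locs)
```

With chr and pos in hand, the reference base could then be looked up in a genome of the matching build (GRCh38 in this row).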

Thanks for that resource, very helpful indeed! One thing I'm still curious about: if I look at the CSV file for the EPIC v1 chips, EPIC-8v2-0_A1.csv, I see labels like this:

rs10033147
rs1019916
rs1040870
rs10457834
rs10796216

whereas if I look at the EPIC v2 CSV file, EPIC-8v2-0_A1.csv, I see labels like this:

rs10033147_BC11
rs1019916_BC21
rs1040870_BC11
rs10457834_BC21
rs10774834_BC11

Up to the underscore they _mostly_ agree, so I assume they refer to the same SNPs. Does the "BC" tag at the end have any significance that you know of?
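
One way to check how far the two lists agree is to strip everything from the last underscore and compare. The sketch below does this in base R using only the labels quoted above. As a side observation rather than an answer: in the rs2468330 row quoted earlier, the IlmnID suffix TC21 coincides with the Strand_TB, Strand_CO, Infinium_Design and Rep_Num values (T, C, 2, 1) pasted together, so the BC-style tags may encode the same four fields. That reading is an assumption drawn from a single row.

```r
# Labels copied from the two lists above.
v1_ids <- c("rs10033147", "rs1019916", "rs1040870", "rs10457834", "rs10796216")
v2_ids <- c("rs10033147_BC11", "rs1019916_BC21", "rs1040870_BC11",
            "rs10457834_BC21", "rs10774834_BC11")

# Drop the trailing "_XXnn" tag from the v2 labels, then compare the sets.
v2_base <- sub("_[^_]*$", "", v2_ids)
shared  <- intersect(v1_ids, v2_base)  # accessions present in both lists
only_v1 <- setdiff(v1_ids, v2_base)    # in the v1 list only
only_v2 <- setdiff(v2_base, v1_ids)    # in the v2 list only
print(list(shared = shared, only_v1 = only_v1, only_v2 = only_v2))

# Fields copied from the rs2468330 manifest row quoted earlier in the thread.
probe <- list(IlmnID = "rs2468330_TC21", Strand_TB = "T", Strand_CO = "C",
              Infinium_Design = 2, Rep_Num = 1)
suffix  <- sub("^.*_", "", probe$IlmnID)
rebuilt <- paste0(probe$Strand_TB, probe$Strand_CO,
                  probe$Infinium_Design, probe$Rep_Num)

# TRUE here only shows that the pattern holds for this one row.
print(identical(suffix, rebuilt))
```

In the quoted manifest row the Name column already holds the bare accession (rs2468330), so for the v2 file it may be simpler to match on Name than to strip the IlmnID.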
