What is the difference between agilent_wholegenome and agilent_wholegenome_4x44k_v1 in biomart?
6 months ago
solarchan7 • 0

I am trying to convert some Agilent IDs into Ensembl gene IDs with biomaRt in R, but I realized there are three attributes for Agilent IDs: agilent_wholegenome, agilent_wholegenome_4x44k_v1, and agilent_wholegenome_4x44k_v2.

The Agilent website only explains what 4x44K means, not the differences between them: https://www.agilent.com/en/product/cgh-cgh-snp-microarray-platform/cgh-cgh-snp-microarrays/human-microarrays/human-genome-cgh-microarray-kit-4x44k-228410

Which one is more appropriate?

Thank you

genetics biomaRt gene
6 months ago
Mike Smith ★ 2.0k

We can take a look at the content of those three attributes to try and figure this out. Here's some code to extract each set of Agilent IDs:

library(biomaRt)

## connect to the Ensembl human gene mart
human.mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

## get the three sets of Agilent IDs (one attribute per query)
wg <- getBM(attributes = "agilent_wholegenome", mart = human.mart)
wg44v1 <- getBM(attributes = "agilent_wholegenome_4x44k_v1", mart = human.mart)
wg44v2 <- getBM(attributes = "agilent_wholegenome_4x44k_v2", mart = human.mart)

Now we can compare the IDs to see if there's any overlap:

table(wg$agilent_wholegenome %in% wg44v1$agilent_wholegenome_4x44k_v1)
#>  TRUE 
#> 32263

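table(wg$agilent_wholegenome %in% wg44v2$agilent_wholegenome_4x44k_v2)

The first comparison shows that every agilent_wholegenome ID is also found in agilent_wholegenome_4x44k_v1, so the two attributes appear to contain the same IDs and you can probably use either one if you have V1 arrays (strictly, showing they are identical also needs the check in the other direction, e.g. with setequal()). I'm not sure why this is duplicated in BioMart. The V2 attribute, on the other hand, is a separate list, but it is not disjoint from V1: some IDs, such as A_24_P182122 and A_23_P431853, appear in both of the ID lists shown below.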
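
Because `%in%` only checks one direction, a small base-R helper can report the overlap both ways. This is a minimal sketch: `compare_id_sets()` is a hypothetical name (not a biomaRt function), the IDs in the example are made up, and dropping NA/empty strings is a defensive assumption rather than something the queries above are known to return.

```r
# Hypothetical helper (not part of biomaRt): count IDs shared by two vectors
# and IDs unique to each side. NA and empty strings are dropped defensively.
compare_id_sets <- function(a, b) {
  a <- unique(a[!is.na(a) & a != ""])
  b <- unique(b[!is.na(b) & b != ""])
  c(shared    = length(intersect(a, b)),
    only_in_a = length(setdiff(a, b)),
    only_in_b = length(setdiff(b, a)))
}

# Made-up toy IDs, only to show the output shape
toy_a <- c("A_23_P0001", "A_23_P0002", "A_24_P0003")
toy_b <- c("A_23_P0002", "A_24_P0003", "A_33_P0004", "A_33_P0004")
overlap <- compare_id_sets(toy_a, toy_b)
print(overlap)

# The two sets are identical only if nothing is unique to either side
print(overlap[["only_in_a"]] == 0 && overlap[["only_in_b"]] == 0)

# With the data frames from the queries above (needs a live Ensembl connection):
# compare_id_sets(wg$agilent_wholegenome, wg44v1$agilent_wholegenome_4x44k_v1)
```

If `only_in_a` and `only_in_b` are both zero for the real `wg` and `wg44v1` columns, that would confirm the two attributes hold exactly the same IDs.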

Ideally the array data you're using would be accompanied by metadata telling you whether it is V1 or V2. If you can't find that, looking at the IDs you do have may be enough to identify the version. Here are the first six IDs from each platform; if the IDs in your data match entries that appear in only one of these lists, that points to the version you're working with.

head(wg44v1$agilent_wholegenome_4x44k_v1)
#> [1] "A_24_P179339" "A_24_P42453"  "A_24_P179336" "A_23_P331028" "A_24_P182122"
#> [6] "A_23_P431853"
head(wg44v2$agilent_wholegenome_4x44k_v2)
#> [1] "A_24_P182122"  "A_23_P431853"  "A_23_P402751"  "A_33_P3410700"
#> [5] "A_23_P301925"  "A_23_P337726"
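
A sketch of checking a whole vector of your own IDs against both versions, rather than eyeballing a handful. `classify_agilent_ids()` and `my_ids` are hypothetical names and the toy IDs are made up. Since a few IDs appear in both lists above, the informative counts are the ones unique to V1 or to V2; a clear majority on one side suggests, but does not prove, which array version you have.

```r
# Hypothetical helper (not part of biomaRt): tally how many of your probe IDs
# occur only in the V1 list, only in the V2 list, in both, or in neither.
classify_agilent_ids <- function(my_ids, v1_ids, v2_ids) {
  my_ids <- unique(my_ids)
  in_v1 <- my_ids %in% v1_ids
  in_v2 <- my_ids %in% v2_ids
  c(v1_only = sum(in_v1 & !in_v2),
    v2_only = sum(in_v2 & !in_v1),
    both    = sum(in_v1 & in_v2),
    neither = sum(!in_v1 & !in_v2))
}

# Made-up toy data standing in for the real probe lists
toy_v1 <- c("A_23_P0001", "A_23_P0002", "A_24_P0005")
toy_v2 <- c("A_23_P0002", "A_33_P0003")
my_ids <- c("A_23_P0001", "A_23_P0002", "A_24_P0005", "not_a_probe")
counts <- classify_agilent_ids(my_ids, toy_v1, toy_v2)
print(counts)

# With real data (needs the biomaRt queries above):
# classify_agilent_ids(my_ids,
#                      wg44v1$agilent_wholegenome_4x44k_v1,
#                      wg44v2$agilent_wholegenome_4x44k_v2)
```

A large `neither` count may mean the IDs come from a different Agilent design altogether, in which case neither attribute is the right one to map from.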
