Question

Gene duplicate

13 months ago

alghezim1 • 0

Hi there,

I am pretty new to single-cell RNA-seq and I am trying to learn by analyzing a dataset that has already been published. I am using monocle3 and I realized that some Ensembl IDs are the same. If I filter them out as duplicates, which copy gets removed? I am having a hard time figuring out whether it is the most abundant one or the least abundant one.

GeneNameSymbol2=GeneNameSymbol[!duplicated(GeneNameSymbol$ENSEMBL),]

Someone used this code on GitHub, but I am still trying to understand how it works, especially when doing the top 10 analysis. Thank you so much!
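For reference, here is a minimal sketch of what that line does, using a made-up toy table (the IDs, symbols, and the `total_counts` column are hypothetical and not from the dataset in question). In base R, `duplicated()` flags the second and later occurrences of a value in row order, so `!duplicated(...)` keeps whichever row comes first, regardless of abundance.

```r
# Toy annotation table; all IDs, symbols and counts are invented for illustration.
GeneNameSymbol <- data.frame(
  ENSEMBL      = c("ENS_TOY_0001", "ENS_TOY_0002", "ENS_TOY_0001", "ENS_TOY_0003"),
  SYMBOL       = c("GeneA", "GeneB", "GeneA_alt", "GeneC"),
  total_counts = c(5, 40, 900, 12),  # hypothetical abundance column
  stringsAsFactors = FALSE
)

# duplicated() is TRUE for every occurrence after the first, in row order.
dup_flag <- duplicated(GeneNameSymbol$ENSEMBL)

# Keeps the first row per ID: here the low-count copy (5), not the 900 one.
GeneNameSymbol2 <- GeneNameSymbol[!dup_flag, ]

# To keep the most abundant copy instead, sort by abundance before filtering.
ordered_tbl   <- GeneNameSymbol[order(-GeneNameSymbol$total_counts), ]
most_abundant <- ordered_tbl[!duplicated(ordered_tbl$ENSEMBL), ]
```

So with the original one-liner, which copy survives depends only on the row order of the table, not on expression.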

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices
[5] utils     datasets  methods   base

other attached packages:
 [1] Hmisc_4.7-2
 [2] Formula_1.2-4
 [3] survival_3.5-0
 [4] lattice_0.20-45
 [5] org.Mm.eg.db_3.16.0
 [6] AnnotationDbi_1.60.0
 [7] Matrix_1.5-3
 [8] reticulate_1.28
 [9] dplyr_1.1.0
[10] magrittr_2.0.3
[11] ggplot2_3.4.1
[12] monocle3_1.3.1
[13] SingleCellExperiment_1.20.0
[14] SummarizedExperiment_1.28.0
[15] GenomicRanges_1.50.2
[16] GenomeInfoDb_1.34.6
[17] IRanges_2.32.0
[18] S4Vectors_0.36.1
[19] MatrixGenerics_1.10.0
[20] matrixStats_0.63.0
[21] Biobase_2.58.0
[22] BiocGenerics_0.44.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-161           bitops_1.0-7
 [3] bit64_4.0.5            RColorBrewer_1.1-3
 [5] httr_1.4.4             backports_1.4.1
 [7] tools_4.2.2            utf8_1.2.3
 [9] R6_2.5.1               rpart_4.1.19
[11] DBI_1.1.3              colorspace_2.1-0
[13] nnet_7.3-18            withr_2.5.0
[15] tidyselect_1.2.0       gridExtra_2.3
[17] bit_4.0.5              compiler_4.2.2
[19] cli_3.6.0              htmlTable_2.4.1
[21] DelayedArray_0.24.0    scales_1.2.1
[23] checkmate_2.1.0        stringr_1.5.0
[25] digest_0.6.31          foreign_0.8-84
[27] minqa_1.2.5            XVector_0.38.0
[29] htmltools_0.5.4        base64enc_0.1-3
[31] jpeg_0.1-10            pkgconfig_2.0.3
[33] parallelly_1.34.0      lme4_1.1-31
[35] fastmap_1.1.0          htmlwidgets_1.6.1
[37] rlang_1.0.6            rstudioapi_0.14
[39] RSQLite_2.2.20         generics_0.1.3
[41] jsonlite_1.8.4         RCurl_1.98-1.9
[43] GenomeInfoDbData_1.2.9 interp_1.1-3
[45] Rcpp_1.0.10            munsell_0.5.0
[47] fansi_1.0.4            lifecycle_1.0.3
[49] terra_1.6-47           stringi_1.7.12
[51] MASS_7.3-58.1          zlibbioc_1.44.0
[53] plyr_1.8.8             grid_4.2.2
[55] blob_1.2.3             parallel_4.2.2
[57] listenv_0.9.0          crayon_1.5.2
[59] deldir_1.0-6           Biostrings_2.66.0
[61] splines_4.2.2          KEGGREST_1.38.0
[63] knitr_1.42             pillar_1.8.1
[65] igraph_1.4.0           boot_1.3-28.1
[67] codetools_0.2-18       glue_1.6.2
[69] latticeExtra_0.6-30    data.table_1.14.6
[71] png_0.1-8              vctrs_0.5.2
[73] nloptr_2.0.3           gtable_0.3.1
[75] assertthat_0.2.1       future_1.31.0
[77] cachem_1.0.6           xfun_0.37
[79] tibble_3.1.8           memoise_2.0.1
[81] cluster_2.1.4          globals_0.16.2

monocle3 • 561 views

updated 13 months ago by ATpoint 82k • written 13 months ago by alghezim1 • 0

some Ensembl IDs that are the same

Please show us some examples.

13 months ago by Ram 43k

score 0 · Answer 1 · 2023-03-31

13 months ago

ATpoint 82k

Use ensemblID_geneName as the gene identifier to avoid that. These duplicated names exist as genes, so removing them is not really data-driven. Ensembl IDs are always unique; only the names are not, and they are often a mess.
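A minimal sketch of that idea, assuming a gene table with hypothetical `ensembl_id` and `gene_name` columns (toy values, not from any real annotation): pasting the two together gives labels that stay unique even when symbols repeat.

```r
# Toy gene table; IDs and symbols are invented for illustration.
genes <- data.frame(
  ensembl_id = c("ENS_TOY_0001", "ENS_TOY_0002", "ENS_TOY_0003"),
  gene_name  = c("GeneA", "GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# The symbols alone collide (anyDuplicated() returns a non-zero index).
names_collide <- anyDuplicated(genes$gene_name) > 0

# Prefixing each symbol with its Ensembl ID gives one unique label per gene.
combined_id <- paste(genes$ensembl_id, genes$gene_name, sep = "_")

# No gene is dropped, and the labels are safe to use as row names.
rownames(genes) <- combined_id
```

The combined labels keep the readable symbol for plots and top-gene tables while the Ensembl part guarantees uniqueness.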
