Gene duplicate
13 months ago
alghezim1 • 0

Hi there,

I am pretty new to single-cell RNA-seq, and I am trying to learn by re-analyzing a dataset that has already been published. I am using monocle3, and I noticed that some Ensembl IDs appear more than once. If I filter out the duplicates, which one gets removed? I am having a hard time figuring out whether it is the most abundant one or the least abundant one that is being filtered.

# drop every row whose ENSEMBL value already appeared in an earlier row
GeneNameSymbol2 <- GeneNameSymbol[!duplicated(GeneNameSymbol$ENSEMBL), ]

Someone used this code on GitHub, but I am still trying to understand how it works, especially for the top-10 analysis. Thank you so much!
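A minimal sketch of what the line above does, using a small made-up table (the column `total_counts` and all IDs, symbols and values are hypothetical, not taken from the published data). `duplicated()` marks every occurrence after the first as `TRUE`, so `!duplicated()` keeps whichever row comes first in the current row order. Abundance plays no role unless the table is sorted by it beforehand.

```r
# Hypothetical toy table standing in for GeneNameSymbol: one Ensembl ID
# (ENSMUSG0002) appears twice, mapped to two symbols with made-up counts.
GeneNameSymbol <- data.frame(
  ENSEMBL      = c("ENSMUSG0001", "ENSMUSG0002", "ENSMUSG0002", "ENSMUSG0003"),
  SYMBOL       = c("GeneA", "GeneB1", "GeneB2", "GeneC"),
  total_counts = c(50, 5, 900, 20),
  stringsAsFactors = FALSE
)

# duplicated() is TRUE for the 2nd, 3rd, ... occurrence of each value
duplicated(GeneNameSymbol$ENSEMBL)
# -> FALSE FALSE TRUE FALSE

# !duplicated() therefore keeps the FIRST row per ID, whatever its abundance
first_kept <- GeneNameSymbol[!duplicated(GeneNameSymbol$ENSEMBL), ]
first_kept$SYMBOL
# -> "GeneA" "GeneB1" "GeneC"  (GeneB1 with 5 counts kept, GeneB2 with 900 dropped)

# To keep the most abundant row instead: sort by abundance (descending) first,
# then drop duplicates, so the first occurrence is the most abundant one
by_abundance  <- GeneNameSymbol[order(GeneNameSymbol$total_counts, decreasing = TRUE), ]
most_abundant <- by_abundance[!duplicated(by_abundance$ENSEMBL), ]
most_abundant$SYMBOL
# -> "GeneB2" "GeneA" "GeneC"
```

`duplicated(x, fromLast = TRUE)` flips the direction and keeps the last occurrence instead. With a monocle3 `cell_data_set`, per-gene totals could probably be obtained with something like `Matrix::rowSums(counts(cds))`, assuming the rows of the annotation table are first matched to the rows of `cds` (that matching step is not shown here).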

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices
[5] utils     datasets  methods   base

other attached packages:
 [1] Hmisc_4.7-2
 [2] Formula_1.2-4
 [3] survival_3.5-0
 [4] lattice_0.20-45
 [5] org.Mm.eg.db_3.16.0
 [6] AnnotationDbi_1.60.0
 [7] Matrix_1.5-3
 [8] reticulate_1.28
 [9] dplyr_1.1.0
[10] magrittr_2.0.3
[11] ggplot2_3.4.1
[12] monocle3_1.3.1
[13] SingleCellExperiment_1.20.0
[14] SummarizedExperiment_1.28.0
[15] GenomicRanges_1.50.2
[16] GenomeInfoDb_1.34.6
[17] IRanges_2.32.0
[18] S4Vectors_0.36.1
[19] MatrixGenerics_1.10.0
[20] matrixStats_0.63.0
[21] Biobase_2.58.0
[22] BiocGenerics_0.44.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-161           bitops_1.0-7
 [3] bit64_4.0.5            RColorBrewer_1.1-3
 [5] httr_1.4.4             backports_1.4.1
 [7] tools_4.2.2            utf8_1.2.3
 [9] R6_2.5.1               rpart_4.1.19
[11] DBI_1.1.3              colorspace_2.1-0
[13] nnet_7.3-18            withr_2.5.0
[15] tidyselect_1.2.0       gridExtra_2.3
[17] bit_4.0.5              compiler_4.2.2
[19] cli_3.6.0              htmlTable_2.4.1
[21] DelayedArray_0.24.0    scales_1.2.1
[23] checkmate_2.1.0        stringr_1.5.0
[25] digest_0.6.31          foreign_0.8-84
[27] minqa_1.2.5            XVector_0.38.0
[29] htmltools_0.5.4        base64enc_0.1-3
[31] jpeg_0.1-10            pkgconfig_2.0.3
[33] parallelly_1.34.0      lme4_1.1-31
[35] fastmap_1.1.0          htmlwidgets_1.6.1
[37] rlang_1.0.6            rstudioapi_0.14
[39] RSQLite_2.2.20         generics_0.1.3
[41] jsonlite_1.8.4         RCurl_1.98-1.9
[43] GenomeInfoDbData_1.2.9 interp_1.1-3
[45] Rcpp_1.0.10            munsell_0.5.0
[47] fansi_1.0.4            lifecycle_1.0.3
[49] terra_1.6-47           stringi_1.7.12
[51] MASS_7.3-58.1          zlibbioc_1.44.0
[53] plyr_1.8.8             grid_4.2.2
[55] blob_1.2.3             parallel_4.2.2
[57] listenv_0.9.0          crayon_1.5.2
[59] deldir_1.0-6           Biostrings_2.66.0
[61] splines_4.2.2          KEGGREST_1.38.0
[63] knitr_1.42             pillar_1.8.1
[65] igraph_1.4.0           boot_1.3-28.1
[67] codetools_0.2-18       glue_1.6.2
[69] latticeExtra_0.6-30    data.table_1.14.6
[71] png_0.1-8              vctrs_0.5.2
[73] nloptr_2.0.3           gtable_0.3.1
[75] assertthat_0.2.1       future_1.31.0
[77] cachem_1.0.6           xfun_0.37
[79] tibble_3.1.8           memoise_2.0.1
[81] cluster_2.1.4          globals_0.16.2
monocle3

"some Ensembl IDs that are the same"

Please show us some examples.
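One way to pull out such examples is to list every row whose ID occurs more than once, not only the second and later copies that `duplicated()` flags. This is a sketch on a hypothetical toy table; `dup_ids` and `dup_rows` are illustrative names.

```r
# Hypothetical toy table; replace with the real GeneNameSymbol
GeneNameSymbol <- data.frame(
  ENSEMBL = c("ENSMUSG0001", "ENSMUSG0002", "ENSMUSG0002", "ENSMUSG0003"),
  SYMBOL  = c("GeneA", "GeneB1", "GeneB2", "GeneC"),
  stringsAsFactors = FALSE
)

# IDs that occur more than once (each listed once)
dup_ids <- unique(GeneNameSymbol$ENSEMBL[duplicated(GeneNameSymbol$ENSEMBL)])

# ALL rows for those IDs, including the first occurrence, for side-by-side comparison
dup_rows <- GeneNameSymbol[GeneNameSymbol$ENSEMBL %in% dup_ids, ]
dup_rows
```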

13 months ago
ATpoint 82k

Use ensemblID_geneName as the gene identifier to avoid that. These duplicate names belong to real, distinct genes, so removing them is not really data-driven. Ensembl IDs are always unique; gene names are not, and they are often a mess.
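A minimal sketch of building such a combined identifier in base R, with made-up IDs and symbols. Falling back to the bare Ensembl ID when the symbol is missing is an extra assumption, not part of the suggestion above; it avoids IDs like "ENSMUSG0005_NA".

```r
# Hypothetical example: two different genes share the symbol "GeneB",
# and one gene has no symbol at all
ensembl <- c("ENSMUSG0001", "ENSMUSG0002", "ENSMUSG0004", "ENSMUSG0005")
symbol  <- c("GeneA", "GeneB", "GeneB", NA)

# Symbols alone are not unique
anyDuplicated(symbol) > 0
# -> TRUE

# EnsemblID_symbol is unique whenever the Ensembl IDs are;
# rows without a symbol keep just the Ensembl ID
gene_id <- ifelse(is.na(symbol), ensembl, paste(ensembl, symbol, sep = "_"))
gene_id
# -> "ENSMUSG0001_GeneA" "ENSMUSG0002_GeneB" "ENSMUSG0004_GeneB" "ENSMUSG0005"

anyDuplicated(gene_id) == 0
# -> TRUE
```

Base R's `make.unique()` is another option (it appends `.1`, `.2`, ... to repeated names), but the suffix says nothing about which gene is which, so the Ensembl-based ID keeps more information. For a monocle3 `cell_data_set`, the same idea might look like `rownames(cds) <- paste(rownames(cds), rowData(cds)$gene_short_name, sep = "_")`, assuming the row names are Ensembl IDs and the symbols are stored in monocle3's `gene_short_name` column.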

