2.8 years ago
Mamatha Y S
# duplicated genes and number of duplicates
duplicated_genes <- names(table(df$hgnc_symbol)[table(df$hgnc_symbol) > 1])
gene_counts <- table(df$hgnc_symbol)[duplicated_genes]
# number of rows with zero total expression for each duplicated gene
zero_counts <- sapply(unique(duplicated_genes), function(gene) {
  sum(rowSums(df[df$hgnc_symbol == gene, -ncol(df)]) == 0)
})
This is the code I'm running. I want to identify the duplicated genes in my data frame and their frequency. In a third column, I want to know, for each duplicated gene (for example, one that is duplicated 7 times), how many of those rows have a row sum of zero (gene expression zero for all samples).
The first two lines give the correct result, but for the zero-expression count I'm getting NA for all the genes, and I don't understand why. Please help me with this.
Is `hgnc_symbol` the last column in your `df`? Is that why you're using `-ncol(df)` for the `rowSums` function?

You're getting `NA` because some values in your `df` are `NA`. You could pass `na.rm = TRUE` to the `sum` function, as long as you understand what it does. The fact that you expect 0 but also have `NA` in there indicates either a gap in your expectations or a difference in what 0 and `NA` mean in your data.
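To make the suggestion above concrete, here is a minimal sketch on a small made-up data frame. The sample columns `s1` and `s2`, the toy values, and the names `expr_cols`, `na_per_column` and `result` are assumptions for illustration only, not taken from the original data. It selects the expression columns by name instead of relying on `-ncol(df)`, checks where the `NA` values are, and counts all-zero rows with `na.rm = TRUE`.

```r
# Toy data (hypothetical): two sample columns plus hgnc_symbol.
df <- data.frame(
  s1 = c(0, 0, 5, NA, 0, 2),
  s2 = c(0, 0, 1, 0, 0, 3),
  hgnc_symbol = c("A", "A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Pick expression columns by name, so the position of hgnc_symbol does not matter.
expr_cols <- setdiff(names(df), "hgnc_symbol")

# Diagnostic: how many NA values does each expression column contain?
na_per_column <- colSums(is.na(df[expr_cols]))

# Duplicated genes and how often each one occurs.
gene_counts <- table(df$hgnc_symbol)
duplicated_genes <- names(gene_counts)[gene_counts > 1]

# For each duplicated gene, count the rows whose total expression is zero.
zero_counts <- sapply(duplicated_genes, function(gene) {
  rows <- which(df$hgnc_symbol == gene)  # which() ignores NA gene symbols
  totals <- rowSums(df[rows, expr_cols, drop = FALSE])
  # A row containing NA has an NA total; na.rm = TRUE leaves it out of the count.
  sum(totals == 0, na.rm = TRUE)
})

result <- data.frame(
  gene = duplicated_genes,
  n_duplicates = as.integer(gene_counts[duplicated_genes]),
  n_zero_rows = as.integer(zero_counts)
)
print(result)
```

Note that with `na.rm = TRUE` a row whose total is `NA` is simply not counted as zero. If a missing value should instead be treated as zero expression, `rowSums(..., na.rm = TRUE)` would be the place to say so, but that is a decision about what `NA` means in the data.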