How to count the number of different unique letters in each column?
10 months ago
star ▴ 350

I have a table like the one below. For every column except the first (query), how can I count the number of unique letters across all rows that differ from the letter in row 1?

Input:

              query letter_1 letter_2 letter_3 letter_4 letter_5 letter_6 letter_7 letter_8 letter_9
1 lcl|Query_10001        M        E        K        I        V        L        L        F        A
2 lcl|Query_10002        M        E        K        I        G        K        L        L        S
3 lcl|Query_10003        M        E        K        I        M        L        L        L        A


Output:

            query letter_1 letter_2 letter_3 letter_4 letter_5 letter_6 letter_7 letter_8 letter_9
1 lcl|Query_10001        M        E        K        I        V        L        L        F        A
2 lcl|Query_10002        M        E        K        I        G        K        L        L        S
3 lcl|Query_10003        M        E        K        I        M        L        L        L        A
4 differences            0        0        0        0        2        1        0        1        1

Read up on the XY problem. That is what you are doing here, as well as in your previous question: how calculate different amino acids in a aligning format?

It looks like you want per-position counts of unique residues (AKA conservation scores), which is a common problem when working with alignments. Search online for existing tools; R is not the best way to do this.

10 months ago
ATpoint 84k

You should provide the data as dput() output rather than pasted printouts, which makes it easier to copy. Here is a simple solution:


data <- data.table::fread(text="              query letter_1 letter_2 letter_3 letter_4 letter_5 letter_6 letter_7 letter_8 letter_9
lcl|Query_10001        M        E        K        I        V        L        L        F        A
lcl|Query_10002        M        E        K        I        G        K        L        L        S
lcl|Query_10003        M        E        K        I        M        L        L        L        A",
data.table=FALSE)

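# Count the unique letters in each letter column; subtracting 1 removes the row 1 letter itself
r <- apply(data[,2:ncol(data)], 2, function(x) length(unique(x))) - 1
# Append the counts as row 4 (they are stored as character, like the rest of each column)
data[4,] <- c("differences", as.numeric(r))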

data
            query letter_1 letter_2 letter_3 letter_4 letter_5 letter_6 letter_7 letter_8 letter_9
1 lcl|Query_10001        M        E        K        I        V        L        L        F        A
2 lcl|Query_10002        M        E        K        I        G        K        L        L        S
3 lcl|Query_10003        M        E        K        I        M        L        L        L        A
4     differences        0        0        0        0        2        1        0        1        1
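
If you would rather make the comparison against row 1 explicit and keep the counts numeric, here is a minimal base R sketch. It assumes the table is rebuilt by hand with data.frame() (roughly what pasting dput() output would give you), and count_diff_vs_first is a made-up helper name for illustration, not a function from any package.

```r
# Rebuild the example table in base R (an assumption: typed by hand here,
# in practice this would come from the poster's dput() output).
data <- data.frame(
  query    = c("lcl|Query_10001", "lcl|Query_10002", "lcl|Query_10003"),
  letter_1 = c("M", "M", "M"),
  letter_2 = c("E", "E", "E"),
  letter_3 = c("K", "K", "K"),
  letter_4 = c("I", "I", "I"),
  letter_5 = c("V", "G", "M"),
  letter_6 = c("L", "K", "L"),
  letter_7 = c("L", "L", "L"),
  letter_8 = c("F", "L", "L"),
  letter_9 = c("A", "S", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct letters in a column
# that are not the letter found in row 1.
count_diff_vs_first <- function(x) length(setdiff(unique(x), x[1]))

# Apply it to every column except the first (query); the result is a
# named integer vector, so the counts stay numeric.
differences <- vapply(data[-1], count_diff_vs_first, integer(1))
print(differences)
```

For this table the counts are 0 0 0 0 2 1 0 1 1, the same as length(unique(x)) - 1, because the row 1 letter is always one of the unique letters in its column. Keeping the counts in a separate vector avoids mixing numbers into character columns.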