Issues with Mixture file when using CIBERSORTx
4 months ago


I am trying to run a deconvolution analysis of bulk RNA-seq samples using the provided LM22 signature matrix. I converted all Ensembl IDs to gene symbols and removed NA and duplicated entries:

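library(AnnotationDbi)
library(org.Hs.eg.db)  # assumed: human annotation package (LM22 is a human signature)
library(dplyr)
library(tibble)

# `txi` is a placeholder name for the tximport() object holding the Salmon counts
counts_salmon <- as.data.frame(txi$counts)

counts_salmon$symbol <- mapIds(org.Hs.eg.db,
                            keys = rownames(counts_salmon),
                            column = "SYMBOL",
                            keytype = "ENSEMBL")
counts_salmon <- counts_salmon |>
  distinct(symbol, .keep_all = TRUE) |>  # keep the first row per symbol
  rownames_to_column(var = "ensbl") |>
  dplyr::select(-ensbl) |>               # drop Ensembl IDs; avoids AnnotationDbi::select
  filter(!is.na(symbol)) |>              # drop IDs without a symbol
  column_to_rownames(var = "symbol")

counts_salmon <- na.omit(counts_salmon)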

write.table(counts_salmon , file = 'output/counts_salmon.tsv', append = FALSE, sep = "\t", 
            row.names = TRUE, col.names = TRUE, quote = FALSE)
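The clean-up step can also be sketched in base R, which makes the rule explicit: keep only rows that have a symbol, and only the first row for each symbol. This is an illustrative equivalent, not the poster's code; counts and ens2sym are hypothetical toy objects standing in for the Salmon counts and the mapIds() result, and the Ensembl-style IDs are placeholders.

```r
# Hypothetical toy counts; row names are placeholder Ensembl-style IDs
counts <- data.frame(rna_11 = c(0, 58.092, 14, 3),
                     RNA_26 = c(0, 31.013, 1, 5),
                     row.names = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"))

# Stand-in for the mapIds() result: named character vector, NA where no symbol exists
ens2sym <- c(ENSG_A = "TSPAN6", ENSG_B = "DPM1", ENSG_C = NA, ENSG_D = "DPM1")

symbols <- unname(ens2sym[rownames(counts)])

# Keep rows that have a symbol, and only the first row for each symbol
keep <- !is.na(symbols) & !duplicated(symbols)

clean <- counts[keep, , drop = FALSE]
rownames(clean) <- symbols[keep]

# Gene symbols must be unique before the file is written
stopifnot(!anyDuplicated(rownames(clean)))
clean
```

If the final stopifnot() passes on the real data, genuinely duplicated symbols can be ruled out as the source of the CIBERSORTx warning.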

The resulting file, output/counts_salmon.tsv, is tab-separated and unquoted:

Genes   rna_11  RNA_26  RNA_8   RNA_16  RNA_19  rna_47  RNA_3   RNA_24
TSPAN6  0   0   8   0   249.567 76.756  26.741  308.308
TNMD    0   0   0   0   0   0   38  0
DPM1    58.092  31.013  0   67  303.226 570.16  48.289  1078.792
SCYL3   39.036  42.86   0   0   27.801  146.749 7   414.861
C1orf112    14  1   0   0   38.234  91.923  87.89   165.261
FGR 0   47  0   1   25  69  0   158

I'm using this file as the mixture file in CIBERSORTx, with the following parameters:

[Options] perm: 1
[Options] verbose: TRUE
[Options] rmbatchBmode: TRUE
[Options] QN: FALSE
[Options] outdir: files/
[Options] label: Job11
=============CIBERSORTx Settings===============
Mixture file: files/ 
Signature matrix file: files/common/LM22.update-gene-symbols.txt 
Number of permutations set to: 1 
Enable verbose output
Do B-mode batch correction
All done.

However, I keep getting this error:

Error: $ operator is invalid for atomic vectors
In addition: Warning messages:
1: In CIBERSORTxFractions(sigmatrix = sigmatrix, mixture = mixture,  :
  22292 duplicated gene symbol(s) found in mixture file!
2: In mclapply(1:svn_itor, res, mc.cores = svn_itor) :
  all scheduled cores encountered errors in user code
Execution halted

Thanks a lot in advance for any help!
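Since the warning points at the layout of the mixture file, it can help to check the file before uploading it. Below is a minimal base-R sketch; check_mixture_file() is a hypothetical helper, not a CIBERSORTx function, and it only inspects a tab-separated layout: whether the header has as many fields as each data row, whether the first header cell is non-empty, and whether the first column contains repeated gene names.

```r
# Hypothetical helper (not part of CIBERSORTx): basic layout checks on a TSV file
check_mixture_file <- function(path, sep = "\t") {
  lines  <- readLines(path)
  fields <- strsplit(lines, sep, fixed = TRUE)  # one character vector per line
  n_fields <- lengths(fields)
  first_header <- fields[[1]][1]
  # First field of every data line, i.e. what a reader would take as the gene name
  genes <- vapply(fields[-1], function(f) f[1], character(1))
  row_fields <- unique(n_fields[-1])
  n_dup <- sum(duplicated(genes))
  ok <- length(row_fields) == 1 &&
    n_fields[1] == row_fields &&
    !is.na(first_header) && nzchar(first_header) &&
    n_dup == 0
  list(header_fields = n_fields[1], row_fields = row_fields,
       first_header = first_header, duplicated_genes = n_dup, ok = ok)
}

# Usage: check_mixture_file("output/counts_salmon.tsv")
```

A header_fields value one smaller than row_fields is the signature of a file written with row.names = TRUE and no name for the row-name column.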

I got exactly the same error message, so I am curious how other people solved this. Did you already contact the authors about it?

Hello! I ran into the same error too, but managed to find out why it happens. The "duplicated gene symbol(s)" in the warning actually refers to the first column of the data (NOT the row names) of your mixture file, which means CIBERSORTx took your first column of expression values as the gene symbols by mistake.

The probable cause is write.table: with row.names = TRUE and col.names = TRUE, R writes the row names (your gene symbols) as the real first column, but the header line gets no name for that column. The header therefore has one field fewer than the data rows, and the first column of expression values ends up being read as the gene symbols. The many repeated values in that column (such as all the zeros) are presumably what get reported as duplicated symbols.

Here's my solution (it WORKS): move the row names into a named first column and write the file without row names:

mixture_file <- cbind(Gene = rownames(mixture_file), mixture_file)
write.table(mixture_file, file = "mixture_file.txt", sep = "\t",
            row.names = FALSE, col.names = TRUE, quote = FALSE)
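The header/data mismatch described above can be reproduced on a toy file. This is a minimal sketch: the two-gene, two-sample frame copies a few values from the preview in the question, the file paths are temporary, and count.fields() reports how many tab-separated fields each line has.

```r
# Toy mixture: two genes, two samples (values copied from the preview in the question)
toy <- data.frame(rna_11 = c(0, 58.092), RNA_26 = c(0, 31.013),
                  row.names = c("TSPAN6", "DPM1"))

# As in the question: row names are written, but get no header cell
bad <- tempfile(fileext = ".tsv")
write.table(toy, bad, sep = "\t", row.names = TRUE, col.names = TRUE, quote = FALSE)
count.fields(bad, sep = "\t")   # header line has one field fewer than the data rows

# As in the answer: gene names moved into a real, named first column
good <- tempfile(fileext = ".tsv")
write.table(cbind(Gene = rownames(toy), toy), good, sep = "\t",
            row.names = FALSE, col.names = TRUE, quote = FALSE)
count.fields(good, sep = "\t")  # every line now has the same number of fields
readLines(good)
```

write.table() also accepts col.names = NA together with row.names = TRUE, which writes an empty header cell for the row names instead; whether CIBERSORTx accepts an empty first header cell is not established here, so naming the column explicitly seems the safer choice.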


