Hi,
I am trying to run a deconvolution analysis of bulk RNA-seq samples using the provided LM22 signature matrix. I converted all Ensembl IDs to gene symbols and removed NA and duplicated entries.
counts_salmon <- as.data.frame(txi$counts)
counts_salmon$symbol <- mapIds(org.Hs.eg.db,
                               keys = rownames(counts_salmon),
                               column = "SYMBOL",
                               keytype = "ENSEMBL")
counts_salmon <- counts_salmon |>
  distinct(symbol, .keep_all = TRUE) |>
  rownames_to_column(var = "ensbl") |>
  select(-ensbl) |>
  filter(!is.na(symbol)) |>
  column_to_rownames(var = "symbol")
counts_salmon <- na.omit(counts_salmon)
write.table(counts_salmon, file = 'output/counts_salmon.tsv', append = FALSE, sep = "\t",
            row.names = TRUE, col.names = TRUE, quote = FALSE)
The output is a .tsv file without double quotes:
Genes rna_11 RNA_26 RNA_8 RNA_16 RNA_19 rna_47 RNA_3 RNA_24
TSPAN6 0 0 8 0 249.567 76.756 26.741 308.308
TNMD 0 0 0 0 0 0 38 0
DPM1 58.092 31.013 0 67 303.226 570.16 48.289 1078.792
SCYL3 39.036 42.86 0 0 27.801 146.749 7 414.861
C1orf112 14 1 0 0 38.234 91.923 87.89 165.261
FGR 0 47 0 1 25 69 0 158
...
I'm using this file as the input mixture file in CIBERSORTx, with the following parameters:
[Options] perm: 1
[Options] verbose: TRUE
[Options] rmbatchBmode: TRUE
[Options] QN: FALSE
[Options] outdir: files/mam9823@med.cornell.edu/results/
[Options] label: Job11
=============CIBERSORTx Settings===============
Mixture file: files/mam9823@med.cornell.edu/counts_salmon.tsv
Signature matrix file: files/common/LM22.update-gene-symbols.txt
Number of permutations set to: 1
Enable verbose output
Do B-mode batch correction
==================CIBERSORTx===================
All done.
However, I keep getting this error:
Error: $ operator is invalid for atomic vectors
In addition: Warning messages:
1: In CIBERSORTxFractions(sigmatrix = sigmatrix, mixture = mixture, :
22292 duplicated gene symbol(s) found in mixture file!
2: In mclapply(1:svn_itor, res, mc.cores = svn_itor) :
all scheduled cores encountered errors in user code
Execution halted
Thanks a lot in advance for any help!
I got exactly the same error message. So I am curious to see how other people solved this... Did you already contact the authors about this?
Hello! I ran into the same error and managed to work out how it happens. The "duplicated gene symbol(s)" warning actually refers to the first column of expression values (NOT the row names) of your mixture file, which means CIBERSORTx mistook your first column of expression data for the gene symbols.
This is the probable cause: when you run write.table in R with row.names = TRUE, the row names are written as the REAL first column but WITHOUT a column name in the header. The header line therefore has one field fewer than the data lines, the REAL first column is not recognized as the gene column, and the error occurs.
Here's my solution (it WORKS): add the gene symbols as a regular column and write the file without row names.
mixture_file <- cbind(rownames(mixture_file), mixture_file)
write.table(mixture_file, file = "mixture_file.txt", sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)
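The header mismatch can be reproduced on a small toy table, which may help when checking your own export before uploading. This is only a sketch in base R: the data frame, its values, and the column name "Gene" are made up for illustration, and whether CIBERSORTx expects a particular name for the first column is an assumption I have not verified.

```r
# Toy stand-in for the real count matrix (hypothetical values).
counts <- data.frame(
  RNA_1 = c(0, 58.092, 39.036),
  RNA_2 = c(8, 0, 42.86),
  row.names = c("TSPAN6", "DPM1", "SCYL3")
)

# Export as in the question: row names are written as the first column,
# but the header line gets no field for them.
bad_file <- tempfile(fileext = ".tsv")
write.table(counts, file = bad_file, sep = "\t",
            row.names = TRUE, col.names = TRUE, quote = FALSE)
bad_fields <- count.fields(bad_file, sep = "\t")
# bad_fields[1] (header) is one smaller than bad_fields[2] (first data row).

# Export with the gene symbols as an explicitly named first column.
# "Gene" is an arbitrary name chosen for this example.
mixture <- cbind(Gene = rownames(counts), counts)
good_file <- tempfile(fileext = ".tsv")
write.table(mixture, file = good_file, sep = "\t",
            row.names = FALSE, col.names = TRUE, quote = FALSE)
good_fields <- count.fields(good_file, sep = "\t")

# Read the fixed file back and confirm that the first column holds
# unique gene symbols.
check <- read.delim(good_file, check.names = FALSE)
n_dup <- sum(duplicated(check[[1]]))
```

If every entry of good_fields is the same and n_dup is 0, the header and the data rows line up and the first column contains unique gene symbols. Running the same two checks (count.fields and duplicated on the first column) on the real mixture file should show whether it has the problem described above.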