23 months ago
StartR
Hi, I am running the following code and get this error:
Error in GDCprepare(query.mut.masked) :
There are samples duplicated. We will not be able to prepare it
query.mut.masked <- GDCquery(
  project = "TCGA-BRCA",
  data.category = "Simple Nucleotide Variation",
  access = "open",
  legacy = FALSE,
  data.type = "Masked Somatic Mutation",
  workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking",
  # file.type = "maf",
  sample.type = c("Primary Tumor", "Solid Tissue Normal")
) # Genome of reference: hg38
GDCdownload(query = query.mut.masked, directory = "maf", method = 'api')
maf_masked <- GDCprepare(query.mut.masked)
Error in GDCprepare(query.mut.masked) :
There are samples duplicated. We will not be able to prepare it
When I view the query results, the cases column is empty:
> head(getResults(query.mut.masked))
  id                                   data_format cases access file_name
1 aa87190b-0149-4d44-b794-cb46bcc2db34 MAF               open   e4a02e80-f364-4d76-95b8-eea9f4b12bfe.wxs.aliquot_ensemble_masked.maf.gz
2 5289feed-dd60-4b72-8a5d-3b27262ef9e9 MAF               open   368ff99a-a718-47c9-a773-7d04adcd6da9.wxs.aliquot_ensemble_masked.maf.gz
3 c1eb9296-d246-4acb-aa80-ccf457afb15a MAF               open   b90ed2ad-94e0-41c7-9fce-0b3fa4e19848.wxs.aliquot_ensemble_masked.maf.gz
4 7164d0b4-004c-4fa1-8ea8-bd16735a916b MAF               open   d962ee8d-4777-4ec8-8c4d-c48ce6fcfd8c.wxs.aliquot_ensemble_masked.maf.gz
5 daa16606-e080-4924-a991-20655718b20e MAF               open   5255d694-606a-4004-9285-2563c7dc46b5.wxs.aliquot_ensemble_masked.maf.gz
6 9bb584c9-94b1-46cd-ab54-f42de8829984 MAF               open   facf8f1c-d207-4ff2-971f-dff0e3cc077d.wxs.aliquot_ensemble_masked.maf.gz
When I search for duplicates in the id, file_id, analysis_id and analysis_submitter_id columns, none are found:
> dups_index <- which(duplicated(getResults(query.mut.masked)[,"id"]))
> dups_index
integer(0)
> dups <- getResults(query.mut.masked)[,"id"][dups_index]
> dups
character(0)
> dups_index <- which(duplicated(getResults(query.mut.masked)[,"file_id"]))
> dups_index
integer(0)
> dups_index <- which(duplicated(getResults(query.mut.masked)[,"analysis_id"]))
> dups_index
integer(0)
> dups_index <- which(duplicated(getResults(query.mut.masked)[,"analysis_submitter_id"]))
> dups_index
integer(0)
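The checks above cover id, file_id, analysis_id and analysis_submitter_id, but not cases. Assuming GDCprepare's duplicate test runs duplicated() on the cases column of the query results (that is how the TCGAbiolinks source looked at the time; check your installed version), an empty cases column could trigger the error on its own, because duplicated() treats every repeated empty string as a duplicate. A base-R sketch with a made-up three-row stand-in for getResults(query.mut.masked):

```r
# Hypothetical stand-in for getResults(query.mut.masked): unique ids,
# but an empty `cases` field on every row (as in the head() output).
res <- data.frame(
  id    = c("aa87190b-0149-4d44-b794-cb46bcc2db34",
            "5289feed-dd60-4b72-8a5d-3b27262ef9e9",
            "c1eb9296-d246-4acb-aa80-ccf457afb15a"),
  cases = c("", "", ""),
  stringsAsFactors = FALSE
)

# No duplicated ids, matching the integer(0) results above
id_dups <- which(duplicated(res$id))
print(id_dups)    # integer(0)

# Every empty string after the first is flagged as a duplicate
case_dups <- which(duplicated(res$cases))
print(case_dups)  # [1] 2 3

# Count rows whose `cases` entry is empty or missing
n_empty <- sum(is.na(res$cases) | res$cases == "")
print(n_empty)    # [1] 3
```

On the real query, the equivalent check would be which(duplicated(getResults(query.mut.masked)[, "cases"])).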
How do I remove the duplicated data? Alternatively, is there another way to fix this error:
Error in GDCprepare(query.mut.masked) :
There are samples duplicated. We will not be able to prepare it
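A minimal sketch of the removal being asked about, assuming the query object keeps its results table in query.mut.masked$results[[1]] and that the duplicate check is on the cases column (both assumptions about TCGAbiolinks internals). The helper drop_duplicated_cases is hypothetical, not part of TCGAbiolinks; it keeps only the first file listed for each case:

```r
# Hypothetical helper (not part of TCGAbiolinks): keep only the first
# row (file) for each value of `cases` in a query results table.
drop_duplicated_cases <- function(results) {
  keep <- !duplicated(results$cases)
  results[keep, , drop = FALSE]
}

# Made-up example: two files for case "A", one for case "B"
res <- data.frame(
  id    = c("f1", "f2", "f3"),
  cases = c("A", "A", "B"),
  stringsAsFactors = FALSE
)
dedup <- drop_duplicated_cases(res)
print(dedup$id)  # [1] "f1" "f3"
```

Under those assumptions it would be applied as query.mut.masked$results[[1]] <- drop_duplicated_cases(query.mut.masked$results[[1]]). Note that with an empty cases column, as here, every row shares the same empty value, so this keeps a single file; it is only meaningful once cases is populated, and which file survives depends purely on table order.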