Error in getting mutation data
23 months ago
StartR ▴ 30

Hi, I am running the code below and get this error:

Error in GDCprepare(query.mut.masked) : 
  There are samples duplicated. We will not be able to prepare it

query.mut.masked <- GDCquery(
     project = "TCGA-BRCA", 
     data.category = "Simple Nucleotide Variation", 
     access = "open", 
     legacy = FALSE, 
     data.type = "Masked Somatic Mutation", 
     workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking",
     # file.type = "maf"
     sample.type = c("Primary Tumor", "Solid Tissue Normal")
) # Genome of reference: hg38

GDCdownload(query = query.mut.masked, directory = "maf", method = 'api')

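# Point GDCprepare at the same directory used in GDCdownload (its default is "GDCdata")
maf_masked <- GDCprepare(query.mut.masked, directory = "maf")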

Error in GDCprepare(query.mut.masked) : 
  There are samples duplicated. We will not be able to prepare it

When I view the query results, the cases column is empty:

> head(getResults(query.mut.masked))
                                    id data_format cases access                                                               file_name
1 aa87190b-0149-4d44-b794-cb46bcc2db34         MAF         open e4a02e80-f364-4d76-95b8-eea9f4b12bfe.wxs.aliquot_ensemble_masked.maf.gz
2 5289feed-dd60-4b72-8a5d-3b27262ef9e9         MAF         open 368ff99a-a718-47c9-a773-7d04adcd6da9.wxs.aliquot_ensemble_masked.maf.gz
3 c1eb9296-d246-4acb-aa80-ccf457afb15a         MAF         open b90ed2ad-94e0-41c7-9fce-0b3fa4e19848.wxs.aliquot_ensemble_masked.maf.gz
4 7164d0b4-004c-4fa1-8ea8-bd16735a916b         MAF         open d962ee8d-4777-4ec8-8c4d-c48ce6fcfd8c.wxs.aliquot_ensemble_masked.maf.gz
5 daa16606-e080-4924-a991-20655718b20e         MAF         open 5255d694-606a-4004-9285-2563c7dc46b5.wxs.aliquot_ensemble_masked.maf.gz
6 9bb584c9-94b1-46cd-ab54-f42de8829984         MAF         open facf8f1c-d207-4ff2-971f-dff0e3cc077d.wxs.aliquot_ensemble_masked.maf.gz

When I search for duplicates by id (or by the other identifier columns), none are found:

> dups_index <- which(duplicated(getResults(query.mut.masked)[,"id"]))
> dups_index
integer(0)
> dups <- getResults(query.mut.masked)[,"id"][dups_index]
> dups
character(0)
> dups_index <- which(duplicated(getResults(query.mut.masked)[,"file_id"]))
> dups_index
integer(0)
> dups_index <- which(duplicated(getResults(query.mut.masked)[,"analysis_id"]))
> dups_index
integer(0)
> dups_index <- which(duplicated(getResults(query.mut.masked)[,"analysis_submitter_id"]))
> dups_index
integer(0)
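
The checks above cover only the id-like columns, while the error message talks about samples. A minimal sketch of the same check extended to the `cases` column follows. It assumes (not confirmed here) that GDCprepare compares the `cases` values rather than the file ids; if that is so, a `cases` column that is blank in every row would count as duplicated. The data frame is a small mock that uses three ids from the output above, not a real query result.

```r
# Mock of getResults(query.mut.masked): ids copied from the output above,
# `cases` left blank as shown there. Replace `res` with the real results.
res <- data.frame(
  id = c("aa87190b-0149-4d44-b794-cb46bcc2db34",
         "5289feed-dd60-4b72-8a5d-3b27262ef9e9",
         "c1eb9296-d246-4acb-aa80-ccf457afb15a"),
  cases = c("", "", ""),
  stringsAsFactors = FALSE
)

# Number of repeated values in each column
dup_counts <- vapply(res, function(col) sum(duplicated(col)), integer(1))
print(dup_counts)

# Rows whose `cases` value is missing or blank
blank_cases <- is.na(res$cases) | !nzchar(res$cases)
print(sum(blank_cases))
```

With the real results, `res <- getResults(query.mut.masked)` would take the place of the mock. If every row turns out blank in `cases`, the ids can all be unique and the sample check can still fail.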

How do I remove the duplicated samples? Is there any other way around this error:

Error in GDCprepare(query.mut.masked) : 
  There are samples duplicated. We will not be able to prepare it
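
One possible way around the error, sketched under assumptions: since the files are already downloaded, they could be read directly with base R instead of going through GDCprepare. `read_maf_dir` is a hypothetical helper, not part of TCGAbiolinks. It assumes the files are tab-separated MAFs with `#` header comment lines and identical columns, and that they sit somewhere under the download directory (hence the recursive search).

```r
# Hypothetical helper: read every MAF file under `dir` into one data frame.
read_maf_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.maf(\\.gz)?$",
                      recursive = TRUE, full.names = TRUE)
  if (length(files) == 0L) stop("No MAF files found under: ", dir)

  tables <- lapply(files, function(f) {
    # read.delim reads gzip-compressed files transparently;
    # lines starting with '#' are treated as comments and skipped.
    maf <- read.delim(f, comment.char = "#", quote = "",
                      colClasses = "character", stringsAsFactors = FALSE)
    # Record the originating file; rep() keeps this safe for empty tables
    maf$source_file <- rep(basename(f), nrow(maf))
    maf
  })

  # Assumes all files share the same columns
  do.call(rbind, tables)
}

# Usage with the directory from the post (commented out: needs the downloaded files)
# maf_masked <- read_maf_dir("maf")
```

All columns are read as character so that files combine without type conflicts; numeric columns such as positions would need converting afterwards.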
TCGA BRCA Mutation • 502 views