I have dbGaP metadata for a project that has multiple data types (WES, WGS, RNA-seq, etc.), and I am trying to select only the samples corresponding to RNA-seq. However, a few columns in the metadata confuse me. Here are the columns and their combinations:

    > plyr::count(dat[,c('Assay_Type_s','analyte_type_s','molecular_data_type_s')])
      Assay_Type_s analyte_type_s molecular_data_type_s freq
    1      RNA-Seq            DNA        <not provided>    1
    2      RNA-Seq            RNA        <not provided>  309
    3      RNA-Seq            RNA           miRNA (NGS)  150
    4      RNA-Seq            RNA         RNA Seq (NGS)  324
    5      RNA-Seq            RNA  Targeted Exome (NGS)   32
    6      RNA-Seq            RNA     Whole Exome (NGS)    4
    7      RNA-Seq            RNA    Whole Genome (NGS)   62

As you can see, the Assay Type is RNA-Seq in every row, yet the Analyte Type is either DNA or RNA, and the Molecular Data Type is one of miRNA, RNA Seq, Targeted Exome, Whole Exome, Whole Genome, or not provided. How is the relationship between these columns determined, and which samples are really from an RNA-seq experiment?
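
To make the question concrete, here is a minimal sketch of the two filters I am weighing. It rebuilds a toy data frame from the frequencies in the count table above (the names `counts`, `toy`, `strict` and `loose` are made up for this illustration), and it assumes that `molecular_data_type_s` is the column to trust, which is exactly the part I am unsure about.

```r
# Toy reconstruction of the count table (frequencies copied from the output above).
counts <- data.frame(
  Assay_Type_s = "RNA-Seq",
  analyte_type_s = c("DNA", rep("RNA", 6)),
  molecular_data_type_s = c("<not provided>", "<not provided>", "miRNA (NGS)",
                            "RNA Seq (NGS)", "Targeted Exome (NGS)",
                            "Whole Exome (NGS)", "Whole Genome (NGS)"),
  freq = c(1, 309, 150, 324, 32, 4, 62),
  stringsAsFactors = FALSE
)

# Expand to one row per sample, mimicking the shape of the real metadata.
toy <- counts[rep(seq_len(nrow(counts)), counts$freq), 1:3]

# Strict filter (assumption): keep only rows explicitly labelled RNA Seq.
strict <- subset(toy, analyte_type_s == "RNA" &
                      molecular_data_type_s == "RNA Seq (NGS)")

# Looser filter (assumption): also keep RNA rows with no molecular data type.
loose <- subset(toy, analyte_type_s == "RNA" &
                     molecular_data_type_s %in% c("RNA Seq (NGS)", "<not provided>"))

nrow(strict)  # 324
nrow(loose)   # 633
```

The strict version would drop the 309 RNA samples with no molecular data type, so the choice between the two matters quite a bit for my sample size.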