Question

How to determine RNA-seq samples from dbGaP metadata

5.7 years ago

komal.rathi ★ 4.1k

Hi everyone,

I have dbGaP metadata for a project with multiple data types (WES, WGS, RNA-seq, etc.), and I am trying to select only the samples that correspond to RNA-seq. However, a few columns in the metadata confuse me. Here are the relevant columns:

> plyr::count(dat[,c('Assay_Type_s','analyte_type_s','molecular_data_type_s')])

  Assay_Type_s analyte_type_s molecular_data_type_s freq
1      RNA-Seq            DNA        <not provided>    1
2      RNA-Seq            RNA        <not provided>  309
3      RNA-Seq            RNA           miRNA (NGS)  150
4      RNA-Seq            RNA         RNA Seq (NGS)  324
5      RNA-Seq            RNA  Targeted Exome (NGS)   32
6      RNA-Seq            RNA     Whole Exome (NGS)    4
7      RNA-Seq            RNA    Whole Genome (NGS)   62

As you can see, the Assay Type is RNA-Seq in every row, but the Analyte Type is either DNA or RNA, and the Molecular Data Type is one of miRNA, RNA Seq, Targeted Exome, Whole Exome, Whole Genome, or not provided. How are these columns related to each other, and which samples actually come from an RNA-seq experiment?

Thanks

dbgap metadata • 1.6k views

ADD COMMENT • link updated 5.7 years ago by Santosh Anand 5.7k • written 5.7 years ago by komal.rathi ★ 4.1k

score 0 · Answer 1 · 2018-08-15

5.7 years ago

Santosh Anand 5.7k

My guess is that the exome-related entries refer to exome-capture RNA-seq:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4561495/

The key point is "degraded samples", where the RNA can't be captured by the usual Ribo-Zero or poly(A) protocols (see the paper above).

This is also evident from the descriptions of the data types:

https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetMolecularDataTypes.cgi

I have no idea about the first row (Analyte Type DNA). It could simply be an error, given that there is only one sample of that kind.
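As a rough illustration of how the samples could be split along these three columns, here is a minimal sketch in base R. It rebuilds a toy `dat` from the counts posted in the question, so it is a stand-in for the real metadata. The helper `classify_sample` and the labels it returns (`rna_seq`, `mirna_seq`, `possible_capture_rna_seq`, `review`) are hypothetical names chosen for this example, and treating the exome rows as capture RNA-seq is only the guess made above, not something dbGaP confirms.

```r
# Toy stand-in for `dat`, rebuilt from the counts shown in the question.
combos <- data.frame(
  Assay_Type_s = "RNA-Seq",
  analyte_type_s = c("DNA", rep("RNA", 6)),
  molecular_data_type_s = c("<not provided>", "<not provided>", "miRNA (NGS)",
                            "RNA Seq (NGS)", "Targeted Exome (NGS)",
                            "Whole Exome (NGS)", "Whole Genome (NGS)"),
  freq = c(1, 309, 150, 324, 32, 4, 62),
  stringsAsFactors = FALSE
)

# Expand to one row per sample, keeping only the three metadata columns.
dat <- combos[rep(seq_len(nrow(combos)), combos$freq), 1:3]
rownames(dat) <- NULL

# Hypothetical helper: assigns a label to each sample.
# ASSUMPTION: exome-type rows with an RNA analyte are exome-capture RNA-seq.
# Everything that cannot be explained from the metadata goes to "review".
classify_sample <- function(assay, analyte, mol_type) {
  ifelse(assay != "RNA-Seq" | analyte != "RNA", "review",
  ifelse(mol_type == "RNA Seq (NGS)", "rna_seq",
  ifelse(mol_type == "miRNA (NGS)", "mirna_seq",
  ifelse(grepl("Exome", mol_type, fixed = TRUE),
         "possible_capture_rna_seq", "review"))))
}

dat$rna_class <- classify_sample(dat$Assay_Type_s,
                                 dat$analyte_type_s,
                                 dat$molecular_data_type_s)

# Tabulate the labels, then keep only the unambiguous RNA-seq samples.
print(table(dat$rna_class))
rna_seq_samples <- dat[dat$rna_class == "rna_seq", ]
```

Samples labelled `review` (the DNA row, the rows without a Molecular Data Type, and the Whole Genome rows) are deliberately not forced into a category. They would need to be checked against the study documentation or with the submitters before being used.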

ADD COMMENT • link 5.7 years ago by Santosh Anand 5.7k

I don't get it. My question is: what are the samples with Molecular Data Type != RNA Seq (NGS)? Are those also RNA sequencing? If not, why is their Assay Type RNA-Seq?

ADD REPLY • link 5.7 years ago by komal.rathi ★ 4.1k

Check the paper and you'll get the idea:

"...Unique to capture transcriptomes is an overnight capture reaction (RNA-DNA hybridization) using exon-targeting RNA probes, followed by a washing step, and an additional set of PCR cycles..."

ADD REPLY • link 5.7 years ago by Santosh Anand 5.7k