Trouble finding datasets
7 months ago
SHXVRR ▴ 20

Hello,

I am trying to find datasets for a project on HNSCC. I have been using GEO as my main source but have not found anything suitable. I am looking for an RNA dataset with HNSCC tumor and control samples from the tonsil, with FASTA files available. It is hard to find GEO datasets that are also in SRA, which holds the FASTA files; most GEO datasets only provide txt files. I found numerous GEO datasets that fit the bill but contain no FASTA files. Do you know how I can search for SRA datasets more effectively, or any other website that has datasets with FASTA files?

Thanks

GEO SRA • 520 views
7 months ago
GenoMax 142k

First of all, there are no FASTA files with next-generation sequencing datasets; you will have FASTQ files. Secondly, you will likely not be able to access the original FASTQ files unless you apply for access via dbGaP, because of participant privacy.
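
To illustrate the FASTA/FASTQ distinction, here is a minimal sketch in Python. It assumes the simple four-lines-per-record FASTQ layout and uses a made-up read. The helper names `read_fastq` and `to_fasta` are hypothetical, not part of any library. It shows that a FASTQ record carries a per-base quality string, which is lost when a read is written out as FASTA.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of input
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual  # drop the leading '@'


def to_fasta(name, seq):
    """Format one read as a FASTA record; the quality string is discarded."""
    return f">{name}\n{seq}\n"


# Made-up read, for illustration only.
example = "@read1\nGATTACA\n+\nIIIIHHG\n"

records = list(read_fastq(StringIO(example)))
name, seq, qual = records[0]

# Phred+33 encoding: quality score = ASCII code minus 33.
phred = [ord(c) - 33 for c in qual]

fasta_text = to_fasta(name, seq)
```

For real data an established parser would be the safer choice; the point here is only what each format contains.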

You can find these datasets here: https://portal.gdc.cancer.gov/projects/TCGA-HNSC

If you are able to use gene counts etc then some of the files may be available via open access: https://portal.gdc.cancer.gov/repository?facetTab=files&filters=%7B%22content%22%3A%5B%7B%22content%22%3A%7B%22field%22%3A%22cases.project.project_id%22%2C%22value%22%3A%5B%22TCGA-HNSC%22%5D%7D%2C%22op%22%3A%22in%22%7D%5D%2C%22op%22%3A%22and%22%7D&searchTableTab=files
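
The long link above is just a JSON filter that has been percent-encoded into the query string. As a rough sketch, the same URL can be rebuilt in Python with the standard library. The function name `repository_url` is hypothetical, and the filter structure is taken from decoding the link itself rather than from GDC documentation.

```python
import json
from urllib.parse import quote

PORTAL = "https://portal.gdc.cancer.gov/repository"


def repository_url(project_id):
    """Build a GDC repository link filtered to one project (structure copied from the link above)."""
    filters = {
        "content": [
            {
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
                "op": "in",
            }
        ],
        "op": "and",
    }
    # Compact JSON (no spaces), then percent-encode every reserved character.
    compact = json.dumps(filters, separators=(",", ":"))
    encoded = quote(compact, safe="")
    return f"{PORTAL}?facetTab=files&filters={encoded}&searchTableTab=files"


url = repository_url("TCGA-HNSC")
print(url)
```

Changing the `project_id` argument should give the equivalent link for another project, assuming the portal keeps accepting this filter format.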

If I'm trying to do RNA-seq, what do you think the pipeline would look like if I start with BAM files from TCGA? Normally I would run something like FastQC, Trimmomatic, STAR, and then featureCounts. With the BAM files, would I just go straight to featureCounts?

You will still need to apply for access to the BAMs; they are not publicly available. If you can use counts, those are publicly available. Portals such as cBioPortal and the UCSC Xena browser also give you access to analyzed TCGA data.

Oh, that makes sense. My end goal is to find a set of genes associated with that dataset. Would I need to compare it to a control dataset, or could I run DESeq2 with the counts file and a sample data file? If so, how could I compare it to a control dataset using R?
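
One way to get the "sample data file" for a tumor versus control comparison is to read the sample type from the TCGA barcodes. The sketch below assumes the standard barcode layout, where the first two digits of the fourth field give the sample type (01 to 09 tumor, 10 to 19 normal). The barcodes are made-up placeholders and the helper names are hypothetical. The resulting two-column table could then be passed as sample metadata to DESeq2 with a design such as `~ condition`, provided the project's counts include some normal samples.

```python
import csv
import io


def sample_type(barcode):
    """Classify a TCGA barcode as tumor/normal from the sample-type code in field 4."""
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"


def sample_sheet(barcodes):
    """Return CSV text with one row per sample: barcode and condition."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "condition"])
    for barcode in barcodes:
        writer.writerow([barcode, sample_type(barcode)])
    return buf.getvalue()


# Placeholder barcodes, not real TCGA samples.
barcodes = ["TCGA-AA-0001-01A", "TCGA-AA-0001-11A", "TCGA-BB-0002-01A"]

sheet = sample_sheet(barcodes)
print(sheet)
```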
