download fastq file from geo/sra for negative control
9.2 years ago
ginlucks ▴ 20

Hi all,

I have a few questions before starting what is (at least for me, as I'm just taking my first steps) a hard piece of work. Is it possible to download an entire FASTQ file from GEO or SRA, or are the files there just a subset of sequences taken from a bigger file?

If they are complete files, is it possible to use them as healthy donors? And how can I tell whether they come from healthy donors?

Any links to suggest?

Thanks in advance for your answers.

sra fastq ngs geo exome-sequencing • 4.7k views
9.2 years ago
TriS ★ 4.7k

ginlucks, thanks for re-posting!

In the link that you had on the previous post you could see under "sample" the following description:

Sample: SAMN03066918 Non-tumor DNA sample from Blood of a human participant in the dbGaP study

which means that that sample is from a healthy donor.

Let me make a small point too: blood comes either from a healthy donor or from someone with a disease. There is no matched healthy/diseased pair from the same patient, since blood circulates everywhere (unless the first sample was taken before disease onset). With solid tumors, by contrast, you can have matched tumor-normal samples. In any case, the link you shared says that this sample is from a healthy donor.

Also, those are all the reads: the size (9.3 Gb) is a good indicator that the file is not a subset.

You can then get fastq from sra using fastq-dump as explained here
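
As a rough illustration of that conversion step, here is a minimal sketch, assuming the SRA Toolkit is installed and the `.sra` file has already been downloaded. The function name `sra_to_fastq` and the default output directory `fastq` are hypothetical names chosen for this example; `--split-files`, `--gzip`, and `--outdir` are standard `fastq-dump` options.

```shell
# Hypothetical helper: convert a local .sra file to gzipped FASTQ.
# Usage: sra_to_fastq path/to/run.sra [output_dir]
sra_to_fastq() {
    sra_file="$1"
    out_dir="${2:-fastq}"   # assumed default output directory

    # Fail early with a clear message if the input file is missing.
    if [ ! -f "$sra_file" ]; then
        echo "error: '$sra_file' not found" >&2
        return 1
    fi

    mkdir -p "$out_dir"
    # --split-files writes paired-end mates to separate _1/_2 files,
    # --gzip compresses the output, --outdir sets the destination.
    fastq-dump --split-files --gzip --outdir "$out_dir" "$sra_file"
}
```

For a single-end run, `--split-files` should simply produce one `_1` file, so the same call can be used when the library layout is not known in advance.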

Hope this helps


Thank you for your answer.

Yes, it is a good suggestion, but the cancer might be driven by a genetic predisposition, such as a recessive mutation on a single allele of APC or other genes, and that might alter the results. I think it is better to use healthy donors as a negative control; however, if I cannot find any, it is a nice "plan B".

Returning to the FASTQ file: I can't download any file. If I use the fastq-dump command, the terminal gives me this error:

fastq-dump.2.3.4 err: item not found while constructing within virtual database module - the path 'DRR001291' cannot be opened as database or table

If I use this command: prefetch -v

the terminal shows me this error message:

Maximum file size download limit is 20,971,520KB
2015-02-25T09:23:15 prefetch.2.3.4 err: path not found while resolving tree within virtual file system module - 'DRR001291' cannot be found.

Yes, but to check that you need to look at the sample's data. You can look at the information available for the sample you are interested in; for example, you can find it here: http://www.ncbi.nlm.nih.gov/gap/?term=2[s_discriminator]%20AND%20phs000798&report=SVariables

There you will have access to various information regarding this subject.

Anyway, as for the download, it looks like the tool is not able to find the file you are looking for. Try giving it the whole path instead of just the name.

Otherwise try wget:

wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/DRR/DRR001/DRR001291/DRR001291.sra

The path points to the file you were trying to download.
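
The same URL can be built for other runs. The sketch below assumes the directory layout seen in the single URL above (first three characters of the accession, then the first six, then the full accession); that pattern is inferred from this one example, and the server may not keep this layout or this path available. The function name `sra_ftp_url` is hypothetical.

```shell
# Hypothetical helper: build the FTP URL for a run accession, assuming
# the layout <prefix>/<first 6 chars>/<accession>/<accession>.sra
# inferred from the DRR001291 example.
sra_ftp_url() {
    acc="$1"
    prefix=$(printf '%s' "$acc" | cut -c1-3)   # e.g. DRR
    group=$(printf '%s' "$acc" | cut -c1-6)    # e.g. DRR001
    base='ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra'
    printf '%s/%s/%s/%s/%s.sra\n' "$base" "$prefix" "$group" "$acc" "$acc"
}

# Example use (not run here):
#   wget "$(sra_ftp_url DRR001291)"
```

If the download fails for another accession, the layout assumption is the first thing to check against the link shown on the run's SRA page.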


Using the wget command it works! Thank you.

But how do I find the path of other SRA files?

EDIT:

Found it: just click on the size of the file.
