Question: Locating hg18 FASTQ Files Within ERA for Sequence Alignment Practice
8.2 years ago by
Delinquentme200
Delinquentme200 wrote:

So this:

ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002814/ERR002814_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002814/ERR002814_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002815/ERR002815_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002815/ERR002815_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002816/ERR002816_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002816/ERR002816_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002817/ERR002817_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002817/ERR002817_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002818/ERR002818_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002818/ERR002818_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002819/ERR002819_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002819/ERR002819_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002820/ERR002820_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002820/ERR002820_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002821/ERR002821_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002821/ERR002821_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002822/ERR002822_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002822/ERR002822_2.fastq.gz    0
ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002823/ERR002823_1.fastq.gz    0    ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002823/ERR002823_2.fastq.gz    0

is a full listing of the files for running sequence alignment on 'Mouse 17'.

I'm looking for the same file format on ERA, but for HG18 (human).

I've been googling for about 2 hours now without making much headway.

Might anyone know how to use the ERA website to locate unaligned FASTQ reads, so that I can practice running sequence alignments?

thanks!

sequence snp bowtie • 2.0k views
modified 8.2 years ago by Sean Davis (25k) • written 8.2 years ago by Delinquentme (200)

Sorry, I don't understand exactly what you are looking for. And what is Mus musculus 17? The current mouse assembly is NCBIM37 / mm9.

written 8.2 years ago by Bert Overduin (3.6k)

Updated to what it says in the examples I have. What I'm actually after is a list (like this one, only longer) for HG18 (human): a list of unaligned reads from a human genome, in FASTQ format, up on ERA. Hope that clears things up :D

written 8.2 years ago by Delinquentme (200)

"This is commonly the way public paired-end FASTQ datasets, such as those produced by the 1000 Genomes Project, are formatted. Typically these file pairs end in suffixes _1.fastq.gz and _2.fastq.gz." (quoted from the tutorial I'm working with)
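To make the _1/_2 naming concrete, here is a minimal Python sketch that reads a listing like the one in the question and groups each line into a (mate 1, mate 2) pair. It assumes each line holds exactly two whitespace-separated .fastq.gz URLs (the "0" columns are simply ignored); the function name mate_pairs is made up for illustration.

```python
from typing import List, Tuple

SUFFIX_1 = "_1.fastq.gz"
SUFFIX_2 = "_2.fastq.gz"


def mate_pairs(listing: str) -> List[Tuple[str, str]]:
    """Group each line of a paired-end listing into (mate 1, mate 2) URLs."""
    pairs = []
    for line in listing.splitlines():
        # Keep only the FASTQ URLs; other columns (such as "0") are skipped.
        urls = [field for field in line.split() if field.endswith(".fastq.gz")]
        if not urls:
            continue  # blank line or a line without FASTQ files
        if len(urls) != 2:
            raise ValueError(f"expected two FASTQ files per line, got {len(urls)}")
        first, second = urls
        # Both mates should share everything before the _1/_2 suffix.
        same_run = first[: -len(SUFFIX_1)] == second[: -len(SUFFIX_2)]
        if not (first.endswith(SUFFIX_1) and second.endswith(SUFFIX_2) and same_run):
            raise ValueError(f"not a matching _1/_2 pair: {line!r}")
        pairs.append((first, second))
    return pairs


# Two lines copied from the listing in the question.
listing = (
    "ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002814/ERR002814_1.fastq.gz\t0\t"
    "ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002814/ERR002814_2.fastq.gz\t0\n"
    "ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002815/ERR002815_1.fastq.gz\t0\t"
    "ftp://ftp.era.ebi.ac.uk/vol1/fastq/ERR002/ERR002815/ERR002815_2.fastq.gz\t0\n"
)
pairs = mate_pairs(listing)
```

Each pair can then be handed to a paired-end aligner as its two mate files.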

written 8.2 years ago by Delinquentme (200)
8.2 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

FASTQ files are not specific to a particular genome build, so there is not really such a thing as "hg18 FASTQ", at least as I think I understand your question.
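To illustrate the point, here is a minimal Python sketch of a reader for the simple four-line-per-record FASTQ layout. It assumes no line wrapping inside a record, and the example read is made up. Each record carries only a name, the bases, and the per-base qualities; nothing in it refers to a reference build.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str       # text after the leading "@"
    sequence: str   # base calls
    quality: str    # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a four-line FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].strip(), sequence, quality)


# A made-up read, for illustration only.
example = io.StringIO("@read1/1\nGATTACA\n+\nIIIIIII\n")
records = list(read_fastq(example))
```

The coordinates on hg18 (or any other build) only appear after an aligner has mapped these reads against that reference.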

As for how to get information about FASTQ files generated from human, you can use a query such as this:

http://www.ebi.ac.uk/ena/data/view/Taxon:9606&portal=sra_sample&limit=1000&display=xml

to get an XML file containing the ERP identifiers, and then use Aspera or FTP to download the files as you noted in your question. An alternative is to use the SRAdb Bioconductor package to do the appropriate searches. Once you have the SRA identifiers, the FASTQ files can be downloaded as in your question.
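Once you have a run accession, the download URLs can be assembled by following the pattern visible in the question's listing. A minimal Python sketch, under some assumptions: the host and path are copied from that listing and may have changed since, only nine-character accessions such as ERR002814 are handled (longer ones may be laid out differently on the server), and the function name paired_fastq_urls is made up for illustration.

```python
import re
from typing import Tuple

# Host and root path copied from the listing in the question (assumed still valid).
FASTQ_ROOT = "ftp://ftp.era.ebi.ac.uk/vol1/fastq"


def paired_fastq_urls(run_accession: str) -> Tuple[str, str]:
    """Build the _1/_2 FASTQ URLs for a run accession such as ERR002814."""
    if not re.fullmatch(r"ERR\d{6}", run_accession):
        raise ValueError(f"unsupported run accession: {run_accession!r}")
    # In the listing, files sit under <first six characters>/<accession>/.
    prefix = run_accession[:6]
    base = f"{FASTQ_ROOT}/{prefix}/{run_accession}/{run_accession}"
    return f"{base}_1.fastq.gz", f"{base}_2.fastq.gz"


mate1, mate2 = paired_fastq_urls("ERR002814")
```

The two strings can then be passed to any FTP client or download tool; the sketch only builds the URLs and does not check that the files exist.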

modified 9 hours ago by RamRS (24k) • written 8.2 years ago by Sean Davis (25k)

FASTQ is a format (at least that's what I've gathered), so I'm looking for the short reads (i.e. fresh out of the machine) for a human. Though I think I understand where the confusion is coming in: HG18 is a reference template, and the FASTQ files wouldn't be for THAT object, but instead for a freshly sequenced (or just "un-aligned") genome of human origin.

written 8.2 years ago by Delinquentme (200)

I think you have got it.

written 8.1 years ago by Sean Davis (25k)