Question

Forum: Ensembl file formatting tool: tell us what you need

8.2 years ago

Emily 23k

We are looking for feedback on a new Ensembl tool being developed to help researchers download the reference files they need in the right format directly from Ensembl.

We understand that different tools need slightly different formatting, and that sometimes identifiers need to be remapped so that datasets match. One example is Ensembl chromosome names (1, 2, 3...) versus UCSC chromosome names (chr1, chr2, chr3...). Another is N padding in a chromosome: some analyses need it, while for others it might cause issues.
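To make the chromosome naming example concrete, here is a minimal Python sketch of the kind of renaming users often do by hand. It is not part of any Ensembl tool: the function names `ensembl_to_ucsc` and `rename_fasta_headers` are made up for illustration, and it assumes human-style names (numbered autosomes, X, Y, MT). Scaffolds, patches and haplotypes are left untouched because they need a proper lookup table.

```python
def ensembl_to_ucsc(name):
    """Map an Ensembl-style sequence name to a UCSC-style one (simple cases only)."""
    if name == "MT":
        # The mitochondrial genome is "MT" in Ensembl and "chrM" in UCSC.
        return "chrM"
    if name.isdigit() or name in {"X", "Y"}:
        return "chr" + name
    # Assumption: anything else (scaffolds, patches, haplotypes) is passed through
    # unchanged, since those names cannot be derived by adding a prefix.
    return name


def rename_fasta_headers(lines):
    """Yield FASTA lines with the sequence name in each header converted."""
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token of the header.
            name, _, rest = line[1:].rstrip("\n").partition(" ")
            new_name = ensembl_to_ucsc(name)
            yield ">" + new_name + (" " + rest if rest else "") + "\n"
        else:
            yield line
```

Because `rename_fasta_headers` works on any iterable of lines, it can be fed an open file handle and written out line by line without loading a whole genome into memory.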

So we're creating a tool that can help give you the datasets you need, in the format you need, so you can spend less time preparing the reference sets and get down to running your analysis. For example, NCBI has a number of premade datasets, with different combinations of regions, and with prepared indexes for common tools:

http://ftp.ncbi.nlm.nih.gov/genomes/genbank/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids

As a first step in this project, we want to hear what filtering and transformations you apply to our datasets to make them useful for your analysis, and what changes to our datasets would let you run your analysis faster. We are interested in everything: identifier types, extra attributes, the combination of regions in a reference set (patches, haplotypes, scaffolds, etc.), and masking or filtering of regions.
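As an illustration of the region filtering mentioned above, the following Python sketch keeps only a chosen set of sequences from a FASTA file, for example the primary chromosomes without patches or haplotypes. It is only a sketch under assumptions: `filter_fasta` is a hypothetical helper, not an Ensembl function, and it assumes the sequence name is the first token of each header line.

```python
def filter_fasta(lines, keep):
    """Yield only the FASTA records whose sequence name is in `keep`."""
    writing = False
    for line in lines:
        if line.startswith(">"):
            # Decide per record: the name is the first token after ">".
            fields = line[1:].split()
            writing = bool(fields) and fields[0] in keep
        if writing:
            yield line


# Example: keep chromosome 1 and X, drop an unplaced scaffold.
records = [">1 dna:chromosome\n", "ACGT\n", ">KI270728.1\n", "GGCC\n", ">X\n", "TTAA\n"]
kept = list(filter_fasta(records, {"1", "X"}))
```

Telling us which sequences you would put in `keep`, and why, is exactly the kind of feedback we are after.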

Once we have a list of how our users use our data, and what programs they're trying to use it with, we can start this initiative to make our datasets and tools better adapted to your needs. We also hope we'll be able to follow up with anyone replying in case we need some clarification to better understand your needs.

Thank you to everyone. We're committed to making our reference data better fit your analysis needs.

Ensembl format dataset file-conversion • 1.8k views

updated 21 months ago by Ram 43k • written 8.2 years ago by Emily 23k

Do you want people replying here or do you have a central email address/site that comments should be sent to?