I'm sure this is something incredibly basic, but I haven't been able to find an answer by digging through YouTube tutorials and various forums.
I'm keen to use Galaxy to analyze RADseq data, and the tutorials all make sense except for the details of getting my custom reference genome into the program (I understand I need a FASTA file, uploaded via FTP). My question is really about the sources of these genomes: what the various file designations mean, and what I need to do to the files to end up with a single cohesive FASTA to import into Galaxy.
I'm working with long-tailed macaques and my genome options are:
http://www.ebi.ac.uk/ena/data/view/PRJEB7871 - under reads there are links that send either the fastq or the submitted (cleaned?) files into Galaxy directly, but I'm unclear whether those files can be used directly as a custom reference genome or, if not, what further processing is needed. Based on the data deposit statement in the article (http://gbe.oxfordjournals.org/content/7/3/821.full), I gathered that these are raw reads.
http://www.ncbi.nlm.nih.gov/assembly/GCF_000364345.1 - GenBank has assembled reads, and I can download a FASTA for each individual chromosome. From there, I was unclear whether I should simply concatenate the FASTAs from each chromosome into a single plain-text FASTA file, and whether I also need to deal with the unplaced reads that are accessible there.
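To make the concatenation idea concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration of the question, not a confirmed recipe: the `concat_fasta` helper, the `chr*.fa` naming pattern, and the `macaque_reference.fa` output name are all hypothetical, and I am assuming the per-chromosome downloads are uncompressed plain-text FASTA files.

```python
from pathlib import Path


def concat_fasta(inputs, output):
    """Concatenate FASTA files, in the given order, into one multi-FASTA file.

    Returns the number of sequence records (header lines) written.
    """
    n_records = 0
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                last = "\n"
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last = line
                # Guard against an input that lacks a trailing newline, which
                # would otherwise glue the next header onto sequence text.
                if not last.endswith("\n"):
                    out.write("\n")
    return n_records


if __name__ == "__main__":
    # Hypothetical file names; substitute the actual per-chromosome downloads.
    chromosome_files = sorted(Path(".").glob("chr*.fa"))
    if chromosome_files:
        total = concat_fasta(chromosome_files, "macaque_reference.fa")
        print(f"Wrote {total} sequences")
```

Note that `sorted` orders the names lexically (chr1, chr10, chr11, ...), so the chromosome order in the combined file would follow that unless the list is arranged by hand.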
Any clarification and opinions on this matter would be appreciated. I think I'm also missing some terminology: is there a specific phrase or indicator used to designate a full-genome FASTA? I previously worked with a full FASTA for this genome from http://gigadb.org/dataset/100003, but the quality for this species has since improved with other additions.