This question is conceptually similar to a previous one posted by Pierre Lindenbaum, but adapted to SOLiD data. Our group is planning to map SOLiD results to hg19 using the proprietary software BioScope (free so far, though). Many of the groups we know that work on NGS are sticking with hg18, but none of them has been able to convince us not to use hg19: we are focused on human variation and always want to use the most up-to-date version of dbSNP (among other DBs). Any thoughts on this matter?

By the way, is anyone here using hg19 on BioScope? The default installation comes with hg18 files only, and surprisingly it doesn't seem to be straightforward to upgrade to hg19. Apart from hg19.fa, which anyone can get from, for instance, the UCSC genome browser, other files are needed, and we haven't been able to find them anywhere. Of course LifeTech is "working on it", but I was wondering if any of you may have already solved this issue.

Which files are missing? BioScope can be accessed via command line (SSH) so perhaps you could use that instead of the web interface.

Sure, I almost always access the offline cluster by ssh. The thing is that the default BioScope installation gives me a concise folder structure for hg18 at etc/files, although only the .fa and the .cmap files are needed for the targeted resequencing module we are using. I took the hg19.2bit file from the UCSC genome browser, converted it to .fa, and generated the .cmap following the instructions I found in a cmap folder of the default installation, but things did not work. I was wondering whether any other researcher has solved this, and where I could download those files from.

It is possible to align to hg19 using BioScope.

BioScope wants a multi-fasta file, and a cmap file pointing to the per-chromosome files. You'll also need a dbSNP source compatible with hg19 for annotation of SNP calls.

The fasta file you converted from the 2bit file may not be properly line-wrapped. Make sure the file is compatible with samtools faidx first.
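If you want to see where the wrapping goes wrong before running samtools, here is a minimal Python sketch of that kind of check. It assumes the usual faidx expectation that every line of a sequence has the same length except the last one, which may be shorter. The function name `check_fasta_wrapping` is just something I made up for illustration, and this is no substitute for actually running samtools faidx on the file.

```python
def check_fasta_wrapping(handle):
    """Return a list of (sequence_name, line_number) for badly wrapped lines.

    Assumption: within one sequence, all lines must have the same length,
    except the final line, which may be shorter. Blank lines are treated
    like any other line (length 0), so a blank line in the middle of a
    sequence is reported via the lines that follow it.
    """
    problems = []
    name = None
    width = None         # line length set by the first line of the record
    short_seen = False   # True once a shorter (supposedly final) line was seen
    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            width = None
            short_seen = False
            continue
        if name is None:
            # Sequence data before any header line
            problems.append(("<no header>", lineno))
            continue
        if short_seen:
            # A line follows one that was already shorter than the rest
            problems.append((name, lineno))
            continue
        if width is None:
            width = len(line)
        elif len(line) > width:
            problems.append((name, lineno))
        elif len(line) < width:
            short_seen = True
    return problems
```

You would call it as `check_fasta_wrapping(open("hg19.fa"))`; an empty list means the wrapping looks consistent under the assumption above.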

You should create a working multi-fasta reference by fetching the per-chromosome files from UCSC or NCBI, and concatenating them in a sensible order (not strictly alphabetical, so that chr2 comes before chr10). See: Where Can I Download Human Reference Genome In Fasta Format? Hgref.Fa File
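As a rough illustration of the concatenation step, the Python sketch below orders per-chromosome files as chr1 to chr22, chrX, chrY, chrM and writes them into one file. It assumes the files are named like chr1.fa, chrX.fa and so on in a single folder, and that each one ends with a newline. The function names and the `include_extra` switch are my own, not anything BioScope or UCSC defines, and the order you actually want may differ.

```python
import shutil
from pathlib import Path

def chrom_sort_key(name):
    """Sort key: numbered chromosomes first, then X, Y, M, then the rest."""
    base = name[3:] if name.startswith("chr") else name
    if base.isdigit():
        return (0, int(base), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    if base in special:
        return (special[base], 0, "")
    return (4, 0, base)  # random/unplaced contigs, sorted by name

def concatenate_chromosomes(src_dir, out_path, include_extra=False):
    """Concatenate chr*.fa files from src_dir into out_path.

    Returns the chromosome names in the order they were written.
    The output file must not itself match chr*.fa inside src_dir.
    """
    files = sorted(Path(src_dir).glob("chr*.fa"),
                   key=lambda p: chrom_sort_key(p.stem))
    if not include_extra:
        # Drop anything that is not chr1..chr22, chrX, chrY or chrM
        files = [p for p in files if chrom_sort_key(p.stem)[0] < 4]
    with open(out_path, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return [p.stem for p in files]
```

For example, `concatenate_chromosomes("hg19_chroms", "hg19.fa")` (both paths hypothetical) would build the reference without the random contigs; pass `include_extra=True` to append them after chrM.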

You can construct a cmap file for small indel calling by looking at the existing file and updating it to hg19 file locations. (A cmap file is just a lookup table for the per-chromosome files.) Include (or exclude) the random contigs to match your multi-fasta reference.

For SNP annotation, you'll need to fetch and uncompress 3 files from the NCBI FTP folder here:

• b132_SNPChrPosOnRef_37_1.bcp.gz
• b132_SNPContigLocusId_37_1.bcp.gz
• b132_SNPContigLoc_37_1.bcp.gz

Install these in an hg19/dbSNP folder, and update the annotation parameters accordingly.
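Once the three files are downloaded, the uncompress-and-install step could look something like the Python sketch below. It only uses the standard library; the function name `unpack_dbsnp` and both folder arguments are placeholders of mine, so point them at wherever your downloads and your BioScope reference folders actually live.

```python
import gzip
import shutil
from pathlib import Path

# The three dbSNP build 132 files listed above
DBSNP_FILES = [
    "b132_SNPChrPosOnRef_37_1.bcp.gz",
    "b132_SNPContigLocusId_37_1.bcp.gz",
    "b132_SNPContigLoc_37_1.bcp.gz",
]

def unpack_dbsnp(download_dir, dest_dir, names=DBSNP_FILES):
    """Decompress each .gz file from download_dir into dest_dir.

    Returns the list of uncompressed file paths that were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for name in names:
        src = Path(download_dir) / name
        target = dest / name[:-len(".gz")]  # strip the .gz suffix
        # Stream the data so the large tables never sit fully in memory
        with gzip.open(src, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target)
    return written
```

Something like `unpack_dbsnp("downloads", "hg19/dbSNP")` would leave the three .bcp files in the hg19/dbSNP folder; the annotation parameters still have to be updated by hand.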

Some of the other modules will not function (CNV needs mappability computed for the new genome); these will have to wait until LifeTech releases support for hg19. For targeted resequencing, you should be fine with the files listed. (Make sure your targets have hg19 coordinates, too!)

Found out that the memory requirements for BioScope using hg19 and dbSNP132 are 24GB instead of the 16GB stated when we bought the cluster nodes half a year ago :(

Thanks jmanning for such a concise answer. I'm testing all this right away!

Thanks jmanning for such a concise answer. Just today we received an updated version of the BioScope draft manual, which now has an "add annotations" section that describes a process very similar to the one you mention. I'm testing both procedures right away!

