Question: A consistent set of reference files that will work well with each other for WEX sequencing and RNA-seq etc.
3.6 years ago by
danny0 wrote:

We have recently started doing a lot of human sequencing of many different kinds. One thing that has been particularly confusing is keeping track of the different versions of the reference data sets that are out there (the reference genome, transcriptome, dbSNP, COSMIC, etc.), and finding a set that is consistent in indexing, naming, and content, so that the files play nicely together when fed into the same tool. Currently, I think I can get a matching genome and transcriptome from Ensembl, in particular GRCh37.75, but I do not know which version of COSMIC, dbSNP, or other mutation lists (indels, etc.) should be paired with it. Thanks for the help.

(Note: others on this site have suggested delaying the move to GRCh38. Is that also generally agreed to be a good idea?)

3.6 years ago by
United States
igor9.9k wrote:

Often databases do not use the same reference. Even if they use the same reference, sometimes they use different versions of the same reference. Sometimes the difference may be as minor as different chromosome order, but some tools will complain about that. You will have to convert between different genome builds no matter which one you select.
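To make the naming and ordering problem concrete, here is a minimal sketch of a consistency check between two contig lists, for example one taken from a genome FASTA index and one from an annotation or variant file. The function names `to_ensembl_style` and `compare_contigs` are hypothetical, and the example lists are made up for illustration. It assumes only the common "chr1" versus "1" and "chrM" versus "MT" conventions; unplaced scaffolds and alternate contigs are named differently again and would need a proper mapping table.

```python
def to_ensembl_style(name):
    """Map a UCSC-style contig name (chr1, chrX, chrM) to Ensembl style (1, X, MT)."""
    if name == "chrM":
        return "MT"
    if name.startswith("chr"):
        return name[3:]
    return name


def compare_contigs(reference, other):
    """Report naming and ordering differences between two contig name lists."""
    ref_norm = [to_ensembl_style(c) for c in reference]
    other_norm = [to_ensembl_style(c) for c in other]
    ref_set, other_set = set(ref_norm), set(other_norm)

    # Compare order only over the contigs that both lists contain.
    shared_ref_order = [c for c in ref_norm if c in other_set]
    shared_other_order = [c for c in other_norm if c in ref_set]

    return {
        "names_match_as_written": set(reference) == set(other),
        "names_match_after_normalizing": ref_set == other_set,
        "same_order": shared_ref_order == shared_other_order,
        "missing_from_other": [c for c in ref_norm if c not in other_set],
    }


# Illustrative (made-up) contig lists: same chromosomes, different naming and order.
genome_contigs = ["1", "2", "X", "MT"]
annotation_contigs = ["chr1", "chrX", "chr2", "chrM"]

report = compare_contigs(genome_contigs, annotation_contigs)
for key, value in report.items():
    print(key, value)
```

A check like this only compares labels and order. It says nothing about whether the coordinates or sequences behind two matching names come from the same build, so it complements a proper build conversion rather than replacing it.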

For major organisms, Ensembl has all relevant info for each build. See this summary page:

As for which genome build to use: GRCh38 has been out for more than a year. Unless you have older data, use the most recent build. You will have to upgrade eventually, because everyone else will. I went through the hg18-to-hg19 transition. It was rough.

3.6 years ago by
United States
harold.smith.tarheel4.5k wrote:

Kai Wang has collated a consistent set of reference files (hg19, refGene, dbSNP, 1000 Genomes, NHLBI 6500 exome, etc.) for his ANNOVAR software, with instructions for downloading them and building an integrated annotation table.
