Question

A consistent set of reference files that will work well with each other for WEX sequencing and RNA-seq etc.


7.6 years ago

danny • 0

We have recently started doing a lot of human sequencing of many different kinds, and one particularly confusing part has been making sense of all the versions of the various reference data sets (the reference genome, transcriptome, dbSNP, COSMIC, etc.). On top of that, we need a set that is mutually consistent in indexing, naming, and content, so that the files play nicely together when fed into the same algorithm. Currently, I think I can get a matching genome and transcriptome from Ensembl (in particular, GRCh37.75), but I do not know which version of COSMIC, dbSNP, or other mutation lists (indels, etc.) should be paired with it. Thanks for the help.

(Note: others on this site have suggested delaying the move to GRCh38. Is that also agreed to be a good move?)

sequencing RNA-Seq fasta references • 1.3k views

updated 7.6 years ago by harold.smith.tarheel ★ 4.9k • written 7.6 years ago by danny • 0

score 0 · Answer 1 · 2016-09-15

Databases often do not use the same reference, and even when they do, they sometimes use different versions of it. The difference may be as minor as a different chromosome order, but some tools will complain even about that. Whichever build you select, you will have to convert some data between genome builds.
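One quick way to spot naming or ordering mismatches before a tool complains is to compare the contig names in two of your files. Below is a minimal Python sketch, assuming you have a FASTA `.fai` index (first tab-separated column is the contig name) and a VCF whose header carries `##contig=<ID=...>` lines. The function names and the toy input lines are my own illustration, not part of any tool, and the lengths and offsets in the example are placeholders.

```python
from typing import Dict, Iterable, List


def contigs_from_fai(lines: Iterable[str]) -> List[str]:
    """Return contig names, in file order, from the lines of a FASTA .fai index."""
    return [line.split("\t")[0].strip() for line in lines if line.strip()]


def contigs_from_vcf_header(lines: Iterable[str]) -> List[str]:
    """Return contig IDs, in header order, from ##contig=<ID=...> VCF header lines."""
    prefix = "##contig=<"
    names = []
    for line in lines:
        if not line.startswith(prefix):
            continue
        # The body looks like: ID=1,length=1000,...
        body = line.strip()[len(prefix):].rstrip(">")
        for field in body.split(","):
            if field.startswith("ID="):
                names.append(field[3:])
                break
    return names


def compare_contigs(a: List[str], b: List[str]) -> Dict[str, object]:
    """Report names unique to each list and whether the shared names share one order."""
    set_a, set_b = set(a), set(b)
    shared_a = [c for c in a if c in set_b]
    shared_b = [c for c in b if c in set_a]
    return {
        "only_in_a": sorted(set_a - set_b),
        "only_in_b": sorted(set_b - set_a),
        "same_order": shared_a == shared_b,
    }


# Toy input (placeholder lengths/offsets): Ensembl-style names vs. UCSC-style names.
fai_lines = ["1\t1000\t3\t60\t61", "2\t900\t1100\t60\t61", "MT\t100\t2100\t60\t61"]
vcf_header = [
    "##fileformat=VCFv4.1",
    "##contig=<ID=chr1,length=1000>",
    "##contig=<ID=chr2,length=900>",
    "##contig=<ID=chrM,length=100>",
]

report = compare_contigs(contigs_from_fai(fai_lines), contigs_from_vcf_header(vcf_header))
print(report)
```

If `only_in_a` and `only_in_b` are both non-empty and differ only by a `chr` prefix (or `MT` versus `chrM`), you are most likely mixing Ensembl-style and UCSC-style naming. If the names match but `same_order` is false, it is the chromosome order that differs.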

For major organisms, Ensembl has all relevant info for each build. See this summary page: http://useast.ensembl.org/info/data/ftp/index.html

As for which genome build to use, GRCh38 has been out for more than a year. Unless you have older data, use the most recent build. You will have to upgrade eventually, because everyone else will. I went through the hg18-to-hg19 transition. It was rough.

score 0 · Answer 2 · 2016-09-15


7.6 years ago

harold.smith.tarheel ★ 4.9k

Kai Wang has collated a consistent set of reference files (hg19, refGene, dbSNP, 1000 Genomes, NHLBI 6500 exome, etc.) for his ANNOVAR software, with instructions for downloading and building an integrated annotation table.
