How to select a gene annotation file in Ensembl that contains un-localized scaffolds, but NO patches or haplotypes?
5.8 years ago
salamandra ▴ 550

To use the STAR alignment tool, it is recommended that the annotation file include unplaced and unlocalized scaffolds while excluding patches and alternative haplotypes (page 5 of http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf).
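For reference, the STAR index build that consumes the genome FASTA and the GTF can be sketched as follows. This is a minimal Python sketch that only assembles the command line; the file names are the Ensembl release-92 names discussed in this thread, the output directory name is a placeholder, and it assumes the files were decompressed with gunzip first, since the genomeGenerate step is generally run on plain-text FASTA/GTF (worth confirming for your STAR version). `--sjdbOverhang` is set to read length minus 1, which is the value the STAR manual suggests.

```python
def star_genome_generate_cmd(genome_dir, fasta, gtf, read_length=100, threads=8):
    """Build the argument list for a STAR genome index build (not executed here)."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,          # output directory for the index
        "--genomeFastaFiles", fasta,        # chromosomes + unplaced/unlocalized scaffolds
        "--sjdbGTFfile", gtf,               # annotation on the same set of sequences
        "--sjdbOverhang", str(read_length - 1),  # read length minus 1
    ]


# Example (placeholder directory; create it first, e.g. os.makedirs(..., exist_ok=True)):
# cmd = star_genome_generate_cmd("GRCh38_star_index",
#                                "Homo_sapiens.GRCh38.dna.primary_assembly.fa",
#                                "Homo_sapiens.GRCh38.92.gtf")
# subprocess.run(cmd, check=True)  # needs 'import subprocess' and STAR on PATH
```

Returning a list rather than a shell string avoids quoting problems when the paths contain spaces.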

Ensembl seems to provide an annotation file, 'Homo_sapiens.GRCh38.92.chr_patch_hapl_scaff.gtf.gz', that contains both unplaced scaffolds and patches (here). But I need one with unplaced scaffolds and NO patches. Which of these files is that? Or can it be found somewhere else?

And concerning the genome sequence file: Does the primary assembly include un-placed scaffolds?
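One way to check this yourself is to count the record types in the FASTA headers. The sketch below assumes Ensembl-style headers whose second field is `dna:chromosome` or `dna:scaffold`, and assumes that patch/haplotype records are named with a `CHR_` prefix; both are assumptions about Ensembl's GRCh38 naming, not something stated in this thread.

```python
import gzip
from collections import Counter


def classify_headers(lines):
    """Count FASTA records by type, looking only at '>' header lines.

    Assumed header shapes, e.g. '>1 dna:chromosome ...' or
    '>KI270728.1 dna:scaffold ...'; records whose name starts with
    'CHR_' are assumed to be patches or alternative haplotypes.
    """
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        name = fields[0]
        if name.startswith("CHR_"):
            counts["patch_or_haplotype"] += 1
        elif len(fields) > 1 and ":" in fields[1]:
            # 'dna:scaffold' -> 'scaffold' (also works for 'dna_sm:' / 'dna_rm:')
            counts[fields[1].split(":", 1)[1]] += 1
        else:
            counts["unknown"] += 1
    return counts


def classify_fasta(path):
    """Open a plain or gzip-compressed FASTA and classify its headers."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return classify_headers(handle)


# Example (reads the whole genome file, so it takes a while):
# print(classify_fasta("Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"))
```

A non-zero `scaffold` count with no `patch_or_haplotype` entries would indicate a file that matches what STAR recommends.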

assembly RNA-Seq STAR genome • 2.7k views
5.7 years ago
salamandra ▴ 550

Here is the answer I got from 'Ensembl help' to this and related questions:

1- From what I understood, the genome file 'Homo_sapiens.GRCh38.dna.toplevel.fa.gz' contains patches and haplotypes, and 'Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz' contains just the chromosomes, WITHOUT unplaced and unlocalised scaffolds. I want to download the genome file CONTAINING the chromosomes as well as the unplaced and unlocalised scaffolds, but WITHOUT patches and alternative haplotypes. Which of these files (ftp://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/dna/) should I select?

Ensembl: The 'primary assembly' file contains the sequences for all of the chromosomes and unplaced scaffolds, but excludes the patches and haplotypes. You can choose whether you want these files masked (sm for soft-masked, rm for repeat-masked), or you can use Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz for the unmasked version.

2- I want to download the corresponding annotation file (chromosomes with unplaced scaffolds but NO patches/haplotypes). Which file from these (ftp://ftp.ensembl.org/pub/release-92/gtf/homo_sapiens) should I choose?

Ensembl: For the GTF files you would want the one without an extra qualifier (such as .chr) in its name, i.e. Homo_sapiens.GRCh38.92.gtf.gz.

3- What's the difference between 'Homo_sapiens.GRCh38.92.chr.gtf.gz' and 'Homo_sapiens.GRCh38.92.chr_patch_hapl_scaff.gtf.gz'?

Ensembl: The difference is that the chr file contains only the chromosomes, while the chr_patch_hapl_scaff file contains everything. The latter is therefore the closest match to the 'toplevel' FASTA file. The chr file has no exact counterpart in the FASTA folder.
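Whichever GTF you pick, a quick sanity check is that every sequence name in GTF column 1 also appears in the genome FASTA; alternatively, the chr_patch_hapl_scaff GTF can be reduced to only those sequences. The sketch below is one way to do this in plain Python. It assumes Ensembl GTF header lines start with `#` and that FASTA names are the first word after `>`; the helper names are hypothetical, not part of any Ensembl tool.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file for line-by-line reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def gtf_seqnames(lines):
    """Count annotation lines per seqname (GTF column 1), skipping '#' headers."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts


def fasta_names(lines):
    """Collect sequence names (first word after '>') from FASTA lines."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def filter_gtf(lines, keep):
    """Yield header lines plus annotation lines whose seqname is in `keep`."""
    for line in lines:
        if line.startswith("#") or line.split("\t", 1)[0] in keep:
            yield line


# Example (names from the Ensembl release-92 folders):
# with open_text("Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz") as fa:
#     names = fasta_names(fa)
# with open_text("Homo_sapiens.GRCh38.92.gtf.gz") as gtf:
#     print(sorted(set(gtf_seqnames(gtf)) - names))  # ideally an empty list
```

Writing the output of `filter_gtf` to a new file gives an annotation restricted to exactly the sequences in the chosen FASTA.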
