Question: How to select a gene annotation file in Ensembl that contains un-localized scaffolds, but NO patches or haplotypes?
salamandra wrote (11 months ago):

To use the STAR alignment tool, it is recommended that the annotation file include un-placed and un-localized scaffolds while excluding patches and alternative haplotypes (see page 5 of http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf).

Ensembl seems to offer a file, 'Homo_sapiens.GRCh38.92.chr_patch_hapl_scaff.gtf.gz', containing both un-placed scaffolds and patches. But I need one with un-placed scaffolds and NO patches. Which of the Ensembl files is that? Or can it be found somewhere else?

And concerning the genome sequence file: Does the primary assembly include un-placed scaffolds?

salamandra wrote (10 months ago):

Here is the answer I got from 'Ensembl help' to this and related questions:

1- As I understand it, the genome file 'Homo_sapiens.GRCh38.dna.toplevel.fa.gz' contains patches and haplotypes, and 'Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz' contains just chromosomes WITHOUT un-placed and un-localised scaffolds. I want to download the genome file CONTAINING chromosomes as well as un-placed and un-localised scaffolds, but WITHOUT patches and alternative haplotypes. Which file from ftp://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/dna/ should I select?

Ensembl: The file 'primary assembly' contains the sequences for all of the chromosomes and unplaced scaffolds, but excludes the patches and haplotypes. You can choose whether you want these files masked (sm or rm) or you can use the Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz for the unmasked version.
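
One way to double-check what a downloaded FASTA actually holds is to tally its header names. The sketch below is not from Ensembl. It assumes that patch and haplotype sequences carry names starting with `CHR_` (for example `CHR_HSCHR6_MHC_APD_CTG1`) and that any other name that is not 1-22, X, Y or MT is a scaffold. The function names are made up for illustration.

```python
import gzip
from collections import Counter

# Assumption: primary chromosomes are named 1-22, X, Y and MT in Ensembl human files.
CHROMOSOMES = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def classify_fasta_names(lines):
    """Count FASTA records by sequence type, judging only by the header name."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # skip malformed empty headers
        name = fields[0]  # first token after '>'
        if name in CHROMOSOMES:
            counts["chromosome"] += 1
        elif name.startswith("CHR_"):
            # Assumption: the CHR_ prefix marks patches and alternative haplotypes.
            counts["patch_or_haplotype"] += 1
        else:
            counts["scaffold"] += 1
    return counts


def classify_fasta_file(path):
    """Convenience wrapper for a gzipped FASTA file on disk."""
    with gzip.open(path, "rt") as handle:
        return classify_fasta_names(handle)
```

Under these assumptions, a file that fits the STAR recommendation should report zero for `patch_or_haplotype` and a non-zero `scaffold` count.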

2- I want to download the corresponding annotation file (chromosomes with un-placed scaffolds but NO patches/haplotypes). Which file from ftp://ftp.ensembl.org/pub/release-92/gtf/homo_sapiens should I choose?

Ensembl: For the GTF files you would want the one without an extension, i.e. Homo_sapiens.GRCh38.92.gtf.
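
Before building a STAR index, it can help to confirm that the GTF and the FASTA use the same set of sequence names. This is a minimal sketch, assuming a standard tab-separated GTF (column 1 is the sequence name, lines starting with `#` are comments). `gtf_names_missing_from_fasta` and the other helpers are hypothetical, not part of any Ensembl or STAR tooling.

```python
def fasta_names(lines):
    """Return the set of sequence names found in FASTA header lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(lines):
    """Return the set of sequence names used in column 1 of a GTF."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def gtf_names_missing_from_fasta(gtf_lines, fasta_lines):
    """List GTF sequence names that have no matching FASTA record."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_names(fasta_lines))
```

Both arguments can be open text handles, e.g. from `gzip.open(path, "rt")`. An empty list means every annotated sequence is present in the genome file. Names starting with `CHR_` in the result would suggest the GTF includes patches or haplotypes that the FASTA lacks.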

3 - What's the difference between 'Homo_sapiens.GRCh38.92.chr.gtf.gz' and 'Homo_sapiens.GRCh38.92.chr_patch_hapl_scaff.gtf.gz'?

Ensembl: The difference is that the chr file contains only the chromosomes, while the chr_patch_hapl_scaff file contains everything. The latter is therefore most similar to the 'toplevel' file among the FASTA files. The chr file has no real counterpart in the FASTA folder.
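
The pairing described in these three replies can be written down as a small lookup. This only restates the answers above for release 92. The dictionary and function names are invented for illustration, and the file-name pattern is inferred from the names quoted in this thread.

```python
# GTF variant -> FASTA variant that covers the same sequences,
# as described in the Ensembl replies above.
GTF_TO_FASTA = {
    "": "primary_assembly",              # chromosomes + scaffolds, no patches/haplotypes
    "chr_patch_hapl_scaff": "toplevel",  # everything
    "chr": None,                         # chromosomes only; no FASTA counterpart
}


def matching_files(gtf_variant, species="Homo_sapiens", assembly="GRCh38", release=92):
    """Return (gtf_name, fasta_name) for a GTF variant; fasta_name may be None."""
    fasta_variant = GTF_TO_FASTA[gtf_variant]
    suffix = f".{gtf_variant}" if gtf_variant else ""
    gtf = f"{species}.{assembly}.{release}{suffix}.gtf.gz"
    fasta = None
    if fasta_variant is not None:
        fasta = f"{species}.{assembly}.dna.{fasta_variant}.fa.gz"
    return gtf, fasta
```

For STAR, the pair returned by `matching_files("")` is the one that matches the recommendation in the question.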
