Question: What type of file contains chromosome sizes, and how do I get it?
kiomix106 wrote (22 months ago):

I am currently working with GFF, GFF3, and GTF files. I work from the terminal, mainly with awk and tools like bedtools.

assembly genome • 1.1k views

One of the solutions here should suffice: Easiest Way To Obtain Chromosome Length?

Are you referring to your own assembled genome or one of the pre-existing genomes out there?

Reply by GenoMax, written 22 months ago (modified 22 months ago)

My question is whether I can get the size of a chromosome from a notation file, or from a GFF, GFF3, or GTF file. Or can I only get it from a page that has information about a given genome?

Reply by kiomix106, written 22 months ago

You can't get the size of a chromosome from a GFF v.1 or v.2 file (amended based on @jrj.healey's point below) or from a GTF file. Neither format has any provision for encoding chromosome size. You may be able to get an approximation if you consider the chromosome to start at base 1 and take the end coordinate of the last feature encoded for that chromosome. This underestimates the true length whenever sequence extends past the last annotated feature.
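A minimal sketch of that approximation in awk, assuming a tab-separated GFF/GTF file with the sequence name in column 1 and the feature end in column 5. The function name `approx_chrom_sizes` and the demo file are made up for illustration.

```shell
# Hypothetical helper: report the largest feature end coordinate (column 5)
# seen for each sequence name (column 1) in a GFF/GTF file.
approx_chrom_sizes() {
    awk -F'\t' -v OFS='\t' '
        /^##FASTA/ { exit }        # stop at an embedded FASTA section (GFF3); END still runs
        /^#/ || NF < 5 { next }    # skip comments, pragmas, and short lines
        { if (($5 + 0) > max[$1]) max[$1] = $5 + 0 }
        END { for (chrom in max) print chrom, max[chrom] }
    ' "$1" | sort -k1,1            # awk array order is unspecified, so sort by name
}

# Tiny invented annotation file, for demonstration only.
demo_gff="$(mktemp)"
printf 'chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=g1\n'   >  "$demo_gff"
printf 'chr1\tdemo\tgene\t1200\t5000\t.\t-\t.\tID=g2\n' >> "$demo_gff"
printf 'chr2\tdemo\tgene\t50\t750\t.\t+\t.\tID=g3\n'    >> "$demo_gff"

approx_chrom_sizes "$demo_gff"
```

On the demo file this reports 5000 for chr1 and 750 for chr2, which are the ends of the last annotated features rather than real chromosome lengths.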

I don't know what you mean by a notation file. Can you clarify?

The <build>.chrom.sizes file, available from the UCSC genome data download folders, lists chromosome sizes. Example file for the GRCh38 human build: hg38.chrom.sizes.

Reply by GenoMax, written 22 months ago (modified 22 months ago)

Not necessarily the case. A GFF3 file can contain a (multi-)FASTA appended to the end of the file, after the ##FASTA line. If these were complete chromosomes, you could theoretically get that information from a GFF, but it wouldn't be especially easy.
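Separately from an embedded FASTA, the GFF3 specification defines an optional `##sequence-region seqid start end` pragma. Many files omit it, so the sketch below is only useful when those lines are present. The function name and the demo file are invented for illustration.

```shell
# Hypothetical helper: print sequence name and declared end coordinate from
# any "##sequence-region seqid start end" pragma lines in a GFF3 file.
gff3_sequence_regions() {
    awk -v OFS='\t' '$1 == "##sequence-region" && NF >= 4 { print $2, $4 }' "$1"
}

# Tiny invented GFF3 file, for demonstration only.
demo_gff3_regions="$(mktemp)"
printf '##gff-version 3\n'                           >  "$demo_gff3_regions"
printf '##sequence-region chrA 1 12000\n'            >> "$demo_gff3_regions"
printf '##sequence-region chrB 1 3400\n'             >> "$demo_gff3_regions"
printf 'chrA\tdemo\tgene\t10\t500\t.\t+\t.\tID=g1\n' >> "$demo_gff3_regions"

gff3_sequence_regions "$demo_gff3_regions"
```

If the function prints nothing, the file carries no such pragmas and another source of sizes is needed.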

Reply by Joe, written 22 months ago (modified 22 months ago)

Thanks for clarifying that.

Even if the file is in GFF3 format containing full chromosome sequences, the information about chromosome sizes would not be readily available for direct parsing without additional processing.
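As a rough idea of what that additional processing could look like, here is a sketch that sums the sequence line lengths found after the `##FASTA` line. It assumes the embedded records really are complete chromosomes, and the function name and demo file are invented.

```shell
# Hypothetical helper: compute the length of each FASTA record embedded
# after the "##FASTA" directive of a GFF3 file.
gff3_fasta_lengths() {
    awk -v OFS='\t' '
        { sub(/\r$/, "") }                     # tolerate Windows line endings
        /^##FASTA/ { in_fasta = 1; next }
        !in_fasta { next }                     # ignore the annotation section
        /^>/ {
            if (name != "") print name, len    # flush the previous record
            name = substr($1, 2); len = 0      # header up to first whitespace, minus ">"
            next
        }
        { len += length($0) }
        END { if (name != "") print name, len }
    ' "$1"
}

# Tiny invented GFF3 file with an embedded FASTA, for demonstration only.
demo_gff3_fasta="$(mktemp)"
printf '##gff-version 3\n'                        >  "$demo_gff3_fasta"
printf 'chrA\tdemo\tgene\t1\t6\t.\t+\t.\tID=g1\n' >> "$demo_gff3_fasta"
printf '##FASTA\n>chrA\nACGTAC\nGTAC\n>chrB test\nGG\n' >> "$demo_gff3_fasta"

gff3_fasta_lengths "$demo_gff3_fasta"
```

Records are printed in file order, one name and length per line.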

Reply by GenoMax, written 22 months ago
Alex Reynolds (Seattle, WA USA) wrote (21 months ago):

Use the UCSC Kent Utilities toolkit. For example:

$ fetchChromSizes hg38 > hg38.chromsizes

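Or, to build a sorted BED file that leaves out sequences whose names contain an underscore (such as the random, unplaced, and alternate scaffolds; chrM is kept by this filter):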

$ fetchChromSizes hg38 \
    | awk -v OFS='\t' '{ print $1, 0, $2 }' \
    | grep -v '_' \
    | sort-bed - \
    > hg38.bed

Whether you use a chromsizes or BED or other formatted file depends on what you're doing with it, but a little taco-bell programming can get it into the form you need.
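The conversion step can be tried offline on a small hand-made chrom sizes table. In this sketch plain `sort` stands in for BEDOPS `sort-bed` (an assumption made so the example has no dependencies), and the function name and all sizes are invented.

```shell
# Hypothetical helper: turn a two-column chrom sizes table (name, length)
# into zero-based BED intervals, dropping names that contain an underscore.
chromsizes_to_bed() {
    awk -v OFS='\t' '{ print $1, 0, $2 }' "$1" \
        | grep -v '_' \
        | LC_ALL=C sort -k1,1 -k2,2n    # stand-in for sort-bed: byte-wise by name, then numeric start
}

# Invented chrom sizes table, for demonstration only.
demo_sizes="$(mktemp)"
printf 'chr2\t2000\nchr1\t1000\nchr1_demo_random\t300\nchrM\t16\n' > "$demo_sizes"

chromsizes_to_bed "$demo_sizes"
```

The underscore-named entry is dropped, while chrM passes the filter and appears in the output alongside chr1 and chr2.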
