Question: What type of file holds chromosome sizes, and how do I get it?
kiomix106 wrote (13 months ago):

I am currently working with GFF, GFF3, and GTF files. I work from the terminal, mainly with awk and tools like bedtools.

Tags: assembly, genome

One of the solutions here should suffice: Easiest Way To Obtain Chromosome Length?

Are you referring to your own assembled genome or one of the pre-existing genomes out there?

(reply written 13 months ago by genomax)

My question is whether I can get the size of a chromosome from a notation file, or from a GFF, GFF3, or GTF file, or only from a page that has the information about a given genome.

(reply written 13 months ago by kiomix106)

You can't get the size of a chromosome from a GFF v1 or v2 (amended based on @jrj.healey's point below) or GTF file. There is no provision in those file formats to encode information about chromosome size. You may be able to get an approximation if you consider the chromosome to start at base 1 and use the end coordinate of the last feature that is encoded for that chromosome. That gives a lower bound, since any sequence past the last feature is not counted.
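
As a rough sketch of that approximation, the largest feature end per sequence can be pulled out with awk. The file name example.gtf and its three records are made up for illustration, and the result is only a lower bound on the real chromosome size:

```shell
# Hypothetical sample annotation in GFF/GTF column layout (tab-separated).
printf 'chr1\tsrc\tgene\t100\t5000\t.\t+\t.\tgene_id "a";\n'  > example.gtf
printf 'chr1\tsrc\tgene\t7000\t9500\t.\t-\t.\tgene_id "b";\n' >> example.gtf
printf 'chr2\tsrc\tgene\t50\t1200\t.\t+\t.\tgene_id "c";\n'   >> example.gtf

# Column 1 is the sequence name and column 5 is the feature end.
# Keep the largest end seen for each sequence, skipping comment lines.
awk -F'\t' '!/^#/ && NF >= 5 { if ($5 + 0 > max[$1]) max[$1] = $5 + 0 }
     END { for (c in max) print c "\t" max[c] }' example.gtf \
    | sort -k1,1 > approx.sizes

cat approx.sizes
```

The output has the same two-column layout as a chromosome sizes file, but it should not be used where the true lengths matter.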

I don't know what you mean by a notation file. Can you clarify?

A <build>.chrom.sizes file, available from the UCSC genome data download folders, lists the chromosome sizes. Example file for the GRCh38 human build: hg38.chrom.sizes.

(reply written 13 months ago by genomax)

Not necessarily the case. GFF(3) can contain a (multi-)FASTA attached to the end of the file after the ##FASTA line. If these were complete chromosomes, you could theoretically get that information from a GFF, but it wouldn’t be especially easy.

(reply written 13 months ago by Joe)

Thanks for clarifying that.

Even if the file is in GFF3 format containing full chromosome sequences, the information about chromosome sizes would not be readily available for direct parsing without additional processing.
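
A minimal sketch of that additional processing, assuming the sequences after the ##FASTA line really are complete chromosomes. The file example.gff3 and its contents are invented for illustration:

```shell
# Hypothetical GFF3 with two sequences appended after the ##FASTA line.
{
  printf '##gff-version 3\n'
  printf 'chrA\tsrc\tgene\t1\t12\t.\t+\t.\tID=g1\n'
  printf '##FASTA\n'
  printf '>chrA some description\nACGTACGTAC\nACGT\n'
  printf '>chrB\nGGCC\n'
} > example.gff3

# Ignore everything up to ##FASTA, then sum the line lengths under each header.
awk '
  /^##FASTA/ { in_fasta = 1; next }   # sequences start after this line
  !in_fasta  { next }                 # skip the annotation records
  /^>/       { if (name != "") print name "\t" len
               name = substr($1, 2); len = 0; next }
             { len += length($0) }    # sequence line: add its length
  END        { if (name != "") print name "\t" len }
' example.gff3 > from_gff3.sizes

cat from_gff3.sizes
```

This assumes plain Unix line endings and no trailing whitespace on the sequence lines; either would inflate the counts.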

(reply written 13 months ago by genomax)

Alex Reynolds wrote (11 months ago):

Use the UCSC Kent Utilities toolkit. For example:

$ fetchChromSizes hg38 > hg38.chromsizes

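Or, to build a sorted BED file (using sort-bed from BEDOPS) that leaves out every sequence with an underscore in its name, which for hg38 means the random, unplaced, and alternate scaffolds (chrM has no underscore, so it is kept):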

$ fetchChromSizes hg38 \
    | awk -v OFS="\t" '{ print $1, "0", $2; }' \
    | grep -v '_' \
    | sort-bed - \
    > hg38.bed

Whether you use a chromsizes or BED or other formatted file depends on what you're doing with it, but a little taco-bell programming can get it into the form you need.
