Chrom Sizes for Ensembl Reference
10.0 years ago

Does anyone know where I can get the chrom sizes info for the Ensembl reference genome? I'm trying to run bedToBigBed, which needs a chrom sizes file, and I can't seem to find an Ensembl version.

EDIT: Sorry, I should have been clearer. I used the iGenomes package for human (GRCh37) to run the Tuxedo pipeline. I took the GTF file of all the merged transcripts and converted it to BED for use in a web-based genome browser I'm working on. The required binary format is bigBed, so I'm using the bedToBigBed utility from UCSC, which requires a chrom info file (the name and size of each chromosome, plus the patches in the reference FASTA sequence) in tab-delimited text format.

UPDATE: Here's how I solved my problem, in case anyone else comes across this. I basically removed all the patches from the GTF file. That can be done with awk (I'm sure there are more efficient methods using regex). The reference genome patches, as far as I can tell, are not used by any of the Tuxedo packages and just get in the way during downstream file conversions. The solutions below, using samtools or looking in the SAM headers, give the chromosomal lengths but not the patch lengths. Thanks for all the solutions!

chromsizes reference • 8.0k views

What file format do you need? What species are you using?


Sorry, I should have been clearer. I used the iGenomes package https://support.illumina.com/sequencing/sequencing_software/igenome.ilmn for human (GRCh37) to run the Tuxedo pipeline. I took the GTF file of all the merged transcripts and converted it to BED for use in a web-based genome browser I'm working on. The required binary format is bigBed, so I'm using the bedToBigBed utility from UCSC, which requires a chrom info file (the name and size of each chromosome, plus the patches in the reference FASTA sequence) in tab-delimited text format.


If you aligned any reads against that reference, you can get the sizes from the SAM/BAM header.
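For anyone who wants to script this, here is a minimal sketch of pulling names and lengths out of a SAM header. It assumes the header text has already been obtained (for example with `samtools view -H`) and relies on the `@SQ` lines carrying `SN` (name) and `LN` (length) tags, as the SAM format defines. The function name `sizes_from_sam_header` and the example header are illustrative only.

```python
def sizes_from_sam_header(header_text):
    """Return (name, length) pairs from the @SQ lines of a SAM header."""
    sizes = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @PG, @RG and any other header records
        # Each field after the record type looks like TAG:value
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        sizes.append((fields["SN"], int(fields["LN"])))
    return sizes


# Illustrative header, not taken from a real alignment
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:MT\tLN:16569\n"
)

for name, length in sizes_from_sam_header(example_header):
    print(f"{name}\t{length}")
```

The printed lines are already in the two-column, tab-delimited layout that a chrom sizes file uses.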

10.0 years ago

Here's how I solved my problem, in case anyone else comes across this. I basically removed all the patches from the GTF file. That can be done with awk (I'm sure there are more efficient methods using regex). The reference genome patches, as far as I can tell, are not used by any of the Tuxedo packages and just get in the way during downstream file conversions. The solutions below, using samtools or looking in the SAM headers, give the chromosomal lengths but not the patch lengths. Thanks for all the solutions!
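The awk command itself was not posted, so the following is only a rough Python sketch of the same idea: keep GTF records whose first column is a primary chromosome and drop everything else. The set of names to keep (1 to 22, X, Y, MT) is an assumption based on Ensembl-style GRCh37 naming, and `drop_non_primary` and the example records are hypothetical.

```python
# Assumed Ensembl-style primary chromosome names for human
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def drop_non_primary(gtf_lines, keep=PRIMARY):
    """Yield comment lines and records whose seqname (column 1) is in `keep`."""
    for line in gtf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] in keep:
            yield line


# Illustrative records; the patch name is made up for the example
example = [
    "1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";\n",
    "HG1_PATCH\tCufflinks\texon\t5\t50\t.\t+\t.\tgene_id \"g2\";\n",
    "X\tCufflinks\texon\t300\t400\t.\t-\t.\tgene_id \"g3\";\n",
]

for record in drop_non_primary(example):
    print(record, end="")
```

For a real file, the same generator can be fed an open file handle and its output written straight to a new GTF.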

10.0 years ago

The simplest way is to get the information from the SAM file header (as Devon Ryan says), or to run samtools faidx on the reference FASTA file and look at the resulting .fai file. The .fai file is almost in the right format for your tools: its first two columns are the sequence name and length.
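As a minimal sketch of that last step, the snippet below trims `.fai` content down to its first two tab-separated columns (sequence name and length), which is the layout a chrom sizes file uses. The function name `fai_to_chrom_sizes` is hypothetical, and the offset columns in the example are made up for illustration.

```python
def fai_to_chrom_sizes(fai_text):
    """Keep only the name and length columns of each .fai line."""
    out = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, length = line.split("\t")[:2]
        out.append(f"{name}\t{int(length)}\n")
    return "".join(out)


# Illustrative .fai content: name, length, offset, line bases, line width
example_fai = (
    "1\t249250621\t52\t60\t61\n"
    "MT\t16569\t100\t70\t71\n"
)

print(fai_to_chrom_sizes(example_fai), end="")
```

To use it on disk, read the `.fai` file into a string, pass it through the function, and write the result to the file you hand to bedToBigBed.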


shoot, the autolinking does not work with https ... bugfix needed


This is a great idea, I didn't realise that was in samtools.
