Python Tabix API - getnames?
8.4 years ago

I am very new to Python.

I was wondering if the Python Tabix API provides a way to get the names of all seqids?

If not, is there a way to query without providing a region and return the whole file?

Both calls exist in the Perl API, so I would be shocked if they didn't exist in the Python API.

8.4 years ago

If you can use pysam, you should be in business.

http://www.cgat.org/~andreas/documentation/pysam/api.html#pysam.Tabixfile

>>> import pysam
>>> tabixfile = pysam.TabixFile("/usr/local/share/gemini/data/hg19.CpG.bed.gz")
>>> print(tabixfile.contigs)
['chr1', 'chr10', 'chr11', 'chr11_gl000202_random', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr17_ctg5_hap1', 'chr17_gl000204_random', 'chr17_gl000205_random', 'chr18', 'chr19', 'chr1_gl000191_random', 'chr1_gl000192_random', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr4_ctg9_hap1', 'chr4_gl000193_random', 'chr4_gl000194_random', 'chr5', 'chr6', 'chr6_apd_hap1', 'chr6_cox_hap2', 'chr6_dbb_hap3', 'chr6_mann_hap4', 'chr6_mcf_hap5', 'chr6_qbl_hap6', 'chr6_ssto_hap7', 'chr7', 'chr8', 'chr8_gl000197_random', 'chr9', 'chr9_gl000199_random', 'chr9_gl000200_random', 'chr9_gl000201_random', 'chrUn_gl000211', 'chrUn_gl000212', 'chrUn_gl000213', 'chrUn_gl000214', 'chrUn_gl000215', 'chrUn_gl000216', 'chrUn_gl000217', 'chrUn_gl000218', 'chrUn_gl000219', 'chrUn_gl000220', 'chrUn_gl000221', 'chrUn_gl000222', 'chrUn_gl000223', 'chrUn_gl000224', 'chrUn_gl000225', 'chrUn_gl000228', 'chrUn_gl000229', 'chrUn_gl000231', 'chrUn_gl000235', 'chrUn_gl000236', 'chrUn_gl000237', 'chrUn_gl000240', 'chrUn_gl000241', 'chrUn_gl000242', 'chrUn_gl000243', 'chrX', 'chrY']
8.4 years ago

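If this is a question about a tabix-indexed GTF, you can read the seqids from the compressed file directly:

import gzip

# Collect the unique seqids (column 1), skipping '#' header lines.
with gzip.open('gtf.gz', 'rt') as gtf:
    seqids = sorted({line.split('\t')[0] for line in gtf if not line.startswith('#')})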

I know this isn't really tabix-specific, but I think this is what you want.


Tabix keeps all the reference sequence names in the tabix index. In principle, you can get them by reading the index alone.


Can you read the index (tbi) in the Python API? I know in Perl it is as simple as $tabix_obj->getnames(). We know you are not busy (jk), so maybe you could write that up real quick ;-)?


In theory it is possible, but I do not know Python well enough to do that. Just use pysam.
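
For anyone who wants to avoid pysam, here is a minimal sketch of reading the names straight from the .tbi file. It assumes the header layout given in the tabix format specification: the magic bytes TBI\1, eight little-endian int32 fields ending with the length of the name block, then the NUL-terminated names. It also relies on the index being BGZF-compressed, which Python's gzip module can read. The function name read_tbi_names is made up for illustration and is not part of any library.

```python
import gzip
import struct


def read_tbi_names(tbi_path):
    """Return the sequence names stored in a tabix (.tbi) index.

    Sketch only: assumes the header layout from the tabix spec.
    """
    # BGZF is a series of gzip members, so gzip.open can decompress it.
    with gzip.open(tbi_path, "rb") as fh:
        magic = fh.read(4)
        if magic != b"TBI\x01":
            raise ValueError("not a tabix index: %r" % (magic,))
        # Eight int32 fields: n_ref, format, col_seq, col_beg,
        # col_end, meta, skip, l_nm (length of the name block).
        fields = struct.unpack("<8i", fh.read(32))
        n_ref, l_nm = fields[0], fields[7]
        names = fh.read(l_nm)
    # Names are concatenated, each terminated by a NUL byte.
    return [name.decode("ascii") for name in names.split(b"\x00")[:n_ref]]
```

Calling read_tbi_names("my_file.bed.gz.tbi") (a placeholder path) should give the same list as pysam's contigs attribute, without touching the data file itself.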


Doesn't "for line in gtf" require that the whole file is read?


Yes, and I can see why you would rather read these values from a smaller index file. Just thought it might help!


It is helpful and it is one valid solution.
