Are The Pre-Built Reference Sequences From Cbcb Known Transcripts, Or The Entire Genome?
12.4 years ago
Dave Bridges ★ 1.4k

I am mapping reads to a pre-built ebwt index from the Bowtie site (specifically ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/hg19.ebwt.zip). Does this reference contain the entire genome, or only known coding transcripts? Whichever it is, what is the best way to obtain the other?

next-gen sequencing reference bowtie read short aligner • 2.1k views
12.4 years ago

You can check what a bowtie index contains with the bowtie-inspect utility that comes with bowtie. The -n option prints only the names of the reference sequences in the index:

bowtie-inspect -n hg19

I haven't checked your example, since I build my own indices, but I am fairly sure that the file you link to is an index of the whole genome. Sets of coding transcripts are less well defined, so a transcript index would have been labeled/annotated in more detail rather than with just a build ID.

As for your second question: the answer is to build your own index.


So to build my own, I could download the FASTA-formatted RefSeq human RNA collection (ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.rna.fna.gz) and run bowtie-build on it?
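
As a rough sketch of what that might look like, assuming the RefSeq file has already been downloaded and bowtie-build is on the PATH: the helper name build_index and the output prefix are hypothetical, and the file is decompressed first on the assumption that the installed bowtie-build expects plain FASTA.

```shell
# Sketch only: build a Bowtie index from a gzipped FASTA file.
# build_index is a hypothetical helper name, not part of Bowtie.
build_index() {
    fasta_gz="$1"   # e.g. human.rna.fna.gz, downloaded beforehand
    prefix="$2"     # basename for the resulting index files

    # Fail early if the input file is not there.
    if [ ! -f "$fasta_gz" ]; then
        echo "missing FASTA: $fasta_gz" >&2
        return 1
    fi

    # Fail early if Bowtie is not installed.
    if ! command -v bowtie-build >/dev/null 2>&1; then
        echo "bowtie-build not found on PATH" >&2
        return 2
    fi

    # Decompress next to the input, keeping the .gz file.
    fasta="${fasta_gz%.gz}"
    gunzip -c "$fasta_gz" > "$fasta" || return 3

    # bowtie-build <reference_in> <ebwt_base>
    bowtie-build "$fasta" "$prefix"
}

# Usage (assumed file and prefix names):
#   build_index human.rna.fna.gz refseq_rna
```

Reads could then be mapped against the new prefix instead of hg19.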


For what it's worth, the output of the bowtie-inspect -n hg19 command for that index is a list of chr1-22/X/Y/M.
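
The same check could be scripted for other indices. This is a minimal sketch that reads sequence names on standard input; classify_names is a hypothetical helper, and the patterns are assumptions: chromosome names start with "chr", and transcript names contain a RefSeq-style accession such as NM_ or NR_.

```shell
# Sketch only: guess whether a list of reference names looks like
# a genome or a transcript set. classify_names is a hypothetical helper.
classify_names() {
    awk '
        BEGIN { chrom = 0; tx = 0 }
        # Assumption: genome sequences are named chr1, chrX, chrM, ...
        /^chr/ { chrom++ }
        # Assumption: transcripts carry a RefSeq-style accession.
        /(NM|NR|XM|XR)_[0-9]+/ { tx++ }
        END {
            if (NR == 0)          print "empty"
            else if (chrom == NR) print "genome"
            else if (tx == NR)    print "transcripts"
            else                  print "mixed"
        }'
}

# Usage:
#   bowtie-inspect -n hg19 | classify_names
```

Names that match neither pattern are reported as "mixed", so an unexpected result is a cue to read the list by eye.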
