Issues with genome indexing with bowtie2-build
19 months ago
Elisa • 0

Hi everyone,

I'm having some trouble indexing the reference genome (GRCh38) with 'bowtie2-build':

                                bowtie2-build ReferenceGenome GRCh38_index --large-index


The indexes I've built with this command are much smaller than the GRCh38 indexes that can be downloaded from the Bowtie2 website. How is that possible? What is wrong?

Thanks in advance

bowtie2 • 1.2k views

Why are you not using the pre-built bowtie2 index? Is there any specific reason?

bowtie2 large index option

About the genome size and index size: what do you mean by much smaller? There should be 6 files for the index. Do you mean all the files are smaller in size?

I would suggest re-checking the reference genome file to see whether it is truncated. Another thing you can do is simply run the alignment and compare the results from your index with those from the pre-built bowtie2 index.
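
A quick way to do the truncation check, as a minimal sketch: test the gzip stream and count the FASTA header lines. `check_fasta_gz` is just an illustrative helper name (not part of bowtie2), and it assumes the reference is a gzip-compressed FASTA.

```shell
# Hypothetical helper: verify that a gzipped FASTA decompresses cleanly,
# then print the number of sequences (header lines starting with ">").
check_fasta_gz() {
    # gzip -t exits non-zero if the archive is corrupt or ends early
    gzip -t "$1" || { echo "corrupt or truncated: $1" >&2; return 1; }
    gzip -dc "$1" | grep -c '^>'
}
```

For example, `check_fasta_gz Homo_sapiens.GRCh38.dna.alt.fa.gz` should print the sequence count if the download is intact. A clean result only shows that the file is complete, not that it is the right assembly.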


I have to use indexes built with bowtie2-build for research reasons.

The 6 files that I've built are all smaller in size than the pre-built bowtie2 index.

The sizes of my files are, respectively: 60.8 MB, 78.6 MB, 18.1 kB, 39.3 MB, 60.8 MB, 78.6 MB. The sizes of the pre-built bowtie2 index files are, respectively: 982.5 MB, 733.7 MB, 10.9 kB, 733.7 MB, 982.5 MB, 733.7 MB.

The reference genome file seems to be ok (it is not truncated).

When I perform the alignment with the pre-built indexes, the overall alignment rate is very high (98%), while with the indexes I've built the overall alignment rate is very low (35%). What could be the problem?

Thank you for your reply.


I think there is a problem with your reference genome. Can you compare the chromosome sizes of the two references (the bowtie2 index that you built and the pre-built index obtained from the website)? You can obtain this information from the header of the alignment (Link).

Where did you download the reference genome from, and what is the file size?
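
If it helps, here is a rough sketch for getting the sequence lengths straight from the FASTA, so they can be compared with the `@SQ` `LN:` values in the SAM header or with the output of `bowtie2-inspect -s`. The `fasta_lengths` function is an illustrative helper, not a bowtie2 tool, and it assumes uncompressed FASTA input.

```shell
# Hypothetical helper: print "name<TAB>length" for each sequence in a FASTA.
# Reads the named file(s), or stdin when no file is given.
fasta_lengths() {
    awk '
        /^>/ {
            if (seen) print name "\t" len
            name = substr($1, 2)   # ID = header up to first whitespace, minus ">"
            len = 0
            seen = 1
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # ignore stray whitespace
        END { if (seen) print name "\t" len }
    ' "$@"
}
```

For a compressed reference, pipe it in: `gzip -dc Homo_sapiens.GRCh38.dna.alt.fa.gz | fasta_lengths`. If the names or lengths differ from those in the pre-built index, the two references are not the same set of sequences.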


I've downloaded 'Homo_sapiens.GRCh38.dna.alt.fa.gz' from Ensembl and the file size is 56.8 GB. Where should I download the reference genome from?

When I try to run bowtie2-inspect on the indexes I've built, I get the following error: Could not locate a Bowtie index corresponding to basename "GrCh38_index".

19 months ago
GenoMax 125k

I've downloaded 'Homo_sapiens.GRCh38.dna.alt.fa.gz' from Ensembl and the file size is 56.8 GB

Please use the primary assembly. See the answer here for the differences between the two types: WTF with the ensembl human genome?


I've downloaded it and already tried to build the indexes, but there is a problem: only the first 4 index files are generated. Why?


bowtie2-build always creates 6 files: 4 files (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2") plus 2 files (".rev.1.bt2", ".rev.2.bt2"). The number of files generated by bowtie2-build does not depend on the number of sequences in the assembly.
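
To check whether a build produced the full set, something like the sketch below may help. `check_bt2_index` is a made-up helper name. Note that indexes built with `--large-index` use the `.bt2l` suffix instead of `.bt2`, so both are checked. If only the first four files are present, one possible explanation is that the build had not finished or was interrupted before the `.rev.*` files were written.

```shell
# Hypothetical helper: report whether all six index files exist (and are
# non-empty) for a given basename, trying the small (.bt2) and then the
# large (.bt2l) index suffix.
check_bt2_index() {
    base=$1
    for ext in bt2 bt2l; do
        missing=0
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -s "$base.$part.$ext" ] || missing=$((missing + 1))
        done
        if [ "$missing" -eq 0 ]; then
            echo "complete .$ext index for $base"
            return 0
        fi
    done
    echo "no complete index found for $base" >&2
    return 1
}
```

Usage: `check_bt2_index GRCh38_index`. The basename is case-sensitive, so it has to match the name given to bowtie2-build exactly.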
