Question

Indexing large reference genome

6.2 years ago

alice.chiodi01 ▴ 10

Hello everyone, I'm a newbie in bioinformatics and I'm facing some problems. I want to simulate ddRAD sequencing and all the analyses involved. So I downloaded a reference genome and used it to simulate the digestions and the Illumina sequencing. Now I want to see which of my simulations best suits my study, so I need to map these results back to the reference I downloaded. This reference genome is almost 6.5 GB, and I've tried to index it with the ipyrad pipeline (which I want to use for the downstream analyses), bwa, smalt and bowtie, but none of them worked.

I work on a MacBook Air that is quite limited in RAM, but I let the scripts run for more than 12 hours without any progress: ipyrad, in step 3, kept indexing the reference and after 12 hours it was still at 0%; bwa stops at 1000 iterations; smalt stops at "Setting the k-tuple positions in index"; bowtie stops at "Building DifferenceCoverSample Building sPrime Building sPrimeOrder V-Sorting samples".

I installed all of these packages in a conda environment and checked all the dependencies.

So my question is: how can I index this reference genome in a way that works with ipyrad? Do I have to split it, index each part, and then merge the indexes together? If so, how? I have searched online but I cannot find an answer.

Thanks for your help.
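For the splitting step asked about above, a minimal Python sketch could look like the following. `split_fasta`, the `.partN.fa` naming and the chunk size are hypothetical choices for illustration, not part of ipyrad, bwa, smalt or bowtie. It streams the FASTA line by line, so it needs little memory itself, and it only cuts between records, never inside one. It does not address merging: whether per-chunk indexes (or alignments against them) can be combined depends on the aligner and is not covered here.

```python
from pathlib import Path


def split_fasta(fasta_path, out_prefix, max_bases):
    """Split a FASTA file into chunk files of roughly max_bases bases each.

    A new chunk is started only at a header line, so no sequence is ever cut
    in half; a sequence longer than max_bases is simply kept whole. The input
    is streamed line by line, so memory use stays small.
    """
    out_paths = []
    out = None
    bases_in_chunk = 0

    def open_next_chunk():
        # Hypothetical naming scheme: <prefix>.part1.fa, <prefix>.part2.fa, ...
        path = Path(f"{out_prefix}.part{len(out_paths) + 1}.fa")
        out_paths.append(path)
        return open(path, "w")

    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                # At each header, start a new chunk if the current one
                # already holds max_bases or more.
                if out is None or bases_in_chunk >= max_bases:
                    if out is not None:
                        out.close()
                    out = open_next_chunk()
                    bases_in_chunk = 0
            elif out is None:
                raise ValueError("FASTA must start with a '>' header line")
            else:
                # Count sequence characters only, not the newline.
                bases_in_chunk += len(line.strip())
            out.write(line)

    if out is not None:
        out.close()
    return out_paths
```

For example, `split_fasta("reference.fa", "reference", max_bases=500_000_000)` (hypothetical file name and chunk size) would write `reference.part1.fa`, `reference.part2.fa`, and so on. Because a chunk is closed only once it already holds at least `max_bases` bases, chunks can overshoot the limit by up to one record, and a single chromosome larger than the limit stays whole.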

Assembly genome

MacBook Air that is quite limited in RAM

When RAM becomes a limitation, there is really no alternative but to find other hardware/resources that give you access to more of it. You should definitely consider doing that.

I assume you have already seen this for bowtie1 and bowtie2.

6.2 years ago by GenoMax 141k

I assume you have already seen this for bowtie1 and bowtie2.

I hadn't, thanks for the advice. With those options, bowtie gives me an error; bowtie2 starts running but doesn't seem to work. I know that RAM is a limitation for me, but macOS can use virtual memory, which is slower but works. It's impossible to expand the RAM in a MacBook Air, and I'm not willing to buy (or set up from scratch) a new computer just to try indexing a genome unless I'm completely sure that memory is the problem and not the file size.

So I don't know what to do, but thanks a lot for your advice ^.^
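To see how much sequence the indexers actually have to process, as opposed to the file size on disk (which also counts headers and newlines), a small sketch like the one below could help. `fasta_stats` is a hypothetical helper written for illustration; it reads the FASTA line by line and reports the number of sequences, the total number of bases and the length of the longest sequence. It does not measure RAM or predict how much memory a particular indexer will need.

```python
def fasta_stats(fasta_path):
    """Count sequences and bases in a FASTA file, streaming line by line."""
    n_sequences = 0
    total_bases = 0
    longest = 0
    current = 0  # length of the sequence currently being read

    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                # A new record begins: remember the previous one's length.
                n_sequences += 1
                longest = max(longest, current)
                current = 0
            else:
                n = len(line.strip())  # ignore the trailing newline
                current += n
                total_bases += n

    longest = max(longest, current)  # account for the last record
    return {"n_sequences": n_sequences, "total_bases": total_bases, "longest": longest}
```

For example, `fasta_stats("reference.fa")["total_bases"] / 1e9` (hypothetical file name) gives the genome length in gigabases, which can then be compared with the memory requirements stated in each tool's own documentation.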

6.2 years ago by alice.chiodi01 ▴ 10