Indexing large reference genome
6.2 years ago

Hello everyone, I'm a newbie in bioinformatics and I'm facing some problems. I want to simulate ddRAD sequencing and all the analyses involved. So, I downloaded a reference genome and used it to simulate the digestions and the Illumina sequencing. Now I want to see which of my simulations is best for my study, so I need to map the results back to the reference I downloaded. This reference is a genome of almost 6.5 GB, and I've tried to index it within the ipyrad pipeline (which I want to use for the downstream analyses) and with bwa, smalt, and bowtie, but none of them work.

I work on a MacBook Air that is quite limited in RAM, but I have let the script run for more than 12 hours without progress: ipyrad in step 3 keeps indexing the reference and after 12 hr it was still at 0%; bwa stops at 1000 iterations; smalt stops at "Setting the k-tuple positions in index"; bowtie stops at "Building DifferenceCoverSample Building sPrime Building sPrimeOrder V-Sorting samples".

I built all of these packages in a conda environment and I have checked all the dependencies.

Now I'm asking: how can I index this reference genome in a way that is suitable for ipyrad? Do I have to split it, index the pieces, and then merge the indexes together? If so, how? I have searched online but I cannot find an answer.

Thanks for your help.

Assembly genome • 1.5k views

MacBook Air that is quite limited in RAM

When RAM becomes a limitation, there is really no alternative but to find other hardware/resources that give you access to more. You should definitely consider doing that.
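
One rough way to judge whether memory is the likely bottleneck is to compare the size of the reference FASTA with the machine's physical RAM. The Python sketch below is only an illustration: the function names and the `overhead_factor` parameter are hypothetical, and the factor is a placeholder to be chosen by the user, not a measured requirement of bwa, bowtie, or smalt. It also assumes a POSIX system where `os.sysconf` exposes `SC_PHYS_PAGES`.

```python
import os


def physical_ram_bytes():
    """Return physical RAM in bytes using POSIX sysconf values.

    Assumes the platform exposes SC_PAGE_SIZE and SC_PHYS_PAGES.
    """
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def index_may_fit(fasta_bytes, ram_bytes, overhead_factor=1.0):
    """Return True if the estimated working set fits in RAM.

    overhead_factor is a hypothetical multiplier for how much memory an
    indexer needs per byte of reference; it is an assumption, not a
    documented figure for any specific tool.
    """
    return fasta_bytes * overhead_factor <= ram_bytes


def report(fasta_path, overhead_factor=1.0):
    """Print the FASTA size, the physical RAM, and the rough verdict."""
    fasta_bytes = os.path.getsize(fasta_path)
    ram_bytes = physical_ram_bytes()
    fits = index_may_fit(fasta_bytes, ram_bytes, overhead_factor)
    print(f"reference: {fasta_bytes / 1e9:.1f} GB, RAM: {ram_bytes / 1e9:.1f} GB")
    print("may fit in RAM" if fits else "likely to spill into swap")
```

Calling `report("reference.fa", overhead_factor=2.0)` (with a hypothetical file name and factor) gives a quick indication of whether indexing is likely to rely on swap, which would be consistent with runs that appear stuck for hours.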

I assume you have already seen this for bowtie1 and bowtie2.
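
The linked pages are not preserved in this copy of the thread. As a hedged illustration only: the bowtie2 manual documents several `bowtie2-build` settings that trade speed for lower memory use, namely `-a/--noauto` (set the values by hand), `--bmaxdivn`, `--dcv`, and `-p/--packed`. The Python sketch below merely assembles such a command. The helper names, file names, and chosen values are assumptions for illustration, not recommended settings.

```python
import shutil
import subprocess


def low_memory_build_cmd(fasta, index_prefix, bmaxdivn=16, dcv=4096, packed=True):
    """Assemble a bowtie2-build command that favors low memory over speed.

    The default values here are illustrative assumptions, not tuned settings.
    """
    cmd = [
        "bowtie2-build",
        "--noauto",                   # choose the memory-related options manually
        "--bmaxdivn", str(bmaxdivn),  # larger divisor: smaller blocks, less memory
        "--dcv", str(dcv),            # larger difference-cover period: less memory
    ]
    if packed:
        cmd.append("--packed")        # 2-bits-per-base strings: less memory, slower
    cmd += [fasta, index_prefix]
    return cmd


def run_if_available(cmd):
    """Run cmd only if its executable is on PATH; return True if it was run."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True
```

For example, `run_if_available(low_memory_build_cmd("reference.fa", "reference"))` would start a build with these settings if `bowtie2-build` is installed. Expect it to be considerably slower than a default build.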


I assume you have already seen this for bowtie1 and bowtie2.

I hadn't, thanks for the advice. With bowtie, these options give me an error. With bowtie2, it starts to run but doesn't seem to work. I know that RAM is a limitation for me, but a Mac can use virtual memory, which is slower but works. It's impossible to expand the RAM in a MacBook Air, and I don't want to buy (or set up from scratch) a new PC just to try to index a genome unless I am completely sure that memory is the problem and not the file size.

So, I don't know what to do but thanks a lot for your advice ^.^
