Problem creating a reference genome
9 weeks ago

Hi, I need to generate a custom reference genome (i.e. a reference sequence) for an S. cerevisiae strain I use in the lab, which carries some polymorphisms. I've done the alignments (bowtie, -v 0, i.e. no mismatches allowed) using the genome from SGD (Saccharomyces Genome Database) as my reference and, because of the polymorphisms, many reads are discarded when they shouldn't be.

So far, I have obtained a .bedgraph file that shows the coverage in each region. However, I think the coverage would be higher with a custom reference genome.

Any ideas on how to create my custom reference genome? Is it necessary to call polymorphisms first? If so, how could this be done?

Thanks!!

reference custom genome • 305 views

Are you referring to SNPs? Also, bowtie v1.x only performs ungapped alignments, which could be one reason why you are not getting good alignments, so try replacing it with bwa mem or a similar aligner. You also have the option of doing a reference-guided or de novo assembly, which should help account for the presence of SNPs.

Thank you very much! I have tried what you suggested and I now have a de novo assembly. Could you please explain how to proceed from here to get a FASTA file that could be used as a reference genome? I have just started with bioinformatics and still don't really understand the whole process.

Your de novo assembly should already be a FASTA file that can be used as a reference genome. However, whether you want to do that will depend on the analysis you want to run and on the quality of the assembly.

Alternatively, you may want to use an existing high-quality assembly of an S. cerevisiae strain closer to yours. Yue et al. 2017 has several PacBio assemblies from diverse clades.

If you chose the route of aligning to a reference with bwa, you should have an alignment file. You can use the instructions here to call a consensus reference: Generating consensus sequence from bam file
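
To make the idea of a consensus reference concrete, here is a minimal Python sketch of what the consensus step does conceptually: it takes the original reference and substitutes the alternate base at every called SNP position. This is only an illustration, not the method from the linked post. The function name `apply_snps`, the tuple layout and the toy sequence are all made up for the example; it handles single-base substitutions only (no indels), and for real data a dedicated tool should be used.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with single-base substitutions applied.

    reference: dict mapping sequence name -> sequence string
    snps: iterable of (name, pos, ref_base, alt_base); pos is 1-based,
          as in VCF. This tuple layout is an assumption for the sketch.
    """
    # Work on lists of characters so individual positions can be replaced.
    consensus = {name: list(seq) for name, seq in reference.items()}

    for name, pos, ref_base, alt_base in snps:
        if name not in consensus:
            raise KeyError(f"sequence {name!r} is not in the reference")
        bases = consensus[name]
        if not 1 <= pos <= len(bases):
            raise IndexError(f"position {pos} is outside {name!r}")
        # Sanity check: the variant must describe the base actually present.
        if bases[pos - 1].upper() != ref_base.upper():
            raise ValueError(
                f"{name}:{pos} expected {ref_base!r}, found {bases[pos - 1]!r}"
            )
        bases[pos - 1] = alt_base

    return {name: "".join(bases) for name, bases in consensus.items()}


# Toy example with a hypothetical 8 bp "chromosome".
toy_reference = {"chrI": "ACGTACGT"}
toy_snps = [("chrI", 3, "G", "A"), ("chrI", 8, "T", "C")]
toy_consensus = apply_snps(toy_reference, toy_snps)
print(toy_consensus["chrI"])
```

The reference-base check is there because a mismatch between the variant list and the FASTA usually means the variants were called against a different reference or with different coordinates.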

Thank you very much! The post you suggested is exactly what I need. However, I'm having trouble that seems related to indexing my reference genome: when running mpileup I get a message like this: "[fai_fetch_seq] The sequence "CHRII" not found". Any idea what could be happening?
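
A common cause of this kind of message is that the sequence names used in the alignment file do not match the names in the FASTA headers of the reference passed to mpileup (for example "CHRII" versus "chrII"). That is an assumption about this case, not a confirmed diagnosis. The Python sketch below compares a list of requested names against the FASTA headers and suggests a case-insensitive match where one exists. The function names and the toy FASTA are hypothetical. The requested names would typically be the SN fields of the @SQ lines printed by `samtools view -H`.

```python
def fasta_sequence_names(fasta_text):
    """Return sequence names from FASTA headers.

    Only the text between '>' and the first whitespace is used,
    since that is the part that serves as the sequence name.
    """
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def find_name_mismatches(requested_names, fasta_text):
    """Map each requested name missing from the FASTA to a suggestion.

    The suggestion is a FASTA name that matches when case is ignored,
    or None if no such name exists.
    """
    available = fasta_sequence_names(fasta_text)
    available_set = set(available)
    by_lowercase = {name.lower(): name for name in available}

    report = {}
    for name in requested_names:
        if name not in available_set:
            report[name] = by_lowercase.get(name.lower())
    return report


# Toy data: names as they might appear in a BAM header vs. a FASTA file.
toy_fasta = ">chrI some description\nACGT\n>chrII\nGGCC\n"
toy_requested = ["CHRI", "CHRII", "chrII", "chrM"]
mismatches = find_name_mismatches(toy_requested, toy_fasta)
for requested, suggestion in mismatches.items():
    print(requested, "->", suggestion)
```

If names differ only in case or prefix, the usual fix is to make the two files agree, either by realigning against the FASTA you intend to use or by renaming the FASTA headers and re-indexing.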