Question: How can I use a genome assembly file to extract or find a gene or region of interest?
3.8 years ago
I have a publicly archived rat genome assembly in FASTA format. It is sorted by chromosome, as I can see when viewing it in IGV, but it does not appear to be annotated. Can I compare this assembly with the released rat genome assembly, focusing only on my region of interest? I am pretty new to genome assemblies and deep sequencing datasets, so detailed steps and basic approaches are highly appreciated. :-)


I don't quite understand your question. What are you trying to do? If you want to extract a defined region of your genome, you can use bedtools getfasta. You only need to know the chromosome and coordinates of your region of interest and write a BED file by hand. You can then extract the region from both genomes and compare the sequences.
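To make the coordinate convention concrete, here is a minimal pure-Python sketch of the same idea: pull one interval out of a FASTA file using BED-style coordinates (0-based start, end not included). It is only an illustration for small tests, not a replacement for bedtools getfasta, which works from an indexed FASTA and is the better choice for a whole genome. The function name `extract_region` is hypothetical.

```python
def extract_region(fasta_path, chrom, start, end):
    """Return the bases of `chrom` in the BED-style interval [start, end).

    Coordinates are 0-based and half-open, as in a BED file.
    The sequence name is taken as the first word of the FASTA header.
    """
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")

    pieces = []
    pos = 0            # number of bases of the target sequence read so far
    found = False      # True once the header for `chrom` has been seen
    in_target = False  # True while reading lines that belong to `chrom`

    with open(fasta_path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if in_target:
                    break  # reached the next sequence; stop reading
                fields = line[1:].split()
                name = fields[0] if fields else ""
                in_target = (name == chrom)
                found = found or in_target
                continue
            if not in_target or not line:
                continue
            line_end = pos + len(line)
            # Keep only the part of this line that overlaps [start, end).
            if line_end > start and pos < end:
                pieces.append(line[max(start - pos, 0):end - pos])
            pos = line_end
            if pos >= end:
                break

    if not found:
        raise KeyError("sequence %r not found in %s" % (chrom, fasta_path))
    return "".join(pieces)
```

A call would look like `extract_region("reference.fa", "chr20", 1000000, 6000000)`, where the file name and coordinates are placeholders and not the actual MHC boundaries. Running it with the same interval on both assemblies only makes sense if the region sits at comparable coordinates in both, which is not guaranteed for an independently assembled genome.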

OK, I want to compare the newly sequenced rat assembly with the latest rat assembly, Rnor_5.0, but I only need the MHC region. I know which chromosome the MHC region is on and approximately where it lies, but without annotations I don't know the structure of that region in the newly sequenced rat strain. That is what I am trying to find out.

3.8 years ago
Extract the MHC region from the annotated genome using bedtools or any script.

Align the extracted MHC region to the newly assembled genome and look for differences such as rearrangements.

You could use SyMAP for aligning long queries; it in turn runs nucmer, a popular tool for comparing large genomes. SyMAP is GUI-based and offers good visualizations such as synteny maps and Circos-style plots.
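If you would rather run nucmer directly than through a GUI, the alignment step can be sketched as below. This is a rough outline that assumes MUMmer is installed and that `nucmer` and `show-coords` are on your PATH. The file names and the prefix `mhc_vs_new` are placeholders, and the helper names are hypothetical.

```python
import shutil
import subprocess


def build_nucmer_commands(reference_fasta, query_fasta, prefix="mhc_vs_new"):
    """Build the command lines for nucmer and show-coords.

    nucmer takes the reference first and the query second and writes
    its alignments to <prefix>.delta.
    """
    nucmer_cmd = ["nucmer", "--prefix=" + prefix, reference_fasta, query_fasta]
    # -r: sort by reference, -c: add percent coverage, -l: add sequence lengths
    coords_cmd = ["show-coords", "-rcl", prefix + ".delta"]
    return nucmer_cmd, coords_cmd


def run_alignment(reference_fasta, query_fasta, prefix="mhc_vs_new"):
    """Run nucmer, then return the show-coords table as text."""
    if shutil.which("nucmer") is None:
        raise RuntimeError("nucmer not found on PATH; install MUMmer first")
    nucmer_cmd, coords_cmd = build_nucmer_commands(
        reference_fasta, query_fasta, prefix
    )
    subprocess.run(nucmer_cmd, check=True)
    result = subprocess.run(
        coords_cmd, check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `run_alignment("new_assembly.fa", "Rnor5_MHC.fa")` would align the extracted MHC sequence (query) against the new assembly (reference). In the resulting coordinate table, rearrangements tend to show up as alignment blocks that change order or orientation along the reference.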

Thank you for your perfect suggestion! I will try it out. :-)

