The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual | bioRxiv (www.biorxiv.org)

We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 genes, of which 20,003 are protein coding.

submitted by: Istvan Albert

GitHub - lh3/srf: SRF: Satellite Repeat Finder (github.com)

Satellite Repeat Finder, or SRF in brief, assembles motifs in satellite DNA that are tandemly repeated many times in the genome. It takes short reads, accurate long reads or high-quality contigs as input and reports the consensus of each repeat unit. SRF can identify satellite repeats that are often missed in de novo assembly. For species enriched with high-order repeats (HORs), it tends to find HORs instead of the minimal repeat unit. SRF may also find truly circular genomes such as mitochondrial or chloroplast genomes if their abundance is high.
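
The difference between a minimal repeat unit and a HOR can be illustrated with a toy example. This is not SRF's algorithm, only a sketch on made-up sequences: the hypothetical helper `minimal_repeat_unit` returns the shortest unit whose exact tandem repetition reproduces a string. When the monomers inside an array have diverged, the shortest exactly repeating unit is the larger block rather than the single monomer.

```python
def minimal_repeat_unit(seq: str) -> str:
    """Return the shortest prefix whose exact tandem repetition yields seq."""
    n = len(seq)
    for k in range(1, n + 1):
        # A unit of length k must divide the array length evenly.
        if n % k == 0 and seq[:k] * (n // k) == seq:
            return seq[:k]
    return seq


# A perfect tandem array collapses to its 4 bp monomer.
simple_array = "ACGT" * 6

# Toy higher-order repeat: three diverged monomers make up one 12 bp unit.
hor_unit = "ACGT" + "ACGA" + "ACCT"
hor_array = hor_unit * 4

print(minimal_repeat_unit(simple_array))
print(minimal_repeat_unit(hor_array))
```

Real satellite arrays are not exact copies, so an exact-match test like this one would not work on sequencing data; it only shows why the repeating unit can be longer than the monomer.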

submitted by: Istvan Albert

GitHub - agshumate/Liftoff: An accurate GFF3/GTF lift over pipeline (github.com)

Liftoff is a tool that accurately maps annotations in GFF or GTF between assemblies of the same or closely related species. Unlike current coordinate lift-over tools, which require a pre-generated “chain” file as input, Liftoff is a standalone tool that takes two genome assemblies and a reference annotation as input and outputs an annotation of the target genome. Liftoff uses Minimap2 (Li, 2018) to align the gene sequences from a reference genome to the target genome. Because it aligns only the gene sequences rather than whole genomes, genes can be lifted over even if there are many structural differences between the two genomes.
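
The idea of lifting a gene by aligning its sequence, rather than the whole genome, can be sketched on toy strings. This is not Liftoff's code: the hypothetical function `lift_gene` uses an exact substring search as a stand-in for a Minimap2 alignment, and the sequences and coordinates are invented for illustration.

```python
from typing import Optional, Tuple


def lift_gene(ref_genome: str, target_genome: str,
              start: int, end: int) -> Optional[Tuple[int, int]]:
    """Lift a 0-based, half-open gene interval from ref to target.

    Assumption: an exact match replaces a real alignment step.
    """
    gene_seq = ref_genome[start:end]
    hit = target_genome.find(gene_seq)
    if hit == -1:
        return None  # gene sequence not found in the target
    return hit, hit + len(gene_seq)


# Toy reference: the gene occupies positions 4..13.
ref = "TTTT" + "ATGGCCTAA" + "GGGG"
# Toy target: an insertion upstream shifts the gene to a new position.
target = "TTTT" + "CCCCCCCC" + "GG" + "ATGGCCTAA" + "GG"

print(lift_gene(ref, target, 4, 13))  # (14, 23)
```

The upstream insertion changes the gene's coordinates but not its sequence, so searching for the gene itself still places it correctly in the target.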

submitted by: Istvan Albert

GitHub - EBIvariation/variant-remapping: The pipeline for remapping VCF variants between two arbitrary FASTA assemblies. (github.com)

Pipeline for remapping VCF variants between two arbitrary assemblies in FASTA format. No chain file is required. However, it does assume that the source and destination genomes are closely related and was designed with the explicit purpose of lifting over variants from one version of the genome to another.

submitted by: Istvan Albert

GIGGLE: a search engine for large-scale integrated genome analysis | Nature Methods (www.nature.com)

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
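
The kind of query GIGGLE answers can be sketched as a brute-force comparison of one query interval set against several interval files. This is not GIGGLE's method: the sketch uses no index, and it ranks by raw overlap count instead of a significance statistic. The function names, file names and intervals are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def rank_by_overlap(query: List[Interval],
                    database: Dict[str, List[Interval]]) -> List[Tuple[str, int]]:
    """Count query/database overlaps per file and sort files by that count."""
    counts = {
        name: sum(1 for q in query for iv in intervals if overlaps(q, iv))
        for name, intervals in database.items()
    }
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


query = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
database = {
    "fileA": [("chr1", 150, 160), ("chr1", 550, 700), ("chr2", 10, 60)],
    "fileB": [("chr1", 190, 210), ("chr3", 0, 1000)],
    "fileC": [("chr2", 80, 90)],  # touches the query end but does not overlap
}

print(rank_by_overlap(query, database))
```

This nested loop compares every query interval with every database interval, which is the cost that an indexed search engine is meant to avoid at the scale of billions of intervals.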

submitted by: Istvan Albert

