I am using 1000 Genomes data in a new project. While inspecting the reference assembly the project used, I found that it contains a "decoy" contig.
The 1000 Genomes FAQ says:
For the final round of alignments the sequence data will be mapped to a set of sequences derived from the GRCh37 assembly. This GRCh37-derived alignment set includes chromosomal plus unlocalized and unplaced contigs, the rCRS mitochondrial sequence (AC:NC_012920), Human herpesvirus 4 type 1 (AC:NC_007605) and decoy sequence derived from HuRef, Human Bac and Fosmid clones and NA12878. These files are available in phase2_reference_assembly_sequence on the ftp site. All human variant coordinates reported by the 1000 Genomes project are in GRCh37 coordinates.
I have no idea what the decoy sequence is or why it is included. Is it there to detect sample contamination?