The Human Genome Project, the international effort to determine the complete sequence of the human genome, has occupied scientists around the world for several decades. The first results of this project were announced in 2000, and the "complete" sequence followed in 2003. However, this "complete" sequence covered only about 92 percent of all genetic information. The heterochromatic regions located in and around the parts of chromosomes called telomeres and centromeres were left unsequenced for a number of reasons, including technical ones.
Only recently, after two decades of hard work and technical advances, scientists of the Telomere-to-Telomere (T2T) consortium sequenced roughly three billion base pairs, producing a complete human genome without gaps or blank spots. The complete genome, called T2T-CHM13, adds nearly 200 million base pairs from previously unexplored regions of DNA. Scientists have already identified 99 new genes that are predicted to code for proteins, and around 2,000 candidate genes still need to be investigated in more detail to determine their functions. The new genome also corrects thousands of structural errors found in earlier data.
To obtain a complete genome, scientists used several state-of-the-art tools that read much longer stretches of DNA at a time. Assembling a genome from fewer, longer pieces is much easier than assembling it from many shorter ones, because a long read can span a repetitive region that short reads cannot place unambiguously. The main tool was Oxford Nanopore DNA sequencing technology, which can read up to a million bases at a time with acceptable but not perfect accuracy. Complex and challenging sections that this technology sequenced with many errors were processed using PacBio HiFi technology, which reads about 20,000 bases at a time with very high accuracy.
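One reason longer reads make the job easier is that repetitive DNA looks identical wherever it occurs, so a read that falls entirely inside a repeat cannot be assigned to a single location. The toy sketch below illustrates the idea only; it is not the assembly method used for T2T-CHM13. The function name `ambiguous_fraction`, the made-up sequence `TOY_GENOME`, and the assumption of error-free reads are all hypothetical choices for illustration.

```python
from collections import Counter


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of error-free reads of length read_len whose sequence
    occurs at more than one position in the genome."""
    # Take every possible read of the given length, one per start position.
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    # A read is ambiguous if the same sequence appears more than once.
    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)


# Made-up toy genome: two copies of the same 12-base repeat,
# separated and surrounded by unique flanking sequence.
REPEAT = "ACGTACGTTGCA"
TOY_GENOME = "GGATCCTA" + REPEAT + "TTAGGCAT" + REPEAT + "CCGATAAG"

for read_len in (4, 8, 13, 20):
    share = ambiguous_fraction(TOY_GENOME, read_len)
    print(f"read length {read_len:2d}: {share:.0%} of reads are ambiguous")
```

In this toy sequence, every read longer than the 12-base repeat includes some unique flanking sequence and so matches exactly one position, while a large share of the shortest reads match several positions. Real genomes contain repeats that are thousands to millions of bases long, which is why reads of comparable length matter.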
The second problem was that the previous version of the "complete" genome had been stitched together from the DNA of several different people, which in itself created discontinuities and blank spots in the data. The latest version was assembled from a single source of DNA, which removes this cause of error.
The resulting complete human genome will form the basis for a host of new research, including the identification of genetic markers for various diseases and the development of new, effective treatments for inherited and acquired genetic diseases. As a next step, scientists intend to sequence the complete genomes of 350 individuals and create a common database that can be used to identify differences between individuals.