At the Telomere-to-Telomere consortium, researchers around the world collaborated to sequence the gaps left out by the initial human reference genome
By BRANDON NGUYEN — science@theaggie.org
The Telomere-to-Telomere (T2T) consortium, an international group of researchers focused on developing the human reference genome, recently filled in the last 8% of genomic DNA that had been left out of the initial Human Genome Project in 2001.
“Release of the first human genome assembly was a landmark achievement, and after nearly two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced,” the consortium website reads. “However, no one chromosome has yet been finished end to end, and hundreds of gaps persist across the genome. These unresolved regions include segmental duplications, ribosomal rRNA gene arrays, and satellite arrays that harbor unexplored variation of unknown consequence.”
Several UC Davis investigators contributed to the series of papers recently published on the completion of the human reference genome. Dr. Megan Dennis, an assistant professor of biochemistry and molecular medicine at the UC Davis School of Medicine and MIND Institute, explained what the human reference genome is.
“The human genome has been sequenced since 2001, and the human reference genome represents a single example of a human genomic sequence that the community uses as a reference to be able to compare subsequent sequences of other humans against,” Dennis said. “So we use it to understand genes and proteins producing those gene regulatory elements and so forth. The original reference genome comprises a collection of multiple different individuals in which different parts of their genomes had been sequenced and stitched together.”
Dr. Charles Langley, a distinguished professor of evolution and ecology at the UC Davis College of Biological Sciences, explained the process of filling in the gaps of the original sequence.
“The way we sequence genomes is we break it into pieces and try to read the little pieces and then notice differences in the repeats and pieces and where they overlap so we can put them back together into a whole chromosome,” Langley said. “With this T2T project and molecular technological advancements, we are now reading pieces of DNA that are 50,000 base pairs to up to a million base pairs in one read. So even if the thing is highly repetitive over that million base pairs, you will find one or two differences between another read, and then you can then line up those differences and piece them together.”
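The idea Langley describes can be illustrated with a toy sketch. The code below is not the T2T consortium's actual pipeline; it is a minimal greedy overlap merge on made-up sequences, and the function names (overlap, greedy_assemble, count_placements) are hypothetical helpers invented for this illustration. It shows two things: reads that overlap can be stitched back into a longer sequence, and a longer read that spans a small difference inside a repeat can be placed in only one position.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap (toy example only)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave the pieces unmerged
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


def count_placements(sequence, read):
    """Count the positions where read matches sequence exactly (overlapping matches included)."""
    return sum(sequence.startswith(read, i) for i in range(len(sequence) - len(read) + 1))


# Made-up sequence, broken into three overlapping reads.
genome = "AACCGGTTAGCATCG"
reads = [genome[0:8], genome[5:12], genome[9:15]]
assembled = greedy_assemble(reads)

# Made-up repeat ("AC" units) with a single difference (the "G") in the middle.
repeat_region = "ACACACACGCACACAC"
short_read_hits = count_placements(repeat_region, "ACAC")
long_read_hits = count_placements(repeat_region, "ACACGCAC")
```

In this toy case the three reads merge back into the original sequence, the short read fits the repeat in five places, and the longer read that covers the single difference fits in exactly one. Real assemblers must also handle sequencing errors, both DNA strands and reads contained inside other reads, none of which this sketch attempts.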
Areas left out of the original human reference genome included centromeric regions, which are important for the separation of chromosomes during cell division, and ribosomal DNA arrays, which are important for building the ribosomes that make proteins in our cells. These regions were too difficult to sequence at the time because of their large number of repeats, according to Langley. However, they now offer a new basis for scientific studies and potential explanations for genetic variations and defects.
“We were able to find a lot of new genes, and we can start to characterize variation of these genes and do comparisons across species but also actually try to take sequence samples from individuals who have disorders or diseases and see if maybe there’s variation within these genes that hadn’t been queried before,” Dennis said. “So we can get parts of the genome that we hadn’t been able to, so that was really important to us. That took about a year, and it brought in lots and lots of different folks all over the world, so it was over 100 scientists that contributed to completing the human reference genome.”