Incongruence between Scientific Facts and Observations — Please Mind the Gap
By Daniel Tarade
Watching the lantern dim, starved of oxygen
So give me your hand and let's jump out the window
Australia, The Shins
There is a disconnect between what scientists say they know and what they actually observe. This distortion is amplified as findings are iteratively reported, each time in less specialized publications. A telephone rings. Original findings are reported in peer-reviewed research articles. From the beginning, there are already gaps between conclusions and observations. Occasionally, these gaps are acknowledged explicitly by the authors, and future experiments are described that will fill in these gaps. Other times, gaps remain implicit and may be lamented by other scientists as a case of over-interpretation. Review articles are published in scientific journals to summarize and discuss emergent research. One step removed from the original article, conclusions are discussed and, often, little emphasis is given to the methods by which such conclusions were drawn. This effect can only be further amplified when findings are discussed in mainstream media. Too few scientific reporters and a penchant for sensationalism result in dramatic headlines that are both descriptive and unambiguous. Scrolling text: scientist finds blank to be true. I am not going to comment on whether Truth exists or whether it is accessible to scientists. In a more descriptive sense, I wish to illuminate the gaps that exist between the conclusion of a scientific paper and the observations that are provided as support. It only seems fair to apply this stringency to a research article I recently published.[i]
“HIF-2α-pVHL complex reveals broad genotype-phenotype correlations in HIF-2α-driven disease.” That is the title. Let's unpack. HIF-2α is a protein. Proteins are among the functional products of genes, which encode who we are. HIF stands for hypoxia inducible factor. Hypoxia means low oxygen. So, HIF-2α is a protein that becomes activated, or induced, under conditions of low oxygen. pVHL is another protein. pVHL stands for von Hippel-Lindau protein. von Hippel and Lindau were two physicians who described a hereditary cancer syndrome (VHL Disease) over 100 years ago. About 30 years ago, geneticists found that mutations in the gene encoding pVHL are the cause of VHL Disease. The function of pVHL in a human cell is to degrade HIF-2α when there is adequate oxygen available. HIF-2α-pVHL complex refers implicitly to our having determined the structure of HIF-2α-pVHL. Simply put, we know what the interaction between HIF-2α and pVHL looks like at the atomic level. Well, that is what we say anyway. Genotype-phenotype correlations refers to the correlation of a specific genetic sequence (genotype) with a particular clinical presentation (phenotype). HIF-2α-driven diseases are the phenotypes that emerge when the gene encoding HIF-2α is mutated. There are two major disease classes associated with HIF-2α: polycythemia is a condition of elevated red blood cell counts, and pheochromocytoma/paraganglioma (PPGL) are two related neuroendocrine tumour types. The genetics linking HIF-2α mutations to both polycythemia and PPGL is strong, and it has been noticed that there is no overlap in the types of mutations that cause one disease or the other. However, the outstanding question was what distinguishes the mutations that cause polycythemia from those that cause PPGL.
Putting everything together, the title of our manuscript, and by extension, our major conclusion, is that we have uncovered the difference between the HIF-2α mutations that cause PPGL and those that cause polycythemia by exploring what the interaction between pVHL and HIF-2α looks like. Is that a fair statement?
First, I should comment that the published title was revised at the request of peer reviewers and shortened to meet the requirements of the journal. The original title was "Crystal structure of HIF-2α peptide bound to pVHL-EloB-EloC complex reveals a structural basis for genotype-phenotype correlations in HIF-2α-driven disease." The reviewers did not feel that we uncovered the basis for all genotype-phenotype correlations in HIF-2α-driven disease. We agreed. Thus we amended the title to "...broad genotype-phenotype correlations." We identified the basis for the genotype-phenotype correlation between PPGL-causing and polycythemia-causing HIF-2α mutations but could not say much about other, rarer clinical presentations associated with HIF-2α mutations, like somatostatinoma, a tumour occurring in the duodenum of the small intestine. In addition to highlighting the more general nature of our discovery, we had to shorten the title. Our original title explicitly highlighted the technique we employed to study how HIF-2α and pVHL look when they interact: crystallography. Proteins are small structures and cannot be visualized at atomic resolution via microscopy, atomic resolution simply meaning that you can make out the individual atoms of a protein. Instead, a powerful and oft-used technique involves making a crystal out of your protein of interest. Any crystal, such as table salt or a diamond, is a structure comprised of identical subunits that repeat indefinitely and are related to each other by a set of discrete translations. When a crystal is shot with X-rays, the subunits within the crystal will diffract the light to produce a pattern on X-ray film that can be used to determine the structure of the subunit itself. A powerful technique, akin to an atomic microscope, but what are the drawbacks? Well, to crystallize a protein, you need large quantities of immaculately pure protein.
The most convenient method of producing large quantities of a pure protein involves programming bacteria to churn out your protein of interest. These microbial factories can produce milligram quantities of protein, which may not sound like much, but is plenty for structural biology. However, by using a bacterial system to express a human protein, there is the risk that the protein will not be of the same quality that it would be in its native environment. These risks are amplified when you introduce your protein to the conditions required for crystallization. In a cellular context, proteins do not form a crystal lattice. The abnormal contacts between protein subunits in a crystal lattice could conceivably alter the arrangement of atoms in that protein, much like how an animal might behave differently in captivity. You can learn about a protein's structure in a crystalline environment much like you can learn about the tiger in a zoo, but whether that is the most appropriate study design depends on the question.
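As an aside on the diffraction step described earlier, the condition under which X-rays scattered from repeating planes in a crystal reinforce one another is commonly written as Bragg's law. The symbols below are not from our article and are introduced only for illustration: d is the spacing between repeating planes, θ is the angle between the incoming beam and those planes, λ is the X-ray wavelength, and n is a positive integer.

```latex
n\lambda = 2d\sin\theta
```

Because the positions of the diffracted spots depend on the spacings within the lattice, the pattern can be worked back toward the arrangement of atoms in the repeating subunit, though in practice that reconstruction involves considerably more than this one relation.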
For our study, we felt comfortable with the quality of protein (i.e. pVHL-eloB-eloC; elongin B and elongin C are two additional proteins that interact with pVHL to form a complex) being produced by the bacteria. The reason why we felt comfortable was that pVHL, elongin B, and elongin C formed a complex with one another just as they would in a cellular context. Further, the pVHL complex could bind to HIF-2α, again suggesting a functional pVHL complex. But while we are on the topic of HIF-2α, there are other gaps to enumerate. Rather than purify HIF-2α from bacteria, like we did with pVHL complex, we instead worked with synthetic HIF-2α peptide. Meaning, rather than work with the full-length protein, we used a short stretch of HIF-2α that retains the ability to bind pVHL. Generally speaking, proteins are quite modular, with discrete sections of a protein possessing discrete functions. The domain of HIF-2α that interacts with pVHL complex is known as the oxygen-dependent degradation (ODD) domain because the interaction between pVHL and HIF-2α is required for HIF-2α to be degraded in an oxygen-dependent manner. Fairly straightforward. However, unlike pVHL complex, which is readily purified from bacteria, the ODD domain of HIF-2α is difficult to work with owing to some extreme biochemical properties. The ODD of HIF-2α is quite acidic and proper production of the protein is difficult in non-native contexts (i.e. in bacteria, yeast, or other model organisms that can be hijacked for protein production). As a result, we decided to work with the HIF-2α peptide, knowing that the gap between what we observe and what we conclude would increase. However, the peptide that we chose is a relevant one, as 90% of the HIF-2α mutations reported in the literature to cause disease are localized to this stretch.
So far, I have noted that we drew conclusions about HIF-2α mutations that cause disease, but I have discussed only how we studied the interaction between HIF-2α and pVHL. Well, following the successful crystallization of pVHL complex with HIF-2α and the determination of the structure of said interacting proteins, we made a prediction that mutations causing cancer would more severely abrogate binding of HIF-2α to pVHL than mutations that cause polycythemia. This prediction was made on account of our observation that mutations causing cancer were localized to stretches of HIF-2α that make critical contacts with pVHL while those causing polycythemia instead affected a region of HIF-2α that weakly engaged with pVHL. Although our prediction was based on an experimental design that merely approximated the nature of the HIF-2α-pVHL interaction, it was testable. The latter half of our research article focused on studying the affinity of HIF-2α mutants for pVHL. We measured how much pVHL remained bound to HIF-2α peptide following a long incubation period and stringent washing. We also measured how quickly pVHL complex binds to HIF-2α that we immobilized on the end of a probe, based on the rate of the increase in thickness at the end of said probe (this powerful technique is known as biolayer interferometry and provides a quantitative read-out of the affinity of two proteins for one another). Our various techniques pointed to the same conclusion: HIF-2α mutations that cause cancer do indeed weaken the interaction with pVHL more so than mutations that cause polycythemia. However, although this was a general trend, there were a few exceptions. For these exceptions, which were cancer-causing HIF-2α mutations that did not greatly affect affinity for pVHL, we showed that they greatly impaired interaction with another important regulatory protein, PHD2.
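To make the idea of a quantitative affinity read-out more concrete, here is a minimal sketch of how binding rates relate to affinity. It assumes a simple 1:1 binding model, which is a common textbook simplification and not necessarily the model fitted in our article. The function names, rate constants, and concentration are all hypothetical and chosen only for illustration; none are measured values.

```python
import math

def dissociation_constant(k_on, k_off):
    """Return K_D in molar (M) from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on

def association_response(t, conc, k_on, k_off, r_max=1.0):
    """Signal at time t (s) during the association phase of a 1:1 binding model.

    conc is the concentration (M) of the protein in solution and r_max is
    the signal expected if every immobilized peptide were occupied.
    """
    k_obs = k_on * conc + k_off  # observed rate of approach to equilibrium
    r_eq = r_max * conc / (conc + dissociation_constant(k_on, k_off))
    return r_eq * (1.0 - math.exp(-k_obs * t))

# Hypothetical rate constants for illustration only (not measured values).
tight = {"k_on": 1e5, "k_off": 1e-3}  # stands in for a strongly binding peptide
weak = {"k_on": 1e5, "k_off": 1e-1}   # stands in for a mutant that binds weakly

conc = 100e-9  # 100 nM of protein in solution (hypothetical)
for name, rates in (("tight", tight), ("weak", weak)):
    kd = dissociation_constant(**rates)
    signal = association_response(300.0, conc, **rates)
    print(f"{name}: K_D = {kd:.1e} M, response after 300 s = {signal:.3f}")
```

In this toy comparison, the only difference between the two binders is how quickly the complex falls apart, yet the weaker binder produces a much smaller signal at the same concentration. That is the sense in which a mutation that weakens binding can be detected and compared quantitatively.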
Based on this collection of results, we concluded that cancer-causing HIF-2α mutations cause a greater increase in HIF-2α activity than polycythemia-causing mutations. As you might have noticed, this statement is broader than the one reported in our title, which suggests that analysis of how HIF-2α and pVHL interact is sufficient to understand the emergent genotype-phenotype relationships. In reality, we had to also explore how HIF-2α mutations affect binding to other proteins.
Other caveats remain. First, we did not actually measure HIF-2α activity. Rather, as a proxy, we measured the interaction between HIF-2α and key negative regulators with the idea that HIF-2α activity and interaction with pVHL/PHD2 are inversely correlated. Second, we did not conduct a comprehensive study of the dozens of different HIF-2α mutations, rather choosing to study 6 cancer-causing mutations and 4 polycythemia-causing mutations. Third, our interaction studies largely utilized the same HIF-2α peptide that we used for our crystallography study. Do these caveats invalidate our model? No. A model requires a degree of support but can never be infallible. At such a point, it is no longer a model but dogma. A study with a larger number of HIF-2α mutations or a study that focuses on HIF-2α activity rather than interaction strength with negative regulators may reveal something different about HIF-2α mutations associated with various disease states. For the time being, our proposed model stands. And quite quickly, opportunities for testing our model have appeared. In a recent publication, a large number of HIF-2α mutations in individuals with polycythemia were reported. One of these mutations had not been previously reported in the clinical literature, but we predicted in our manuscript that this mutation would disrupt binding to pVHL sufficiently to actually cause cancer. It is quite morbid, but if this patient develops a neuroendocrine tumour, our model would be further supported. As a quick note, to this point I have discussed cancer and polycythemia as two mutually exclusive clinical features. However, this is not entirely true. Many patients with cancer develop polycythemia congenitally as an additional phenotype. On the basis of our prediction regarding this newly reported mutation, we would recommend that the patient be monitored closely for tumour formation, and this opinion has been communicated to the relevant individuals.
I hope I have communicated that scientific study necessitates simplification. What is encountered in the clinic or in the human body or in the wild is complicated. For us humans, it is difficult to parse actionable knowledge from such complexity. We instead reduce the complexity to manageable systems - a simmering sauce, excessive water wisping away. Although the result is palatable, it conceals the nuance of what is studied. So it is always a useful exercise, as a researcher or a reader, to identify the gaps. However, hopefully it has come across that not all gaps are negatives. Gaps can be powerful. By crystallizing a protein you drive a wedge between the native conformation of the protein and that which it assumes in the crystalline state, but the resultant atomic resolution is not possible any other way (for now). By utilizing peptides, interactions between proteins that cannot be purified (for now) can be studied. However, all gaps come with their baggage.
Post-script on Crystallography
For those interested in learning about biomolecular crystallography, a useful introductory text is Crystallography Made Crystal Clear, written by Gale Rhodes. The introductory chapter focuses on the gaps between what is observed during a crystallographic experiment and what is often concluded, which served as inspiration for this post.
[i] Tarade, D., Robinson, C. M., Lee, J. E., & Ohh, M. (2018). HIF-2α-pVHL complex reveals broad genotype-phenotype correlations in HIF-2α-driven disease. Nature Communications, 9.