Dasmeh, Pouria and Andreas Wagner

More than a hundred proteins in yeast reversibly aggregate and phase-separate in response to various stressors, such as nutrient depletion and heat shock. We know little about the protein sequence and structural features behind this ability, which has not been characterized on a proteome-wide level. To identify the distinctive features of aggregation-prone protein regions, we apply machine learning algorithms to genome-scale limited proteolysis-mass spectrometry (LiP-MS) data from yeast proteins. LiP-MS data reveal that 96 proteins show significant structural changes upon heat shock. We find that in these proteins the propensity to phase separate cannot be solely driven by disordered regions, because their aggregation-prone regions (APRs) are not significantly disordered. Instead, the phase separation of these proteins requires contributions from both disordered and structured regions. APRs are significantly enriched in aliphatic residues and depleted in positively charged amino acids. Aggregator proteins with longer APRs show a greater propensity to aggregate, a relationship that can be explained by equilibrium statistical thermodynamics. Altogether, our observations suggest that proteome-wide reversible protein aggregation is mediated by sequence-encoded properties. We propose that aggregating proteins resemble supra-molecular amphiphiles, where APRs are the hydrophobic parts, and non-APRs are the hydrophilic parts. © 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).