Journal of Physical Chemistry B, Vol.123, No.2, 343-355, 2019
Maximum Caliber Can Build and Infer Models of Oscillation in a Three-Gene Feedback Network
Single-cell protein expression time trajectories provide rich temporal data quantifying cellular variability and its role in dictating fitness. However, theoretical models to analyze and fully extract information from these measurements remain limited for three reasons: (i) gene expression profiles are noisy, rendering models of averages inapplicable; (ii) experiments typically measure only a few protein species, leaving other molecular actors (necessary to build traditional bottom-up models) unobserved; and (iii) measured data are in fluorescence, not particle number. We recently addressed these challenges in an alternate top-down approach using the principle of Maximum Caliber (MaxCal) to model genetic switches with one and two protein species. In the present work we address scalability and broader applicability of MaxCal by extending it to a three-gene (A, B, C) feedback network that exhibits oscillation, commonly known as the repressilator. We test MaxCal's inferential power using synthetic data of noisy protein number time traces, serving as a proxy for experimental data, generated from a known underlying model. We find that the minimal MaxCal model, accounting for production, degradation, and only one type of symmetric coupling between all three species, reasonably infers several underlying features of the circuit such as the effective production rate, degradation rate, frequency of oscillation, and protein number distribution. Next, we build models of higher complexity including different levels of coupling between A, B, and C and rigorously assess their relative performance. While the minimal model (with four parameters) performs remarkably well, the most complex model (with six parameters), which allows all possible forms of crosstalk between A, B, and C, slightly improves prediction of rates and, unlike all the other models, avoids ad hoc assumptions. It is also the model of choice based on the Bayesian information criterion.
We further analyzed time trajectories in arbitrary fluorescence (using synthetic trajectories) to mimic realistic data. We conclude that even with a three-protein system including both fluorescence noise and intrinsic gene expression fluctuations, MaxCal can faithfully infer underlying details of the network, opening future directions to model other network motifs with many species.
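The model ranking mentioned above rests on the Bayesian information criterion, BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of data points, and L the maximized likelihood; lower values are preferred. The following is a minimal sketch of such a comparison, not the authors' implementation. The function name `bic`, the model labels, the data-point count, and the log-likelihood values are all hypothetical placeholders chosen for illustration and are not results from the paper.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical placeholders (NOT values from the paper):
# number of data points and maximized log-likelihoods for two candidate models.
n_obs = 1000
candidates = {
    "minimal (4 params)": (-5230.0, 4),
    "full (6 params)": (-5200.0, 6),
}

# Score each candidate; the extra parameters are penalized by k*ln(n).
scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: BIC = {score:.2f}")
print("preferred:", best)
```

With these made-up numbers the six-parameter model is preferred because its gain in log-likelihood outweighs the penalty for two extra parameters; with a smaller gain the four-parameter model would win instead.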