Extrapolating missing antibody-virus measurements across serological studies.
The development of new vaccines, as well as our understanding of key processes that shape viral evolution and host antibody repertoires, relies on measuring multiple antibody responses against large panels of viruses. Given the enormous diversity of circulating virus strains and antibody responses, comprehensively testing all antibody-virus interactions is infeasible. Even within individual studies with limited panels, exhaustive testing is not always performed, and there is no common framework for combining information across studies with partially overlapping panels, especially when the assay type or host species differ. Prior studies have demonstrated that antibody-virus interactions can be characterized in a vastly simpler and lower-dimensional space, suggesting that relatively few measurements could predict unmeasured antibody-virus interactions. Here, we apply matrix completion to several large-scale influenza and HIV-1 studies. We explore how prediction accuracy evolves as the number of measurements changes and approximate the number of additional measurements necessary in several highly incomplete datasets (suggesting ∼250,000 measurements could be saved). In addition, we show how the method can combine disparate datasets, even when the number of available measurements is below the theoretical limit that guarantees successful prediction. This approach can be readily generalized to other viruses or more broadly to other low-dimensional biological datasets.
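To make the idea of matrix completion concrete, the following is a minimal sketch on synthetic data. It is not the algorithm or code used in the study: it assumes a simple iterative truncated-SVD imputation, and the function name `complete_low_rank`, the rank of 2, the 70% measured fraction, and the random "antibody x virus" matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def complete_low_rank(observed, mask, rank=2, n_iter=500):
    """Fill unmeasured entries by repeatedly projecting onto rank-`rank` matrices.

    Illustrative only: a simple iterative truncated-SVD imputation,
    not the method used in the study.
    """
    # Start by filling unmeasured entries with the mean of the measured ones.
    filled = np.where(mask, observed, observed[mask].mean())
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep measured values fixed; update only the unmeasured ones.
        filled = np.where(mask, observed, low_rank)
    return filled

rng = np.random.default_rng(0)
# Synthetic 30 x 20 "antibody x virus" matrix with rank-2 structure (made-up data).
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) < 0.7      # True where a measurement exists
observed = np.where(mask, truth, 0.0)     # unmeasured entries hold a placeholder

estimate = complete_low_rank(observed, mask)
rmse_missing = np.sqrt(np.mean((estimate[~mask] - truth[~mask]) ** 2))
print(f"RMSE on unmeasured entries: {rmse_missing:.3f}")
```

Varying the fraction of entries marked as measured in `mask` gives a rough feel for how prediction error on the withheld entries changes with the number of available measurements, which is the kind of question the abstract describes.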
Year of Publication: 2022 Jul 04