I am a machine learning researcher at STR, based in Boston, Massachusetts. Previously, I was a data scientist at Evonik. Broadly, I am interested in the intersection of optimization, linear algebra, and deep learning. Specifically, I am interested in graph deep learning, first-order optimization methods, and minimax optimization.
I completed my PhD in mathematics at Rensselaer Polytechnic Institute, where I was advised by Yangyang Xu and mentored by Jie Chen. My research focused on complexity analysis of first-order optimization methods in decentralized computing environments.
My resume is available upon request.
BoFire: Bayesian Optimization Framework Intended for Real Experiments
Johannes P. Dürholt et al.
Variance-reduced accelerated methods for decentralized stochastic double-regularized nonconvex strongly-concave minimax problems
Gabriel Mancino-Ball and Yangyang Xu
Jointly Improving the Sample and Communication Complexities in Decentralized Stochastic Minimax Optimization
Xuan Zhang, Gabriel Mancino-Ball, Necdet Serhat Aybat, and Yangyang Xu
Proceedings of the 38th AAAI Conference on Artificial Intelligence, 2024
Proximal stochastic recursive momentum methods for nonconvex composite decentralized optimization
Gabriel Mancino-Ball, Shengnan Miao, Yangyang Xu, and Jie Chen
Proceedings of the 37th AAAI Conference on Artificial Intelligence, 2023
A decentralized primal-dual framework for non-convex smooth consensus optimization
Gabriel Mancino-Ball, Yangyang Xu, and Jie Chen
IEEE Transactions on Signal Processing, 2023
A PyTorch implementation of the FastGCN method
The goal of this project was to create a PyTorch implementation of the FastGCN method. The codebase is designed for large-scale datasets (e.g., the OGB datasets) and adds new features such as mini-batch inference. All models were built from scratch to maximize learning.
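As a rough illustration of the core idea, below is a minimal sketch of FastGCN-style layer-wise importance sampling in PyTorch. The function name, dense tensors, and shapes are illustrative assumptions for this sketch, not the project's actual API (the real codebase works with sparse adjacency matrices and full GCN layers).

```python
# Sketch of FastGCN-style importance sampling (illustrative, not the project API).
import torch

def fastgcn_sample(adj: torch.Tensor, features: torch.Tensor, num_samples: int):
    """Estimate adj @ features by sampling columns of adj with probability
    proportional to their squared norms (FastGCN's importance distribution),
    rescaled so the estimator is unbiased."""
    # Importance distribution q(v) proportional to ||adj[:, v]||^2.
    col_norms = adj.pow(2).sum(dim=0)
    q = col_norms / col_norms.sum()

    # Draw node indices with replacement, as in the original method.
    idx = torch.multinomial(q, num_samples, replacement=True)

    # Monte-Carlo estimate using only the sampled columns, rescaled by
    # 1 / (num_samples * q) to keep the aggregation unbiased.
    scale = 1.0 / (num_samples * q[idx])
    return (adj[:, idx] * scale.unsqueeze(0)) @ features[idx]

# Toy usage on a small dense graph.
adj = torch.rand(100, 100)
features = torch.rand(100, 16)
out = fastgcn_sample(adj, features, num_samples=32)  # shape (100, 16)
```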
Decentralized training of graph convolutional networks
The goal of this project was to study the effect of (decentralized) distributed training of graph neural networks. Up to 32 GPUs were used to compute gradients in parallel on local data, while MPI was used to propagate updates across the GPUs. This project served as a foundation for later projects that required multi-GPU training.
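To give a flavor of the communication pattern, here is a minimal sketch of one decentralized mixing step over a ring of workers using mpi4py. The ring topology, uniform mixing weights, and toy parameter vector are assumptions for this sketch; the project's actual model, topology, and update rule are not reproduced here.

```python
# Sketch of one gossip/mixing step in decentralized training (illustrative only).
import torch
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each worker holds a local copy of the parameters (toy example).
params = torch.randn(10)

# Ring topology: each worker communicates with its left and right neighbors.
left, right = (rank - 1) % size, (rank + 1) % size

def mix_with_neighbors(local: torch.Tensor) -> torch.Tensor:
    """Exchange parameters with ring neighbors and take a uniform average
    (mixing weight 1/3 for self and each neighbor)."""
    buf = local.numpy()
    recv_from_right = comm.sendrecv(buf, dest=left, source=right)
    recv_from_left = comm.sendrecv(buf, dest=right, source=left)
    return (local
            + torch.from_numpy(recv_from_right)
            + torch.from_numpy(recv_from_left)) / 3.0

# In a training loop, each worker would compute a local stochastic gradient,
# take a local step, then mix with its neighbors to stay near consensus.
params = mix_with_neighbors(params)
```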