Details of Research Outputs

Title: A stochastic semismooth Newton method for nonsmooth nonconvex optimization
Author (Name in English or Pinyin)
Milzarek, A.1; Xiao, X.2; Cen, S.3; Wen, Z.4; Ulbrich, M.5
Date Issued: 2019
Source Publication: SIAM JOURNAL ON OPTIMIZATION
ISSN: 1052-6234
DOI: 10.1137/18M1181249
First-level Discipline: Information Science and Systems Science
Education Discipline: Science and Technology
Published Range: Foreign academic journal
Volume/Issue/Pages: Volume 29, Issue 4, Pages 2916-2948
References
[1] N. Agarwal, B. Bullins, and E. Hazan, Second-order stochastic optimization for machine learning in linear time, J. Mach. Learn. Res., 18 (2017), pp. 1-40, http://jmlr.org/papers/v18/16-491.html.
[2] C. D. Aliprantis and K. C. Border, Infinite Dimensional Analysis, 3rd ed., Springer, Berlin, 2006.
[3] Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, in Proceedings of the 49th Annual ACM SIGACT Symposium on the Theory of Computing, 2017, pp. 1200-1205.
[4] Z. Allen-Zhu, Natasha: Faster non-convex stochastic optimization via strongly non-convex parameter, in Proc. Mach. Learn. Res. 70, PMLR Press, 2017, pp. 89-97.
[5] Z. Allen-Zhu and E. Hazan, Variance reduction for faster non-convex optimization, in Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 699-707.
[6] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, Optimization with sparsity-inducing penalties, Found. Trends Mach. Learn., 4 (2011), pp. 1-106, https://doi.org/10.1561/2200000015.
[7] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books Math./Ouvrages Math. SMC, Springer, New York, 2011, https://doi.org/10.1007/978-1-4419-9467-7.
[8] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183-202, https://doi.org/10.1137/080716542.
[9] A. S. Berahas, R. Bollapragada, and J. Nocedal, An Investigation of Newton-Sketch and Subsampled Newton Methods, https://arxiv.org/abs/1705.06211, 2017.
[10] R. Bhattacharya and E. C. Waymire, A Basic Course in Probability Theory, 2nd ed., Universitext, Springer, Cham, 2016, https://doi.org/10.1007/978-3-319-47974-3.
[11] J. A. Blackard and D. J. Dean, Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables, Computers and Electronics in Agriculture, 24 (1999), pp. 131-151.
[12] R. Bollapragada, R. Byrd, and J. Nocedal, Exact and inexact subsampled Newton methods for optimization, IMA J. Numer. Anal., 39 (2019), pp. 545-578.
[13] A. Bordes, L. Bottou, and P. Gallinari, SGD-QN: Careful quasi-Newton stochastic gradient descent, J. Mach. Learn. Res., 10 (2009), pp. 1737-1754.
[14] L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev., 60 (2018), pp. 223-311, https://doi.org/10.1137/16M1080173.
[15] R. H. Byrd, G. M. Chin, W. Neveitt, and J. Nocedal, On the use of stochastic Hessian information in optimization methods for machine learning, SIAM J. Optim., 21 (2011), pp. 977-995, https://doi.org/10.1137/10079923X.
[16] R. H. Byrd, G. M. Chin, J. Nocedal, and Y. Wu, Sample size selection in optimization methods for machine learning, Math. Program., 134 (2012), pp. 127-155, https://doi.org/10.1007/s10107-012-0572-5.
[17] R. H. Byrd, S. L. Hansen, J. Nocedal, and Y. Singer, A stochastic quasi-Newton method for large-scale optimization, SIAM J. Optim., 26 (2016), pp. 1008-1031, https://doi.org/10.1137/140954362.
[18] F. H. Clarke, Optimization and Nonsmooth Analysis, 2nd ed., Classics in Appl. Math. 5, SIAM, Philadelphia, 1990, https://doi.org/10.1137/1.9781611971309.
[19] P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul., 4 (2005), pp. 1168-1200, https://doi.org/10.1137/050626090.
[20] C. D. Dang and G. Lan, Stochastic block mirror descent methods for nonsmooth and stochastic optimization, SIAM J. Optim., 25 (2015), pp. 856-881, https://doi.org/10.1137/130936361.
[21] A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, in Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 2014, pp. 1646-1654.
[22] L. Deng and D. Yu, Deep learning: Methods and applications, Found. Trends Signal Process., 7 (2014), pp. 197-387, https://doi.org/10.1561/2000000039.
[23] J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res., 12 (2011), pp. 2121-2159.
[24] M. A. Erdogdu and A. Montanari, Convergence rates of sub-sampled Newton methods, in Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 2015, pp. 3034-3042.
[25] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems, Vol. II, Springer, New York, 2003.
[26] M. P. Friedlander and M. Schmidt, Hybrid deterministic-stochastic methods for data fitting, SIAM J. Sci. Comput., 34 (2012), pp. A1380-A1405, https://doi.org/10.1137/110830629.
[27] M. Fukushima and H. Mine, A generalized proximal point algorithm for certain nonconvex minimization problems, Internat. J. Systems Sci., 12 (1981), pp. 989-1000, https://doi.org/10.1080/00207728108963798.
[28] S. Ghadimi and G. Lan, Stochastic first- and zeroth-order methods for nonconvex stochastic programming, SIAM J. Optim., 23 (2013), pp. 2341-2368, https://doi.org/10.1137/120880811.
[29] S. Ghadimi and G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochastic programming, Math. Program., 156 (2016), pp. 59-99, https://doi.org/10.1007/s10107-015-0871-8.
[30] S. Ghadimi, G. Lan, and H. Zhang, Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, Math. Program., 155 (2016), pp. 267-305, https://doi.org/10.1007/s10107-014-0846-1.
[31] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins Stud. Math. Sci., Johns Hopkins University Press, Baltimore, MD, 2013.
[32] R. Gower, D. Goldfarb, and P. Richtarik, Stochastic block BFGS: Squeezing more curvature out of data, in Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 1869-1878.
[33] I. Guyon, S. Gunn, A. Ben-Hur, and G. Dror, Result analysis of the NIPS 2003 feature selection challenge, in Advances in Neural Information Processing Systems 17, MIT Press, Cambridge, MA, 2004, pp. 545-552, http://papers.nips.cc/paper/2728-result-analysis-of-the-nips-2003-feature-selection-challenge.pdf.
[34] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer Ser. Statist., Springer, New York, 2009, https://doi.org/10.1007/978-0-387-84858-7.
[35] K. Jiang, D. Sun, and K.-C. Toh, A partial proximal point algorithm for nuclear norm regularized matrix least squares problems, Math. Program. Comput., 6 (2014), pp. 281-325, https://doi.org/10.1007/s12532-014-0069-8.
[36] R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 2013, pp. 315-323.
[37] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436-444, https://doi.org/10.1038/nature14539.
[38] Y. LeCun, C. Cortes, and C. J. C. Burges, The MNIST Database of Handwritten Digits, http://yann.lecun.com/exdb/mnist (2010).
[39] J. D. Lee, Y. Sun, and M. A. Saunders, Proximal Newton-type methods for minimizing composite functions, SIAM J. Optim., 24 (2014), pp. 1420-1443, https://doi.org/10.1137/130921428.
[40] D. D. Lewis, Y. Yang, T. G. Rose, and F. Li, RCV1: A new benchmark collection for text categorization research, J. Mach. Learn. Res., 5 (2004), pp. 361-397.
[41] H. Lin, J. Mairal, and Z. Harchaoui, A universal catalyst for first-order optimization, in Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, 2015, pp. 3384-3392.
[42] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online dictionary learning for sparse coding, in Proceedings of the 26th Annual ICML, New York, 2009, pp. 689-696, https://doi.org/10.1145/1553374.1553463.
[43] L. Mason, J. Baxter, P. Bartlett, and M. Frean, Boosting algorithms as gradient descent in function space, in Proceedings of NIPS'99, 1999, pp. 512-518, http://dl.acm.org/citation.cfm?id=3009657.3009730.
[44] A. Milzarek, Numerical Methods and Second Order Theory for Nonsmooth Problems, Ph.D. dissertation, Technische Universität München, 2016.
[45] A. Milzarek and M. Ulbrich, A semismooth Newton method with multidimensional filter globalization for l1-optimization, SIAM J. Optim., 24 (2014), pp. 298-333, https://doi.org/10.1137/120892167.
[46] A. Milzarek, X. Xiao, S. Cen, Z. Wen, and M. Ulbrich, A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization, https://arxiv.org/abs/1803.03466, 2018.
[47] A. Mokhtari and A. Ribeiro, RES: regularized stochastic BFGS algorithm, IEEE Trans. Signal Process., 62 (2014), pp. 6089-6104, https://doi.org/10.1109/TSP.2014.2357775.
[48] J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France, 93 (1965), pp. 273-299.
[49] P. Moritz, R. Nishihara, and M. Jordan, A linearly-convergent stochastic L-BFGS algorithm, in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016, pp. 249-258.
[50] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM J. Optim., 19 (2008), pp. 1574-1609, https://doi.org/10.1137/070704277.
[51] N. Parikh and S. Boyd, Proximal algorithms, Found. Trends Optim., 1 (2014), pp. 127-239, https://doi.org/10.1561/2400000003.
[52] P. Patrinos, L. Stella, and A. Bemporad, Forward-Backward Truncated Newton Methods for Convex Composite Optimization, https://arxiv.org/abs/1402.6655, 2014.
[53] M. Pilanci and M. J. Wainwright, Newton sketch: A near linear-time optimization algorithm with linear-quadratic convergence, SIAM J. Optim., 27 (2017), pp. 205-245, https://doi.org/10.1137/15M1021106.
[54] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Math. Oper. Res., 18 (1993), pp. 227-244, https://doi.org/10.1287/moor.18.1.227.
[55] L. Qi and J. Sun, A nonsmooth version of Newton's method, Math. Program., 58 (1993), pp. 353-367, https://doi.org/10.1007/BF01581275.
[56] S. J. Reddi, A. Hefny, S. Sra, B. Póczos, and A. J. Smola, Stochastic variance reduction for nonconvex optimization, in Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 314-323.
[57] S. J. Reddi, S. Sra, B. Póczos, and A. J. Smola, Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization, in Advances in Neural Information Processing Systems 29, MIT Press, Cambridge, MA, 2016, pp. 1145-1153, http://papers.nips.cc/paper/pdf.
[58] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Stat., 22 (1951), pp. 400-407.
[59] R. T. Rockafellar, Integral Functionals, Normal Integrands and Measurable Selections, Lecture Notes in Math. 543, Springer, New York, 1976, pp. 157-207.
[60] F. Roosta-Khorasani and M. W. Mahoney, Sub-sampled Newton methods, Math. Program., 174 (2019), pp. 293-326.
[61] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Netw., 61 (2015), pp. 85-117, https://doi.org/10.1016/j.neunet.2014.09.003.
[62] M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Math. Program., 162 (2017), pp. 83-112, https://doi.org/10.1007/s10107-016-1030-6.
[63] N. N. Schraudolph, J. Yu, and S. Günter, A stochastic quasi-Newton method for online convex optimization, in Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, Vol. 2, 2007, pp. 436-443, http://proceedings.mlr.press/v2/schraudolph07a.html.
[64] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, New York, 2014.
[65] Z. Shi and R. Liu, Large scale optimization with proximal stochastic Newton-type gradient descent, in Mach. Learn. Knowl. Disc. Databases 9284, Springer, New York, 2015, pp. 691-704, https://doi.org/10.1007/978-3-319-23528-8_43.
[66] L. Stella, A. Themelis, and P. Patrinos, Forward-backward quasi-Newton methods for nonsmooth optimization problems, Comput. Optim. Appl., 67 (2017), pp. 443-487, https://doi.org/10.1007/s10589-017-9912-y.
[67] D. Sun and J. Sun, Semismooth matrix-valued functions, Math. Oper. Res., 27 (2002), pp. 150-169, https://doi.org/10.1287/moor.27.1.150.342.
[68] A. Themelis, M. Ahookhosh, and P. Patrinos, On the Acceleration of Forward-Backward Splitting via an Inexact Newton Method, https://arxiv.org/abs/1811.02935, 2018.
[69] R. Tomioka, T. Suzuki, and M. Sugiyama, Super-linear convergence of dual augmented Lagrangian algorithm for sparsity regularized estimation, J. Mach. Learn. Res., 12 (2011), pp. 1537-1586.
[70] P. Tseng and S. Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. Program., 117 (2009), pp. 387-423, https://doi.org/10.1007/s10107-007-0170-0.
[71] M. Ulbrich, Semismooth Newton methods for operator equations in function spaces, SIAM J. Optim., 13 (2002), pp. 805-842 (2003), https://doi.org/10.1137/S1052623400371569.
[72] J. Wang and T. Zhang, Improved Optimization of Finite Sums with Minibatch Stochastic Variance Reduced Proximal Iterations, https://arxiv.org/abs/1706.07001, 2017.
[73] X. Wang, S. Ma, D. Goldfarb, and W. Liu, Stochastic Quasi-Newton methods for nonconvex stochastic optimization, SIAM J. Optim., 27 (2017), pp. 927-956, https://doi.org/10.1137/ 15M1053141.
[74] A Marketing Dataset, Causality Workbench Team, http://www.causality.inf.ethz.ch/data/CINA.html (2008).
[75] A Pharmacology Dataset, Causality Workbench Team, http://www.causality.inf.ethz.ch/data/SIDO.html (2008).
[76] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, Sparse reconstruction by separable approximation, IEEE Trans. Signal Process., 57 (2009), pp. 2479-2493, https://doi.org/10.1109/TSP.2009.2016892.
[77] L. Xiao and T. Zhang, A proximal stochastic gradient method with progressive variance reduction, SIAM J. Optim., 24 (2014), pp. 2057-2075, https://doi.org/10.1137/140961791.
[78] X. Xiao, Y. Li, Z. Wen, and L. Zhang, A regularized semi-smooth Newton method with projection steps for composite convex programs, J. Sci. Comput., 76 (2018), pp. 364-389, https://doi.org/10.1007/s10915-017-0624-3.
[79] P. Xu, F. Roosta-Khorasani, and M. W. Mahoney, Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information, https://arxiv.org/abs/1708.07164, 2017.
[80] P. Xu, F. Roosta-Khorasani, and M. W. Mahoney, Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study, https://arxiv.org/abs/1708.07827, 2017.
[81] P. Xu, J. Yang, F. Roosta-Khorasani, C. Ré, and M. W. Mahoney, Sub-sampled Newton methods with non-uniform sampling, in Advances in Neural Information Processing Systems, 2016, pp. 3000-3008.
[82] Y. Xu and W. Yin, Block stochastic gradient iteration for convex and nonconvex optimization, SIAM J. Optim., 25 (2015), pp. 1686-1716, https://doi.org/10.1137/140983938.
[83] Z. Yao, P. Xu, F. Roosta-Khorasani, and M. W. Mahoney, Inexact Non-Convex Newton-Type Methods, https://arxiv.org/abs/1802.06925, 2017.
[84] H. Ye, L. Luo, and Z. Zhang, Approximate Newton methods and their local convergence, in Proc. Mach. Learn. Res. 70, PMLR Press, 2017, pp. 3931-3939, http://proceedings.mlr.press/v70/ye17a.html.
Citation statistics
Cited Times [WOS]: 0
Document Type: Journal article
Identifier: https://irepository.cuhk.edu.cn/handle/3EPUXD0A/1187
Collection: School of Data Science
Affiliation
1.Institute for Data and Decision Analytics (IDDA), Shenzhen Institute for Artificial Intelligence and Robotics for Society (AIRS), Chinese University of Hong Kong, Hong Kong
2.School of Mathematical Sciences, Dalian University of Technology, Dalian, China
3.School of Mathematical Sciences, Peking University, Beijing, China
4.Beijing International Center for Mathematical Research, Center for Data Science, National Engineering Laboratory for Big Data Analysis and Applications, Peking University, Beijing, 100871, China
5.Chair of Mathematical Optimization, Department of Mathematics, Technical University of Munich, Garching b. München, 85748, Germany
First Author Affiliation: Institute for Data and Decision Analytics
Recommended Citation
GB/T 7714
Milzarek, A., Xiao, X., Cen, S., et al. A stochastic semismooth Newton method for nonsmooth nonconvex optimization[J]. SIAM JOURNAL ON OPTIMIZATION, 2019.
APA Milzarek, A., Xiao, X., Cen, S., Wen, Z., & Ulbrich, M. (2019). A stochastic semismooth Newton method for nonsmooth nonconvex optimization. SIAM JOURNAL ON OPTIMIZATION.
MLA Milzarek, A., et al. "A stochastic semismooth Newton method for nonsmooth nonconvex optimization". SIAM JOURNAL ON OPTIMIZATION (2019).
Files in This Item:
There are no files associated with this item.