

In this paper, we study the Differentially Private Empirical Risk Minimization (DP-ERM) problem with non-convex loss functions and give several upper bounds for the utility in different settings. We first consider the problem in low-dimensional space. For DP-ERM with a non-smooth regularizer, we generalize an existing work by measuring the utility via the ℓ2 norm of the projected gradient. We also extend the error-bound measurement, for the first time, from empirical risk to population risk by using the expected ℓ2 norm of the gradient. We then investigate the problem in high-dimensional space and show that, by measuring the utility with the Frank-Wolfe gap, it is possible to bound the utility by the Gaussian width of the constraint set instead of the dimensionality p of the underlying space. We further demonstrate that the advantages of this result can also be achieved by measuring the ℓ2 norm of the projected gradient. A somewhat surprising discovery is that, although the two kinds of measurements are quite different, their induced utility upper bounds are asymptotically the same under some assumptions. We also show that the utility of some special non-convex loss functions can be reduced to a level (i.e., depending only on log p) similar to that of convex loss functions. Finally, we test our proposed algorithms on both synthetic and real-world datasets, and the experimental results confirm our theoretical analysis.
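For concreteness, the quantities named above can be written out as follows. This is a sketch using standard definitions; the notation (empirical risk L, constraint set \mathcal{C}, step size \eta) is ours and only an assumed formalization, not necessarily the paper's. DP-ERM minimizes the empirical risk over a constraint set while satisfying differential privacy,
\[
\min_{w \in \mathcal{C}} \; L(w) = \frac{1}{n}\sum_{i=1}^{n} \ell(w; x_i, y_i),
\]
over training samples (x_i, y_i), i = 1, \ldots, n (introduced formally below). Since the loss is non-convex, the utility of a private output w is measured by how close w is to a stationary point rather than by its optimality gap. The two measures mentioned above are commonly defined as
\[
\big\|\nabla_{\mathcal{C}} L(w)\big\|_2, \qquad \nabla_{\mathcal{C}} L(w) = \frac{1}{\eta}\Big(w - \Pi_{\mathcal{C}}\big(w - \eta \nabla L(w)\big)\Big) \quad \text{(projected gradient)},
\]
\[
\mathrm{Gap}(w) = \max_{u \in \mathcal{C}} \, \langle \nabla L(w),\, w - u \rangle \quad \text{(Frank-Wolfe gap)},
\]
and the Gaussian width that replaces the dimensionality p in the high-dimensional bound is
\[
G_{\mathcal{C}} = \mathbb{E}_{g \sim \mathcal{N}(0, I_p)}\Big[\sup_{u \in \mathcal{C}} \langle g, u \rangle\Big].
\]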

We are given n training samples (xi, yi) for i = 1, ..., n, drawn i.i.d. from a probability distribution P(x, y). Each xi is a d-dimensional vector (xi in Rd) and each label yi is +1 or -1. Our problem is to learn a function f(x) for predicting the labels of test samples xi' in Rd for i = 1, ..., n', also drawn i.i.d. from P(x, y). We quantify the test error as the expected error on the test set (in other words, the average test error). Then we average the test error over all possible data points x. We want to find the f that minimizes this quantity, but it depends on the conditional distribution P(y | x), which we do not have access to.
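As a minimal sketch of the two averages just described (assuming the standard 0-1 loss; the symbols err_P(f) and \hat{L}(f) are ours, not the text's), averaging over all possible data points x gives the expected risk
\[
\mathrm{err}_P(f) \;=\; \mathbb{E}_{(x,y)\sim P}\big[\mathbf{1}\{f(x) \neq y\}\big] \;=\; \int \Pr\big(y \neq f(x) \mid x\big)\, dP(x),
\]
which cannot be evaluated without P(y | x). The usual workaround is to minimize the empirical (training) risk instead,
\[
\hat{L}(f) \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{f(x_i) \neq y_i\},
\]
typically with the 0-1 indicator replaced by a surrogate loss \ell when optimizing, which is the empirical risk appearing in the DP-ERM objective above.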
