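We investigate high-dimensional non-convex penalized regression, where the number of covariates may grow at an exponential rate. Two obstacles stand in the way of identifying the oracle estimator: when multiple local minima exist, a solution path is not guaranteed to contain it, and even when it does, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-calculate calibrated CCCP algorithm produces a consistent solution path which contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that it can be applied to the solution path to select the optimal tuning parameter, which asymptotically identifies the oracle estimator. The theory for a general class of non-convex penalties in the ultra-high dimensional setup is established when the random errors follow the sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC has desirable performance in identifying the underlying sparsity pattern for high-dimensional data analysis.

We consider the setting in which the number of covariates $p$ greatly exceeds the sample size $n$. The penalized least squares estimator minimizes $(2n)^{-1}\|y - X\beta\|^2 + \sum_{j=1}^{p} p_\lambda(|\beta_j|)$, where $y$ is the response vector, $X$ is the $n \times p$ matrix of covariates, $\beta = (\beta_1, \ldots, \beta_p)^T$ is the vector of unknown regression coefficients, $\|\cdot\|$ denotes the $L_2$ norm, and $p_\lambda(\cdot)$ is a penalty function that depends on a tuning parameter $\lambda > 0$. Many commonly used variable selection procedures in the literature can be cast into the above framework, including the best subset selection. The dimension $p$ is allowed to grow with $n$ at an exponential rate.

A non-convex penalized objective function may have multiple local minima whenever the convexity of the least squares loss function does not dominate the concavity of the penalty part. In general, the occurrence of multiple minima is unavoidable unless strong assumptions are imposed on both the design matrix and the penalty function. The recent theory for SCAD penalized linear regression (Kim et al. 2008) and for general non-concave penalized generalized linear models (Fan and Lv 2011) indicates that one of the local minima enjoys the oracle property, but it is still an unsolved problem how to identify the oracle estimator among multiple minima when $p$ greatly exceeds $n$. The recent independent work of Zhang (2010, 2012) devised a multi-stage convex relaxation scheme and studied it for the capped $L_1$ penalty.

Suppose that $\{(x_i, y_i)\}_{i=1}^{n}$ is a random sample from the linear regression model $y = X\beta^* + \varepsilon$, where $y = (y_1, \ldots, y_n)^T$ is the response vector, $X$ is the $n \times p$ non-stochastic design matrix, $\beta^* = (\beta_1^*, \ldots, \beta_p^*)^T$ is the vector of unknown true parameters, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ is a vector of independent and identically distributed random errors. We are interested in the case where $p = p_n$ greatly exceeds the sample size $n$. Let $A_0 = \{j : \beta_j^* \neq 0\}$ be the index set of covariates with non-zero coefficients, let $q = |A_0|$ denote the cardinality of $A_0$, and let $d_* = \min\{|\beta_j^*| : \beta_j^* \neq 0\}$ denote the minimal absolute value of the non-zero coefficients. Without loss of generality we may assume that the first $q$ components of $\beta^*$ are non-zero. The non-zero part of the oracle estimator is the least squares estimator fitted using only the covariates whose indices are in $A_0$.

We consider a penalty function $p_\lambda(t)$ defined on $t \in [0, +\infty)$ with a continuous derivative on $(0, +\infty)$. To induce sparsity of the penalized estimator, it is generally necessary for the penalty function to have a singularity at the origin, i.e. $p_\lambda'(0+) > 0$. The SCAD penalty is defined by $p_\lambda(t) = \lambda \int_0^t \min\{1, (a\lambda - x)_+ / ((a-1)\lambda)\}\,dx$ for some $a > 2$, where the notation $b_+$ stands for the positive part of $b$, i.e. $b_+ = b\,I(b > 0)$. Fan and Li (2001) recommended using $a = 3.7$ from a Bayesian perspective. On the other hand, the MCP is defined by $p_\lambda(t) = \lambda \int_0^t (1 - x/(a\lambda))_+\,dx$ for some $a > 0$ (as $a \downarrow 1$ it amounts to hard-thresholding; thus in the following we assume $a > 1$).

For an index set $A \subseteq \{1, 2, \ldots, p\}$, $X_A$ denotes the $n \times |A|$ submatrix of $X$ formed by the columns with indices in $A$, and for a vector $v$ we use $v_A$ to represent the size-$|A|$ subvector with indices in $A$. The quantities $p$, $q$, $\lambda$ and other related quantities are all allowed to depend on $n$.

Let $\lambda > 0$ and let $Q_\lambda(\beta \mid \beta^{(k)})$ be the tight convex upper bound defined in (2.7). The calibrated algorithm consists of the following two steps.

Step 1. Let $\beta^{(0)} = 0$ and set $\tilde\beta^{(1)}(\lambda) = \arg\min_\beta Q_{\tau\lambda}(\beta \mid \beta^{(0)})$, where the choice of $\tau > 0$ will be discussed later.

Step 2. Let $\tilde\beta(\lambda) = \arg\min_\beta Q_\lambda(\beta \mid \tilde\beta^{(1)}(\lambda))$.

The algorithm is easy to calculate, as for each of the two steps a convex minimization problem is solved. In step 1 a smaller tuning parameter $\tau\lambda$ is adopted to increase the estimation accuracy; see Section 3.1 for discussions on the practical choice of $\tau$. Given the resulting solution path, the remaining question is how to choose the tuning parameter $\lambda$ in order to identify the oracle estimator. The performance of a penalized regression estimator is known to heavily depend on the choice of the tuning parameter.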
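The two-step calibrated algorithm can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the text above: the convex upper bound is taken to be a weighted $L_1$ problem with weights $p_\lambda'(|\beta_j^{(k)}|)$ (the usual form for SCAD and MCP, under which step 1 started at zero reduces to a Lasso fit with parameter $\tau\lambda$), and each convex problem is solved by plain coordinate descent. The function names, the MCP default `a=3.0`, and the solver settings are hypothetical choices for illustration.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty at t >= 0 (requires lam > 0, a > 2)."""
    t = np.asarray(t, dtype=float)
    tail = np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam)
    return lam * np.where(t <= lam, 1.0, tail)

def mcp_derivative(t, lam, a=3.0):
    """Derivative of the MCP at t >= 0 (a > 1 assumed)."""
    t = np.asarray(t, dtype=float)
    return np.maximum(lam - t / a, 0.0)

def soft_threshold(z, w):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - w, 0.0)

def weighted_lasso(X, y, weights, n_iter=200, tol=1e-8):
    """Coordinate descent for (2n)^{-1}||y - Xb||^2 + sum_j weights[j]*|b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()   # running residual y - Xb
    col_sq = (X ** 2).sum(axis=0) / n        # x_j'x_j / n for each column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            z = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(z, weights[j]) / col_sq[j]
            change = new - b[j]
            if change != 0.0:
                r -= X[:, j] * change
                b[j] = new
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return b

def calibrated_cccp(X, y, lam, tau, derivative=scad_derivative):
    """Two-step sketch: Lasso with tau*lam, then one weighted-L1 refit with lam."""
    p = X.shape[1]
    # Step 1 (assumed form): starting from beta = 0, every weight equals tau*lam.
    beta1 = weighted_lasso(X, y, np.full(p, tau * lam))
    # Step 2 (assumed form): weights are the penalty derivative at |beta1_j|.
    weights = derivative(np.abs(beta1), lam)
    return weighted_lasso(X, y, weights)
```

Under these assumptions, large step-1 coefficients receive zero weight in step 2 and are refit without shrinkage, while small ones are penalized at the full rate $\lambda$. A solution path is obtained by calling `calibrated_cccp` over a grid of `lam` values.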
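To further calibrate non-convex penalized regression, we consider the following high-dimensional BIC criterion (HBIC) to compare the estimators $\tilde\beta(\lambda)$ from the above solution path:

$\mathrm{HBIC}(\lambda) = \log(\hat\sigma_\lambda^2) + |M_\lambda| \, \frac{C_n \log p}{n}$,

where $M_\lambda = \{j : \tilde\beta_j(\lambda) \neq 0\}$ is the model identified by $\tilde\beta(\lambda)$, $|M_\lambda|$ denotes the cardinality of $M_\lambda$, and $\hat\sigma_\lambda^2 = n^{-1}\mathrm{SSE}_\lambda$ with $\mathrm{SSE}_\lambda = \|y - X\tilde\beta(\lambda)\|^2$. Because $p$ greatly exceeds $n$, the penalty term also depends on $p$, and $C_n$ is a sequence of numbers that diverges to $\infty$, which will be discussed later. We compare the value of the above HBIC criterion for $\lambda \in \Lambda_n = \{\lambda : |M_\lambda| \le K_n\}$, where $K_n > q$ represents a rough estimate of an upper bound of the sparsity of the model and is allowed to diverge to $\infty$. We select the tuning parameter $\hat\lambda = \arg\min_{\lambda \in \Lambda_n} \mathrm{HBIC}(\lambda)$.

Throughout, the random errors $\varepsilon_1, \ldots, \varepsilon_n$ are assumed to be i.i.d. mean zero sub-Gaussian random variables.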
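A short Python sketch of HBIC-based tuning parameter selection follows. It assumes the criterion has the form $\log(\mathrm{SSE}_\lambda / n) + |M_\lambda| \, C_n \log(p) / n$ and that the solution path is available as a mapping from each candidate $\lambda$ to its coefficient vector. The names `hbic` and `select_by_hbic` are hypothetical, and the values `c_n` and `k_n` are left as user inputs because their choice is deferred to later discussion.

```python
import numpy as np

def hbic(X, y, beta, c_n):
    """HBIC value for one estimator; assumes a non-zero residual sum of squares."""
    n, p = X.shape
    sse = float(np.sum((y - X @ beta) ** 2))
    model_size = int(np.count_nonzero(beta))
    return np.log(sse / n) + model_size * c_n * np.log(p) / n

def select_by_hbic(X, y, path, c_n, k_n):
    """Pick the lambda minimizing HBIC among models with at most k_n covariates.

    path: dict mapping each candidate lambda to its coefficient vector.
    Returns (best_lambda, best_hbic_value).
    """
    scores = {
        lam: hbic(X, y, beta, c_n)
        for lam, beta in path.items()
        if np.count_nonzero(beta) <= k_n   # restrict to the candidate set
    }
    if not scores:
        raise ValueError("no model on the path has size <= k_n")
    best = min(scores, key=scores.get)
    return best, scores[best]
```

The size restriction keeps heavily overfitted models out of the comparison, where the residual sum of squares can approach zero and make the logarithm unstable.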