Modeling concentration and dispersion in multiple regression
- Discussion Papers 
We consider concepts and models that are useful for measuring how strongly the distribution of a positive response Y is concentrated near a value y0 > 0, with a focus on how concentration varies as a function of covariates. We combine ideas from statistics, economics and reliability theory. Lorenz introduced a device for measuring inequality in the distribution of incomes that indicates how much the incomes below the u-th quantile fall short of the egalitarian situation in which everyone has the same income. Gini introduced an index that is the average over u of the difference between the Lorenz curve and its values in the egalitarian case. More generally, we can think of the Lorenz and Gini concepts as measures of concentration that apply to response variables other than incomes, e.g. wealth, sales, dividends, taxes, test scores, precipitation, and crop yield. In this paper we propose modified versions of the Lorenz and Gini measures of concentration that we relate to statistical concepts of dispersion. Moreover, we consider the situation where the measures of concentration/dispersion are functions of covariates. We consider the estimation of these functions for parametric models and for a semiparametric model involving regression coefficients and an unknown baseline distribution. In this semiparametric model, which combines ideas from Pareto, Lehmann and Cox, we find partial likelihood estimates of the regression coefficients and the baseline distribution that can be used to construct estimates of the various measures of concentration/dispersion.

Keywords: Spread, concentration, Lorenz curve, Gini index, Lehmann model, Cox regression, Pareto model.
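
As a rough illustration of the classical Lorenz and Gini concepts mentioned in the abstract, the following sketch computes an empirical Lorenz curve and the conventional Gini coefficient from a sample of positive responses. It is not the modified measure proposed in the paper and does not involve covariates. The function names `lorenz_curve` and `gini_index` are hypothetical, and the factor of 2 in the Gini coefficient is the usual textbook normalization, assumed here rather than taken from the paper.

```python
import numpy as np


def lorenz_curve(y):
    """Empirical Lorenz curve of a sample of positive responses.

    Returns grid points u_i = i/n and L(u_i), the share of the total
    held by the i smallest observations (with L(0) = 0, L(1) = 1).
    """
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    u = np.arange(n + 1) / n
    L = np.concatenate(([0.0], np.cumsum(y))) / y.sum()
    return u, L


def gini_index(y):
    """Conventional Gini coefficient (assumed normalization).

    Twice the area between the egalitarian line L(u) = u and the
    empirical Lorenz curve, using the trapezoid rule on the grid u_i.
    """
    u, L = lorenz_curve(y)
    area_under_lorenz = np.sum((L[1:] + L[:-1]) / 2.0 * np.diff(u))
    return 1.0 - 2.0 * area_under_lorenz


if __name__ == "__main__":
    equal = [1.0, 1.0, 1.0, 1.0]   # egalitarian case: no concentration
    unequal = [1.0, 3.0]           # hypothetical two-person sample
    print(gini_index(equal))
    print(gini_index(unequal))
```

In the egalitarian case the Lorenz curve coincides with the line L(u) = u, so the index is zero; the more the total is concentrated in the largest observations, the further the curve falls below that line and the larger the index becomes.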