Glossary

Bar graph is a graphical representation of the frequency distribution or relative frequency distribution when dealing with qualitative data.

Binomial distribution function is the probability distribution of the number of successes in repeated trials of a dichotomous (binary) variable.

Box plot is a visual representation of several basic descriptive statistics in a concise manner.

Categorical variable is another name for a qualitative variable.

Center of gravity of the data is the same as the expected value, or mean.

Central Limit Theorem states that in repeated random samples from a population, the sample mean will have a distribution function approximated by the normal distribution, the expected value of the sample mean is equal to the true value of the population mean, and the variance of the sample mean is equal to the population variance divided by the sample size.
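In symbols, for random samples of size $n$ from a population with mean $\mu$ and variance $\sigma^2$ (notation added here, not part of the original entry):
\[ E(\bar{X}) = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right). \]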

Ceteris paribus is Latin for “other things being equal.”

Chi-square represents the distribution function of a variance.

Claim is a testable hypothesis.

Coefficient of variation is the ratio of the standard deviation to the mean.
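In symbols, writing $s$ for the sample standard deviation and $\bar{x}$ for the sample mean (symbols added here for illustration):
\[ CV = \frac{s}{\bar{x}}, \qquad \text{or, for a population,} \qquad CV = \frac{\sigma}{\mu}. \]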

Confidence interval provides a probabilistic estimate of a population parameter with a desired level of confidence.
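As an illustration only, since the entry gives no formula, a common confidence interval for a population mean when the population standard deviation $\sigma$ is known is
\[ \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \]
where $1-\alpha$ is the desired level of confidence.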

Consistent means the variance of the estimator becomes smaller as the sample size increases.

Continuous dichotomous variables exist when one can place an order on the type of data.

Continuous random variable is a variable that can assume any real value. It represents all the values over a range.

Correction factor for the variance is used when the sample size is small or when the sample is more than 5% of the population.

Cross sectional analysis is a study of a snapshot of regions at a given time.

Cumulative frequencies consist of the sum of frequencies up to the value or class of interest.

Deductive statistics start from general information to make inferences about specifics.

Degree of freedom is the number of elements that can be chosen freely in a sample.

Dependent variable is the variable of interest that is explained by statistical analysis. Other names such as endogenous variable, Y-variable, response variable, or even output are often used as well.

Descriptive statistics provide a descriptive, rather than analytic, view of variables.

Dichotomous variables, also called dummy variables in econometrics, exist when there are only two nominal types of data.

Discrete dichotomous variable is a dichotomous variable that can take integer values.

Discrete random variables consist of integers only.

Dot plot represents frequencies as stacked dots. It is useful when only one set of data is under consideration.

Dummy variable is a qualitative variable used as an independent variable.

Econometrics is the application of statistics to economics.

Efficient refers to the estimator with the smallest variance compared to the other estimator(s).

Error is the difference between an observed value and its expected value. Error is the portion of variation that cannot be explained.

Errors in measurement refer to incorrectly measuring or recording the values of dependent or independent variables.

Expected value is the theoretical value of a parameter. It is the same as the arithmetic mean.
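In symbols, for a discrete random variable $X$ with probability distribution $f(x)$ (notation added here):
\[ E(X) = \sum_{x} x\, f(x). \]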

Experimental design is a type of statistics in which the experiment is controlled for different variables to ensure desired levels of confidence in the estimates of the variable.

F statistic is used to test complex hypotheses. It consists of the ratio of two variance measures.
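In symbols, as the ratio of two variance estimates $s_1^2$ and $s_2^2$ (symbols added here):
\[ F = \frac{s_1^2}{s_2^2}. \]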

Frequency distribution shows the frequency of occurrence for non-overlapping classes.

Grouped data are summarized or organized to provide a better and more compact picture of reality.

Harmonic mean is the average of rates. It is the reciprocal of the arithmetic mean of the reciprocals of the values.
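In symbols, for values $x_1, x_2, \dots, x_n$ (notation added here):
\[ H = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}. \]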

Histogram is a graphical representation of the frequency distribution or relative frequency distribution when dealing with quantitative data.

Independent variable is a variable that is used to explain the response or dependent variable. Other names such as exogenous variable, X-variable, regressor, input, factor, or predictor variable are also used.

Individual error is the difference between an observed value and its expected value.

Inductive statistics observe specifics to make inferences about the general population.

Inferential statistics is the methodology that allows making decisions based on the outcome of a statistic from a sample.

An interval scale provides meaningful relative distances between any two values, such as the Fahrenheit scale.

Kurtosis is a measure of pointedness or flatness of a symmetric distribution.

A Likert scale is a kind of ordinal scale, where the subjects provide the ranking of each variable.

The lower hinge is the 25th percentile of a box plot.

Mean is the arithmetic average. It represents the center of gravity of data.

Mean absolute error (MAE) is the average of the absolute values of individual errors.
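In symbols, writing $e_i$ for the individual errors (notation added here):
\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i|. \]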

Mean squared error is the same as variance.

Measurement scales are types of variables.

Measures of association determine the association between two variables or the degree of association between two variables. They consist of covariance and correlation coefficient.

Measures of central tendency provide concise meaningful summaries of central properties of a population.

Measures of dispersion reflect how data are scattered. The most important dispersion measures are variance and standard deviation.

Median is a value that divides observations into two equal groups. It is the midpoint among a group of numbers ranked in order.

Mode is the most frequent value of a population.

Nominal or categorical data are the “count” of the number of times an event occurs.

Normal distribution is a very common distribution function that reflects many randomly occurring events in life.

Null hypothesis reflects the status quo or how things have been or are currently.

Observed significance level is another name for the p value, which is the probability of seeing what you saw.

Ogive is a graph for cumulative frequencies.

Ordinal scale indicates that data is ordered in some way but the numbering has no value.

P value represents the probability of type I error for inference about a coefficient.

A parameter is a characteristic of a population that is of interest; it is constant and usually unknown.

A percentile is the demarcation value below which the stated percentage of the population or sample lie.

A pie chart is a graphical presentation of frequency distribution and relative frequency.

Point estimate is a statistic that consists of a single value, such as the mean or variance.

Probability is the likelihood that something will happen, expressed in the form of a ratio or a percentage.

Probability distribution determines the probability of the outcomes of a random variable.

Probability distribution for a continuous random variable is called a probability density function.

Probability distribution for a discrete random variable is called a discrete probability distribution and is represented as f(x).

Qualitative variables are non-numeric and represent a label for a category of similar items.

Quantitative variables are numerical and countable values.

Quartiles divide the population into four equal portions, each equal to 25% of the population.

Random variables are variables whose values are selected at random, by chance.

A ratio scale provides meaningful use of the ratio of measurements.

Real numbers consist of all rational and irrational numbers.

Relative frequency shows the percentage of each class to the total population or sample.

Relative variability is the comparison of variability using coefficient of variation.

Reliability of a sample mean (X̄) is equal to the probability that the deviation of the sample mean from the population mean is within the tolerable level of error (E).

Root mean squared error is the square root of the mean square error and is the same as the standard error.

Sample standard deviation is the average error of the sample. This is the standard deviation obtained from a sample and is not the same as standard error.

Sample statistics are random values obtained from a sample. They estimate the corresponding population parameters and are used to make inferences about them.

Sample variance is an estimate of the population variance. It is the sum of the squares of the deviations of values from the sample mean divided by the degrees of freedom.
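In symbols, restating the entry for sample values $x_1, \dots, x_n$ with sample mean $\bar{x}$ and $n-1$ degrees of freedom:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}. \]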

Sampling is the collection of a subset of the population, which can be done in a variety of ways.

Sampling distribution of any statistic explains how the statistic differs from one sample to another.

Scatter plot is a graph customarily used in presenting data from a regression analysis model.

Simple hypothesis gives an exact value for the unknown parameter of the assumed distribution function.

Skewness refers to the extent that a distribution function deviates from symmetric distribution.

Standard deviation is the square root of variance and represents the average error of a population or sample.

Standard error is the standard deviation of the estimated sample statistics.

Standardization is the conversion of the value of an observation into its Z score.

A statistic is a numerical value calculated from a sample; it is variable and known.

Statistical hypothesis is an assertion about distribution of one or more random variables.

Statistical inference is the process of drawing conclusions based on evidence obtained from a sample. All statistical inferences are probabilistic.

Stem-and-leaf is a graphical way of summarizing information and is a type of descriptive statistic.

Stochastic means that a model is probabilistic in nature and would result in varying results reflecting the random nature of the model.

t distribution is a distribution function that is designed to handle statistics from small samples correctly.

A testable hypothesis is a claim about a relationship among two or more variables.

Time series analysis is the analysis of time series data, that is, data observed over successive periods of time.

Tolerable level of error is the amount of error that the researcher is willing to accept.

Tolerance level is a measure for detecting multicollinearity. It is the reciprocal of the Variance Inflation Factor (VIF). A tolerance value less than 0.1 is indicative of the presence of multicollinearity.
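In symbols, restating the entry, and noting (as an added detail not in the entry) that in a regression setting $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing regressor $j$ on the remaining regressors:
\[ \text{Tolerance}_j = \frac{1}{VIF_j} = 1 - R_j^2. \]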

Total sum of square (TSS) represents the total variation in the dependent variable.
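In symbols, for observations $y_i$ of the dependent variable with mean $\bar{y}$ (notation added here):
\[ TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2. \]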

Trimmed mean is a modification of the mean, where outliers are discarded.

Type I error is rejecting the null hypothesis even though it is true.

Type II error is failure to reject a false null hypothesis.

Type III error is rejecting a null hypothesis in favor of an alternative hypothesis with the wrong sign.

Typical refers to the average.

Unbiased refers to an estimate whose expected value is equal to the corresponding population parameter.

The upper hinge is the 75th percentile of a box plot.

Validity is the lack of measurement error.

Variance is the sum of the squares of the deviations of values from their mean, divided by population size. It is the average of the squared individual errors.
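In symbols, restating the entry for a population of size $N$ with values $x_i$ and mean $\mu$:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}. \]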

Weighted mean is similar to the mean except the weights for observations are not equal and represent their contribution to the total. Calculation of GPA is an example of a weighted mean.
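In symbols, with weights $w_i$ attached to observations $x_i$ (notation added here):
\[ \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}. \]
For a GPA, the $x_i$ are course grades and the $w_i$ are credit hours.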

Z score is a statistic based on the mean and standard deviation. It is used to standardize unrelated variables for the purpose of comparing them.
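In symbols, for an observation $x$ from a population with mean $\mu$ and standard deviation $\sigma$:
\[ Z = \frac{x - \mu}{\sigma}. \]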
