Index

SYMBOL

` (backtick) : (colon), 2nd [[]] (double square braces), 2nd [] (square braces), 2nd @ (at symbol), 2nd & vectorized logic operator # (hash symbol) %in% operation + operator <- assignment operator <<- assignment operator = assignment operator == vectorized logic operator -> assignment operator ->> assignment operator | vectorized logic operator $ (dollar sign)

A

absolute error academic presentations accuracyMeasures() function adaptive learning add command, 2nd additive process adjusted R-squared AdWords aesthetics anonymous functions Apgar test Apriori apriori() function arcsinh area under the curve. See AUC. arules package as.formula() function assignment operators at symbol ( @ ), 2nd AUC (area under the curve) defined scoring categorical variables by audience for presentations average silhouette width averaging to reduce variance

B

backtick ( ` ) backups and version control bagging classifiers and overview, 2nd bag-of-k-grams model bag-of-words model bar charts checking distributions for single variable checking relationships between two variables base error rate baskets batch model Bayesian inference Bayesian information criterion. See BIC. Bayesian methods Bayesian posterior estimate beta regression betas, defined between sum of squares. See BSS. bias model problems variance decomposition BIC (Bayesian information criterion) big data tools bimodal distribution binomial classification binwidth parameter blame command block declaration format for knitr bookstore example boosting technique bounded predictions branches vs. commits (Git) BSS (between sum of squares) business rules buzz dataset overview product names in

C

c() command cache knitr option Calinski-Harabasz index cluster analysis kmeansruns() function call-by-value semantics, 2nd CART (classification and regression trees) casual variables categorization accuracy single-variable models variables CDC 2010 natality public-use data file central limit theorem centroid change history for Git characterization checkout command checkpoint documentation chi-squared test chooseCRANmirror() command churn, defined city block distance. See Manhattan distance. class() command classification and regression trees. See CART. classifiers and bagging client role clusterboot() function assessing clusters k-means algorithm clustering defined models clusters as classifications or scores distance comparisons overview coefficients defined for linear regression overview table of for logistic regression interpreting values overview table of negative collinearity, 2nd colon ( : ), 2nd comments commit command, 2nd comparing files with Git Comprehensive R Archive Network. See CRAN. computer science machine learning conditional entropy confidence intervals confidence parameter contingency table continuous variables coord_flip command correlation cos() function cosine similarity distances kernels mathematical definition Cover’s theorem coverage, defined CRAN (Comprehensive R Archive Network) installing online resources credible intervals Cromwell’s rule cross-language linkage cross-validation estimating overfitting effects using performing using function cumulative distribution function cut() function, 2nd cutree() function Cygwin

D

data architect data collection data cuts data dictionary Data directory data frame defined overview dbinom() function decision trees classification methods data cuts for problem-to-method mapping training variance and workings of declarative language definitional kernels dendrogram density estimation density plots dependent variables, 2nd, 3rd Derived directory deviance probability models residuals, logistic regression diff command, 2nd difference parameter dim() command discrete variables dissimilarity dissolved clusters dist() function distances clustering models cosine similarity Euclidean distance Hamming distance Manhattan distance distribution function distribution shape distribution tail bound dlnorm() function dnorm() function document classification dollar sign ( $ ) domain knowledge dot plot dot product mathematical definition similarity using kernel double-precision floating-point numbers Dremel Drill dropping records for missing values dynamic language

E

echo knitr option end users, presentations for overview, 2nd showing model usage summarizing goals workflow and model enrichment rate ensemble learning entropy equal sign ( = ) Euclidean distance eval knitr option exchangeability Executive Summary slide experimental design, statistics attempt to correct explanatory variables explicit kernels defined mathematical definition transforms linear regression example using export, deployment by Extensible Markup Language. See XML.

F

F1 faceting graph factor defined making sure levels are consistent overview summary command factor variable factor() command false positive rate. See FPR. faulty sensor filled bar chart Fisher scoring iterations fitdistr() function floating-point numbers for loops forecasting vs. prediction formats, data files fpc package FPR (false positive rate), 2nd frequentist inference frequentist significance test F-statistic full normal form database functional language

G

gam package gam() function, 2nd, 3rd gap statistic Gaussian distributions, 2nd Gaussian kernels defined example using mathematical definition gbm package gdata package generalization error, 2nd, 3rd generalized additive models. See GAMs. generalized linear models generic language geom layers ggplot2 glm() function beta regression logistic regression separation and separation and quasi-separation two-category classification weights argument glmnet package goal defining for project in presentations for end users for project sponsor Greenplum grouped data grouping records .gz extension

H

H2 database defined driver for overview Hadoop, 2nd hair clusters Hamming distance hash symbol ( # ) hash, file hclust() function HDF5 (Hierarchical Data Format 5) held-out data help() command, 2nd, 3rd, 4th heteroscedastic errors heteroscedastic, defined hexbin plots hierarchical clustering defined with hclust() function Hierarchical Data Format 5. See HDF5. histogram checking distributions for single variable defined Hive hold-out set homoscedastic errors homoscedastic, defined household grouping HTML (Hypertext Markup Language) HTTP service, R-based HTTPS (Hypertext Transfer Protocol Secure) hyperellipsoid Hypertext Markup Language. See HTML. Hypertext Transfer Protocol Secure. See HTTPS. hypothesis testing

I

Impala importance() function in keyword independent variables, 2nd, 3rd indicator variables defined overview init command inner product input variables inspect() function interaction terms interestMeasure() function invalid values itemset

J

J language Jaccard coefficient Java JavaScript Object Notation. See JSON. JDBC (Java Database Connectivity) join statement, 2nd joint probability of the evidence Julia language

K

kernel, machine learning definition kernlab library k-fold cross-validation k-nearest neighbor. See KNN. KNN (k-nearest neighbor). See also nearest neighbor methods. Knowledge Discovery and Data Mining. See KDD.

L

L1/L2 distance languages, alternative Laplace smoothing lazy evaluation leaf node least squares method less-than symbol ( < ) levels lhs() function library() function lift concept line plots linear relationships linear transformation kernels defined mathematical definition linearly inseparable data list label operators lists loess function log command, 2nd, 3rd log transformations log, Git logarithmic scale density plot when to use logit log-odds lowess function

M

Mahout maintenance Manhattan distance margin, defined Markdown best cases for using knitr example masking variable MASS package master branch matrices max command maxnodes parameter mean command mean value, and lognormal population median command Mercer’s theorem, 2nd message knitr option mgcv package milestones documenting knitr min command mining, restricting items for mirrors, CRAN MongoDB motivation for project multicategory classification multiline commands multimodal distribution multinomial classification multiplicative process MySQL Mythical Man-Month

N

NA data type Naive Bayes classification methods document classification and multiple-variable models Naive Bayes assumption problem-to-method mapping smoothing naming knitr blocks narrow data ranges NB (nota bene) notes negative coefficients negative correlation newborn baby weight example nonlinear relationships non-monotone relationships defined extracting nonlinear relationships logistic regression using one-dimensional regression example overview, 2nd predicting newborn baby weight nonsignificance normal probability function normalization organizing data for analysis overview standard deviation and normalized form nota bene notes. See NB notes. null classifiers NULL data type null deviance null hypothesis number sequences numeric accuracy, 2nd

O

object-oriented language odds, defined OLTP (online transaction processing) online transaction processing. See OLTP. operations role operators, assignment organizing data for analysis origin repository outcome variables outliers out-of-bag samples overfitting common model problems estimating effects of using cross-validation pseudo R-squared and random forests

P

package system. See CRAN. pbeta() function pbinom() function Pearson coefficient performance permutation test phi() function, 2nd, 3rd Pig pipe-separated values, 2nd pivot table plnorm() function plot() function PMML (Predictive Model Markup Language) point estimate Poisson distribution polynomial kernels defined mathematical definition posterior estimate PostgreSQL prcomp() function Predictive Model Markup Language. See PMML. Presto primalizing print() function prior distribution probability distribution function procedural language production environment promise-based argument evaluation pseudo R-squared defined logistic regression p-value and pull command, 2nd PUMS American Community Survey data push command, 2nd Python

Q

qbinom() function qlnorm() function qnorm() function quantile() function, 2nd, 3rd quasi-separation

R

R in Action, 2nd radial kernels defined example using mathematical definition RAND command random sample, reproducing randomForest() function, 2nd, 3rd randomization randomly missing values ranking defined models R-based HTTP service rbinom() function, 2nd read.table() function gzip compression structured data read.transactions() function rebasing, 2nd, 3rd receiver operating characteristic curve. See ROC curve. reference level defined SCHL coefficient regression defined, 2nd problem-to-method mapping technical definition. See also linear regression; logistic regression. relational databases. See databases. relationships data science tasks visually checking bar charts hexbin plots line plots scatter plots remote repository for Git replicate() function reproducing results documentation random sample rescaling reshaping data residual standard error residuals defined deviance, logistic regression predictions on graph response variables Results directory results knitr option rlnorm() function rm() function rnorm() function ROC (receiver operating characteristic) curve root mean square error. See RMSE. root node rpart() command RSQLite package RStudio IDE, 2nd rug, defined runif function running documentation

S

S language sample function saturated model scale() function scaling scatter plot SCHL coefficient scientific honesty Screwdriver tool Scripts directory select statement sensitivity separable data separation, logistic regression sequences of numbers shape of distribution shasum program sigmoid function signed logarithm sign-off by project sponsor sin() function size() function slots smoothing curves soft margin optimization soundness of model spam, identifying Spambase dataset applying SVM comparing results SVMs specificity splines SQL Screwdriver sqldf package, 2nd square braces, 2nd SQuirreL SQL, 2nd, 3rd Stack Overflow stacked bar chart standard deviation star workflow stat layers statistical learning statistical test power, 2nd status command Storm structured values subsets sufficient statistic summary() function checking data for errors data ranges invalid values missing values outliers overview units linear regression coefficients table original model call producing quality statistics residuals summary logistic regression AIC coefficients table deviance residuals Fisher scoring iterations glm() function null deviance producing pseudo R-squared quasi-separation residual deviance separation overview support vector machines. See SVMs. support vectors defined overview SVMs (support vector machines) classification methods defined overview, 2nd problem-to-method mapping Spambase example applying SVM comparing results overview spiral example good kernel overview wrong kernel support vectors synchronizing with Git synthetic variables system() function systematically missing values

T

table() command tag command targetRate parameter technical debt terminology, and model quality test set theta angle tidy knitr option time series analysis TODO notes total sum of squares. See TSS. total WSS (within sum of squares) TPR (true positive rate) training error transforming data trial and error true negative rate true outcome true positive rate. See TPR. TSS (total sum of squares) two-by-two confusion matrix two-category classification

U

UCI car dataset uncommitted changes unexplainable variance ungrouped data uniform resource locator. See URL. unimodal distribution units checking data using summary command cluster analysis unsupervised learning upselling URL (uniform resource locator)

V

variance variance command varImpPlot() function vectorized operations vectorized, defined vectors venue shopping views, in R

W

waste clusters workflow of end user, and model

X

XLS/XLSX files XML (Extensible Markup Language)
