Index

[SYMBOL][A][B][C][D][E][F][G][H][I][J][K][L][M][N][O][P][Q][R][S][T][U][V][W][X]

SYMBOL

` (backtick)
: (colon)
[[]] (double square braces)
[] (square braces)
@ (at symbol)
& vectorized logic operator
# (hash symbol)
%in% operation
+ operator
<- assignment operator
<<- assignment operator
= assignment operator
== vectorized logic operator
-> assignment operator
->> assignment operator
| vectorized logic operator
$ (dollar sign)

A


absolute error
academic presentations
accuracyMeasures() function
adaptive learning
add command
additive process
adjusted R-squared
AdWords
aesthetics
anonymous functions
Apgar test
Apriori
apriori() function
arcsinh
area under the curve.
    See AUC.
arules package
as.formula() function
assignment operators
at symbol ( @ )
AUC (area under the curve)
  defined
  scoring categorical variables by
audience for presentations
average silhouette width
averaging to reduce variance

B

backtick ( ` )
backups and version control
bagging
  classifiers and
  overview
bag-of-k-grams model
bag-of-words model
bar charts
  checking distributions for single variable
  checking relationships between two variables
base error rate
baskets
batch model
Bayesian inference
Bayesian information criterion.
    See BIC.
Bayesian methods
Bayesian posterior estimate
beta regression
betas, defined
between sum of squares.
    See BSS.
bias
  model problems
  variance decomposition
BIC (Bayesian information criterion)
big data tools
bimodal distribution
binomial classification
binwidth parameter
blame command
block declaration format for knitr
bookstore example
boosting technique
bounded predictions
branches vs. commits (Git)
BSS (between sum of squares)
business rules
buzz dataset
  overview
  product names in

C

c() command
cache knitr option
Calinski-Harabasz index
  cluster analysis
  kmeansruns() function
call-by-value semantics
CART (classification and regression trees)
causal variables
categorization
  accuracy
  single-variable models
  variables
CDC 2010 natality public-use data file
central limit theorem
centroid
change history for Git
characterization
checkout command
checkpoint documentation
chi-squared test
chooseCRANmirror() command
churn, defined
city block distance.
    See Manhattan distance.
class() command
classification and regression trees.
    See CART.
classifiers and bagging
client role
clusterboot() function
  assessing clusters
  k-means algorithm
clustering
  defined
  models
    clusters as classifications or scores
    distance comparisons
    overview
coefficients
  defined
  for linear regression
    overview
    table of
  for logistic regression
    interpreting values
    overview
    table of
  negative
collinearity
colon ( : )
comments
commit command
comparing files with Git
Comprehensive R Archive Network.
    See CRAN.
computer science machine learning
conditional entropy
confidence intervals
confidence parameter
contingency table
continuous variables
coord_flip command
correlation
cos() function
cosine similarity
  distances
  kernels
  mathematical definition
Cover’s theorem
coverage, defined
CRAN (Comprehensive R Archive Network)
  installing
  online resources
credible intervals
Cromwell’s rule
cross-language linkage
cross-validation
  estimating overfitting effects using
  performing using function
cumulative distribution function
cut() function
cutree() function
Cygwin

D

data architect
data collection
data cuts
data dictionary
Data directory
data frame
  defined
  overview
dbinom() function
decision trees
  classification methods
  data cuts for
  problem-to-method mapping
  training variance and
  workings of
declarative language
definitional kernels
dendrogram
density estimation
density plots
dependent variables
Derived directory
deviance
  probability models
  residuals, logistic regression
diff command
difference parameter
dim() command
discrete variables
dissimilarity
dissolved clusters
dist() function
distances
  clustering models
  cosine similarity
  Euclidean distance
  Hamming distance
  Manhattan distance
distribution function
distribution shape
distribution tail bound
dlnorm() function
dnorm() function
document classification
dollar sign ( $ )
domain knowledge
dot plot
dot product
  mathematical definition
  similarity
  using kernel
double-precision floating-point numbers
Dremel
Drill
dropping records for missing values
dynamic language

E

echo knitr option
end users, presentations for
  overview
  showing model usage
  summarizing goals
  workflow and model
enrichment rate
ensemble learning
entropy
equal sign ( = )
Euclidean distance
eval knitr option
exchangeability
Executive Summary slide
experimental design, statistics attempt to correct
explanatory variables
explicit kernels
  defined
  mathematical definition
  transforms
    linear regression example
    using
export, deployment by
Extensible Markup Language.
    See XML.

F

F1
faceting graph
factor
  defined
  making sure levels are consistent
  overview
  summary command
factor variable
factor() command
false positive rate.
    See FPR.
faulty sensor
filled bar chart
Fisher scoring iterations
fitdistr() function
floating-point numbers
for loops
forecasting vs. prediction
formats, data files
fpc package
FPR (false positive rate)
frequentist inference
frequentist significance test
F-statistic
full normal form database
functional language

G

gam package
gam() function
gap statistic
Gaussian distributions
Gaussian kernels
  defined
  example using
  mathematical definition
gbm package
gdata package
generalization error
generalized additive models.
    See GAMs.
generalized linear models
generic language
geom layers
ggplot2
glm() function
  beta regression
  logistic regression
  separation and quasi-separation
  two-category classification
  weights argument
glmnet package
goal
  defining for project
  in presentations
    for end users
    for project sponsor
Greenplum
grouped data
grouping records
.gz extension

H

H2 database
  defined
  driver for
  overview
Hadoop
hair clusters
Hamming distance
hash symbol ( # )
hash, file
hclust() function
HDF5 (Hierarchical Data Format 5)
held-out data
help() command
heteroscedastic errors
heteroscedastic, defined
hexbin plots
hierarchical clustering
  defined
  with hclust() function
Hierarchical Data Format 5.
    See HDF5.
histogram
  checking distributions for single variable
  defined
Hive
hold-out set
homoscedastic errors
homoscedastic, defined
household grouping
HTML (Hypertext Markup Language)
HTTP service, R-based
HTTPS (Hypertext Transfer Protocol Secure)
hyperellipsoid
Hypertext Markup Language.
    See HTML.
Hypertext Transfer Protocol Secure.
    See HTTPS.
hypothesis testing

I


Impala
importance() function
in keyword
independent variables
indicator variables
  defined
  overview
init command
inner product
input variables
inspect() function
interaction terms
interestMeasure() function
invalid values
itemset

J

J language
Jaccard coefficient
Java
JavaScript Object Notation.
    See JSON.
JDBC (Java Database Connectivity)
join statement
joint probability of the evidence
Julia language

K


kernel, machine learning definition
kernlab library
k-fold cross-validation
k-nearest neighbor.
    See KNN.
KNN (k-nearest neighbor).
    See also nearest neighbor methods.
Knowledge Discovery and Data Mining.
    See KDD.

L

L1/L2 distance
languages, alternative
Laplace smoothing
lazy evaluation
leaf node
least squares method
less-than symbol ( < )
levels
lhs() function
library() function
lift concept
line plots
linear relationships
linear transformation kernels
  defined
  mathematical definition
linearly inseparable data
list label operators
lists
loess function
log command
log transformations
log, Git
logarithmic scale
  density plot
  when to use
logit
log-odds
lowess function

M

Mahout
maintenance
Manhattan distance
margin, defined
Markdown
  best cases for using
  knitr example
masking variable
MASS package
master branch
matrices
max command
maxnodes parameter
mean command
mean value, and lognormal population
median command
Mercer’s theorem
message knitr option
mgcv package
milestones
  documenting
  knitr
min command
mining, restricting items for
mirrors, CRAN
MongoDB
motivation for project
multicategory classification
multiline commands
multimodal distribution
multinomial classification
multiplicative process
MySQL
Mythical Man-Month

N

NA data type
Naive Bayes
  classification methods
  document classification and
  multiple-variable models
  Naive Bayes assumption
  problem-to-method mapping
  smoothing
naming knitr blocks
narrow data ranges
NB (nota bene) notes
negative coefficients
negative correlation
newborn baby weight example
nonlinear relationships
non-monotone relationships
  defined
  extracting nonlinear relationships
  logistic regression using
  one-dimensional regression example
  overview
  predicting newborn baby weight
nonsignificance
normal probability function
normalization
  organizing data for analysis
  overview
  standard deviation and
normalized form
nota bene notes.
    See NB notes.
null classifiers
NULL data type
null deviance
null hypothesis
number sequences
numeric accuracy

O

object-oriented language
odds, defined
OLTP (online transaction processing)
online transaction processing.
    See OLTP.
operations role
operators, assignment
organizing data for analysis
origin repository
outcome variables
outliers
out-of-bag samples
overfitting
  common model problems
  estimating effects of using cross-validation
  pseudo R-squared and
  random forests

P

package system.
    See CRAN.
pbeta() function
pbinom() function
Pearson coefficient
performance
permutation test
phi() function
Pig
pipe-separated values
pivot table
plnorm() function
plot() function
PMML (Predictive Model Markup Language)
point estimate
Poisson distribution
polynomial kernels
  defined
  mathematical definition
posterior estimate
PostgreSQL
prcomp() function
Predictive Model Markup Language.
    See PMML.
Presto
primalizing
print() function
prior distribution
probability distribution function
procedural language
production environment
promise-based argument evaluation
pseudo R-squared
  defined
  logistic regression
  p-value and
pull command
PUMS American Community Survey data
push command
Python

Q

qbinom() function
qlnorm() function
qnorm() function
quantile() function
quasi-separation

R

R in Action
radial kernels
  defined
  example using
  mathematical definition
RAND command
random sample, reproducing
randomForest() function
randomization
randomly missing values
ranking
  defined
  models
R-based HTTP service
rbinom() function
read.table() function
  gzip compression
  structured data
read.transactions() function
rebasing
receiver operating characteristic curve.
    See ROC curve.
reference level
  defined
  SCHL coefficient
regression
  defined
  problem-to-method mapping
  technical definition.
    See also linear regression; logistic regression.
relational databases.
    See databases.
relationships
  data science tasks
  visually checking
    bar charts
    hexbin plots
    line plots
    scatter plots
remote repository for Git
replicate() function
reproducing results
  documentation
  random sample
rescaling
reshaping data
residual standard error
residuals
  defined
  deviance, logistic regression
  predictions on graph
response variables
Results directory
results knitr option
rlnorm() function
rm() function
rnorm() function
ROC (receiver operating characteristic) curve
root mean square error.
    See RMSE.
root node
rpart() command
RSQLite package
RStudio IDE
rug, defined
runif function
running documentation

S

S language
sample function
saturated model
scale() function
scaling
scatter plot
SCHL coefficient
scientific honesty
Screwdriver tool
Scripts directory
select statement
sensitivity
separable data
separation, logistic regression
sequences of numbers
shape of distribution
shasum program
sigmoid function
signed logarithm
sign-off by project sponsor
sin() function
size() function
slots
smoothing curves
soft margin optimization
soundness of model
spam, identifying
Spambase dataset
  applying SVM
  comparing results
  SVMs
specificity
splines
SQL Screwdriver
sqldf package
square braces
SQuirreL SQL
Stack Overflow
stacked bar chart
standard deviation
star workflow
stat layers
statistical learning
statistical test power
status command
Storm
structured values
subsets
sufficient statistic
summary() function
  checking data for errors
    data ranges
    invalid values
    missing values
    outliers
    overview
    units
  linear regression
    coefficients table
    original model call
    producing
    quality statistics
    residuals summary
  logistic regression
    AIC
    coefficients table
    deviance residuals
    Fisher scoring iterations
    glm() function
    null deviance
    producing
    pseudo R-squared
    quasi-separation
    residual deviance
    separation
  overview
support vector machines.
    See SVMs.
support vectors
  defined
  overview
SVMs (support vector machines)
  classification methods
  defined
  overview
  problem-to-method mapping
  Spambase example
    applying SVM
    comparing results
    overview
  spiral example
    good kernel
    overview
    wrong kernel
  support vectors
synchronizing with Git
synthetic variables
system() function
systematically missing values

T

table() command
tag command
targetRate parameter
technical debt
terminology, and model quality
test set
theta angle
tidy knitr option
time series analysis
TODO notes
total sum of squares.
    See TSS.
total WSS (within sum of squares)
TPR (true positive rate)
training error
transforming data
trial and error
true negative rate
true outcome
true positive rate.
    See TPR.
TSS (total sum of squares)
two-by-two confusion matrix
two-category classification

U

UCI car dataset
uncommitted changes
unexplainable variance
ungrouped data
uniform resource locator.
    See URL.
unimodal distribution
units
  checking data using summary command
  cluster analysis
unsupervised learning
upselling
URL (uniform resource locator)

V


variance
variance command
varImpPlot() function
vectorized operations
vectorized, defined
vectors
venue shopping
views, in R

W

waste clusters
workflow of end user, and model

X

XLS/XLSX files
XML (Extensible Markup Language)
