Index
A
- A/B testing
- abstraction, Scala
- actions / Actions
- actions engine
- Activity Monitor
- Actor model
- actors
- adaptive modeling / Model categorization
- aggregate functions
- aggregation operations
- aggregations
- Akka.io
- Akka documentation / What we have not talked about
- Akka framework
- Akka library / Futures example – stock price fetcher
- Algebird
- algebraic libraries
- Alternating least squares (ALS)
- alternative preprocessing techniques
- Amazon Web Services (AWS)
- Analysis of Variance (ANOVA)
- AngularJS
- annotation
- annual dividend yield
- ANother Tool for Language Recognition (ANTLR)
- Apache Commons Math
- Apache Parquet
- Apache Spark
- Apache Spark (Akka)
- APIs
- application
- applications
- architecture, Spark
  - about / Understanding Spark architecture
  - task scheduling / Task scheduling
  - Spark components / Spark components
  - MQTT / MQTT, ZeroMQ, Flume, and Kafka
  - ZeroMQ / MQTT, ZeroMQ, Flume, and Kafka
  - Flume / MQTT, ZeroMQ, Flume, and Kafka
  - Kafka / MQTT, ZeroMQ, Flume, and Kafka
  - HDFS / HDFS, Cassandra, S3, and Tachyon
  - Cassandra / HDFS, Cassandra, S3, and Tachyon
  - S3 / HDFS, Cassandra, S3, and Tachyon
  - Tachyon / HDFS, Cassandra, S3, and Tachyon
  - Mesos / Mesos, YARN, and Standalone
  - YARN / Mesos, YARN, and Standalone
  - Standalone / Mesos, YARN, and Standalone
- arrays
- Arrays / A whirlwind tour of JSON
- artificial neural networks
- Aster Data
- authentication
- autonomous systems / The problem
- Autoregressive Integrated Moving Average (ARIMA) / Alternative preprocessing techniques
- Autoregressive Moving Average (ARMA) / Alternative preprocessing techniques
- Avro
- AvroParquet
- Azkaban
B
C
- C-Epsilon SVM formulation / The nonseparable case – the soft margin
- cake pattern / Step 3 – instantiation
- Casbah
- Casbah query DSL
- case classes
- cash per share
- Cassandra
- categorical field
- categories, NP problems
- central limit theorem (CLT)
- centroid / K-means clustering
- Cholesky decomposition
- Cholesky factorization
- chromosomes / Evolutionary computing
- chunking
- class constructor template
- classification metrics
- classification model, evaluation factors
- classification model, terminology
- class prior
- class prior probability
- Client
- client-server applications
- client-side program
- clique
- Cloudera
- cluster assignment, K-means clustering
- cluster configuration, K-means clustering
- clustering
- clustering algorithms
- co-vector
- code snippets
- collision / Transformers
- command and control (C2)
- common discriminative kernels
- companion objects
- complex adaptive systems / Introduction to LCS
- complex queries / Complex queries
- complex types
- components, XCS
- computational workflow
- conditional dependency / Training
- conditional independence / A model by any other name
- conditional random field (CRF)
- configurability
- configuration options
- configuration parameters, SVM
- confusion matrix / F-score for multinomial classification
- conjugate directions
- conjugate gradient
- connected components
- Connection class
- connectionism
- consistent sampling
- constructive tuning strategy / Regularization
- consumer price index (CPI)
- Consumer Price Index (CPI)
- context bound / Coding against type classes
- continuation-passing style (CPS) / Beyond actors – reactive programming
- continuous space
- control learning / A solution – Q-learning
- convolution neural networks
- core parking
- correlation engine
- correlations
- Counter class
- covariant functor
- cross-validation
- cross-validation, model
- crossover operator, genetic algorithm implementation
- curve fitting
- custom supervisor strategies / Custom supervisor strategies
- custom type serialization
D
- Darwinian process / The origin
- data, profiling
- data-driven system
- data access layer
- data analysis life cycle / Linear models
- data analytics
- database metadata
- data chunks / 0xdata Sparkling Water
- data clustering
- data elements / 0xdata Sparkling Water
- data extraction
- DataFrame
- data frames / 0xdata Sparkling Water
- DataFrames
- DataFrameStatFunctions
- data ingest
- data mapper pattern
- data partitioning
- data rearranging
- data science
- data segmentation
- dataset
- data shuffling
- DataSourceConfig class
- data sources
- data transformation layer
- data types
- data types, Breeze
  - about / Basic Breeze data types
  - vectors / Vectors
  - matrices / Matrices
  - vectors, building / Building vectors and matrices
  - matrices, building / Building vectors and matrices
  - indexing / Advanced indexing and slicing
  - slicing / Advanced indexing and slicing
  - vectors, mutating / Mutating vectors and matrices
  - matrices, mutating / Mutating vectors and matrices
  - matrix multiplication / Matrix multiplication, transposition, and the orientation of vectors
  - matrix transposition / Matrix multiplication, transposition, and the orientation of vectors
  - vectors, orientation / Matrix multiplication, transposition, and the orientation of vectors
  - data preprocessing / Data preprocessing and feature engineering
  - feature engineering / Data preprocessing and feature engineering
  - function optimization / Breeze – function optimization
  - numerical derivatives / Numerical derivatives
  - regularization / Regularization
- DBpedia / Basics of information retrieval
- decision-making agent / Concepts
- decision boundary / Plotting data
- decision tree
- decision tree, parameters
- decoding, hidden Markov model (HMM)
- def
- DenseVector or DenseMatrix
- dependency injection
- deployment modes, Spark
- descriptive models / Model categorization
- descriptive statistics
- designing
- design principles, Spark
- design template, for classifiers
- destructive tuning strategy / Regularization
- DFT-based filtering
- dimension reduction
- directed acyclic graph (DAG) / Lifting the hood
- Directed Acyclic Graph (DAG)
- directed graphical models
- Dirichlet distribution
- discrete Fourier transform (DFT) / PCA
- discrete Kalman filter
- discretization / Value encoding
- distributed algorithms
- dividend coverage ratio
- DMatrix class
- DNA / Evolutionary computing
- documents
- Domain Specific Languages (DSL)
- drivers
- Drools
- Dropwizard
- Druid
- dynamic programming
- dynamic routing
E
- e-mails
- earnings per share (EPS)
- edge list
- edges
- Eigenvalue decomposition
- Elastic Net
- element-wise operators
- Emacs / SBT
- encapsulation
- encoding scheme, genetic encoding
- ensemble learning methods
- enumerations
- epoch / The training epoch
- Erlang programming language / The Actor model
- error backpropagation, training epoch
- error handling, monadic data transformation
- error insensitive zone
- estimators
- evaluation
- evaluation, hidden Markov model (HMM)
- event bus / The event bus
- evidence
- evolution
- example data
- exchange-traded funds (ETFs) / Test case
- execution contexts
- ExecutionContextTaskSupport
- expectation-maximization (EM)
- expectation-maximization algorithm
- Expectation Maximization (EM) algorithm
- experimenting, with Spark
- exploration-exploitation trade-off
- exponential moving average
- exponential normalization / Softmax
- extended Kalman filter (EKF) / Benefits and drawbacks
- Extended Kalman Filters (EKF) / The discrete Kalman filter
- extended learning classifier systems
- extract, transform, and load (ETL)
- extraction
F
- K-fold cross validation / K-fold cross validation
- F-score for binomial classification
- F-score for multinomial classification
- FACTORIE toolkit
- Fast Fourier Transform (FFT)
- feature construction
- features extraction
- features maps / Sharing of weights
- features selection
- Federal Election Commission (FEC)
- Federal Election Commission (FEC) data
- Federal Fund rate
- Federal fund rate (FDF)
- feed-forward neural network (FFNN) / The biological background
- feed-forward neural networks
- FFNN without a hidden layer / The multilayer perceptron
- finances 101
- first order predicate logic
- fitness functions, genetic algorithms
- fixed lag smoothing / Fixed lag smoothing
- Flex
- floating point format
- Flume
- follower network crawler / Follower network crawler, Fault tolerance
- fork-join pool
- ForkJoinTaskSupport
- Fourier analysis
- Fourier transform
- frameworks
- frequency domain
- fully connected neural network / The network topology
- functional approach
- function approximation / Quantization
- function optimization / Breeze – function optimization
- functors
- fundamental analysis
- futures
- futures, Akka framework
G
H
I
- Ignite File System (IGFS)
- Impala
- implementation, genetic algorithms
- implementation, Q-learning
- indexing / Advanced indexing and slicing
- influence diagrams
- information retrieval and text mining
- input forward propagation, training epoch
- insensitive error
- interactivity
- invokers
- Iris dataset
J
- Jacobian matrix
- Java
- java.sql.Types package
- Java Management Extensions (JMX)
- Java Mission Control (JMC)
- JavaScript dependencies
- Java Specification Request (JSR) / Linear models
- JBlas/Linpack
- JDBC
- JFreeChart
- JFreeChart documentation
- JFreeChart library
- joda-time library
- JSON
- JSON4S types / JSON4S types
- JSON files
- JSON format
- JSON in Scala
- JSON package
- JSON support
- JSR110
- JSR 223
- Jython
K
- k-fold cross-validation / Cross-validation and model selection
- K-fold cross-validation scheme / Assessing a model
- k-means clustering
- K-means clustering
- Kafka
- Kalman smoothing
- Kamon
- Kelly Criterion
- kernel functions
- kernel trick
- key components, genetic algorithm implementation
- key quality metrics
- Kryo
- Kudu
- Kullback-Leibler (KL) distance
L
M
- machine learning
- Machine Learning (ML)
- machine learning algorithms
- Machine Learning course
- machine learning engine
- machine learning problems
- maintainability
- map optimization
- maps
- Markov Chain Decision Process
- Markov decision processes
- master-workers, Akka
- mathematical abstractions
- mathematical concepts
- mathematical notation / Mathematical notation for the curious
- matrices
- maximum margin classifiers
- mean squared error (MSE) / One-variate linear regression
- measurement noise covariance / The measurement equation
- Mesos
- message
- message-passing mechanisms
- message sender
- metaphor for graphical models / Probabilistic graphical models
- methodology
- metrics
- Michigan approach / Why LCS?
- micro-batch processing
- mirrors
- mixins
- mixins, composing for building workflow
- MLlib / Breeze – function optimization
- MLlib algorithms
- ML libraries
- model
- model, assessing
- Model-View-Controller (MVC)
- model categorization
- modeling
- model monitoring
- modular JavaScript
- monadic composition
- monadic data transformation
- monads
- MongoDB
- Monitor class
- monitoring
- Monthly Active Users (MAU)
- morphism / Error handling
- moving averages
- MQTT
- MTable instances
- multiclass problems
- multilayer perceptron
- Multilayer Perceptron Classifier (MLPC)
- multinomial Naïve Bayes model
- Multivariate Analysis of Variance (MANOVA)
- Multivariate Bernoulli classification
- multivariate regression
- MurmurHash function
- mutation operator, genetic algorithm implementation
- Mutual Information (MI) / Spam filtering
N
- .NET MyMediaLite library
- n-grams / Basics of information retrieval
- NameNode
- Namenode UI
- natural language processing (NLP) / The feature functions model
- Naïve Bayes
- Naïve Bayes algorithm
- Naïve Bayes classifiers
- Naïve Bayes classifiers implementation
- Naïve Bayes models
- nested data
- net profit margin
- net sales
- network components, multilayer perceptron
- NodeJS
- Node Manager
- nodes
- non-linear models, dimension reduction
- nonlinear least squares minimization
- nonlinear SVM
- NP problems
- Nu-SVM / The nonseparable case – the soft margin
- numerical optimization
- NumericColumnExtensionMethods class
- numeric field
- NVD3
O
- object-oriented approach
- object-oriented design patterns
- objects
- Objects / A whirlwind tour of JSON
- observation
- one-class SVC
- one-variate linear regression
- online training / Online training versus batch training
- Online Transaction Processing (OLTP)
- Oozie
- operating income
- operating profit margin
- operations
- optimal substructures
- optimization
- optimization techniques
- OptionModel class / The OptionModel class
- OptionProperty class / The OptionProperty class
- options trading
- option trading, with Q-learning
- Ordering
- ordinary least squares regression
- outputs, linear models
- overfitting
- overlapping substructures
- overload operators
P
- package.scala source file
- padding / Value encoding
- PageRank algorithm
- PaintScale.scala source file
- parallel collections
- parallel collections, Scala
- Parallel Colt
- parallel execution
- parameters, SparkR glm implementation
- Pareto chart
- Parquet
- parquet file
- Parquet files
- parsers
- Partial Least Square Regression (PLSR) / Evaluation
- partially connected neural networks / The network topology
- pattern matching
- Pattern matching
- pattern matching
- pay-out ratio
- Pearson correlation coefficient
- penalized least squares regression / Ln roughness penalty
- perceptron
- performance considerations
- performance evaluation, Spark
- permanence spectrum / Programming in data science
- persistence level
- Pimp my Library pattern
- pimp my library pattern
- pipeline
- pipeline API
- Pittsburgh approach / Why LCS?
- Play
- Play framework / Futures example – stock price fetcher
- plots
- Poisson distribution
- Pool
- Porter Stemmer
- POS (part-of-speech) tagging
- posterior probability
- Power Iteration Clustering (PIC)
- Predicted Residual Error Sum of Squares (PRESS) / Evaluation
- predictive model
- predictive models / Model categorization
- PreparedStatement API documentation
- PreparedStatement class
- price/book value ratio (PB)
- price/earnings ratio (PE)
- price/sales ratio (PS)
- price patterns
- Price to Earnings/Growth (PEG)
- primal problem / The nonseparable case – the soft margin
- Principal Component Analysis (PCA)
- principal components analysis, dimension reduction
- probabilistic graphical models
- probabilistic kernels
- probabilistic reasoning
- probabilistic structures
- problem dimensionality
- process monitoring
- Project Gutenberg
- projections
- propositional logic
- protein sequence annotation
- Protobuf
- pseudo-regret
- PySpark / PySpark
- Python
- Python, calling from Java/Scala
Q
R
- R
- read-evaluate-print-loop (REPL)
- real-world Bayesian network
- Receiver Operating Characteristic (ROC)
- receiver operating characteristic (ROC) curve / Evaluation
- recombination
- reconstruction/error minimization, K-means clustering
- recursive algorithm, discrete Kalman filter
- regression
- regression model / Design
- regression trees
- regression weights
- regularization / Regularization
- reinforcement learning
- reinforcement learning agent
- Remote Procedure Call (RPC)
- reproducing kernel Hilbert spaces
- request
- RequireJS
- residuals mean square (RMS) / Step 5 – minimizing the sum of square errors
- resilient applications
- resilient distributed dataset (RDD) / Apache Spark
- Resilient Distributed Dataset (RDD)
- Resilient Distributed Datasets (RDD)
- Resilient distributed datasets (RDD)
- Resource Manager
- response
- response views / Response views
- REST APIs
- results
- ResultSet interface
- ridge regression
- Riemann metric
- risk handling
- ROC
- routing
- Rsclient/Rserve
- RStudio
- Rsync
- Run-Length Encoding (RLE)
S
- S3
- SBT
- sbteclipse project
- Scala
- Scala, integrating with Python
- Scala, integrating with R
- Scala API
- scalability
- scalability, with Actors
- Scalable frameworks
- Scala constructs
- Scala plugin for Eclipse
- Scala plugin for IntelliJ IDEA
- Scala programming
- scalastyle plugin
- Scala Swing
- Scalate template
- Scalatra
- Scalaz
- scatter plot matrix plots
- scatter plots
- schema
- Secondary Namenode
- segmentation
- semantic URLs / Dynamic routing, References
- semi-supervised learning
- sequences
- Sequential Minimal Optimization (SMO) / The nonseparable case – the soft margin
- sequential trials
- serialization
- serialization formats
- sessionization
- short interest
- short interest ratio
- shrinkage
- shuffling / Data shuffling and partitions
- Simple Build Tool (SBT)
- simple build tool (sbt) / Deploying Spark
- simple moving average
- simple workflow
  - writing / Writing a simple workflow
  - problem, scoping / Step 1 – scoping the problem
  - data loading / Step 2 – loading data
  - data, preprocessing / Step 3 – preprocessing the data
  - immutable normalization / Immutable normalization
  - patterns, discovering / Step 4 – discovering patterns
  - data, analyzing / Analyzing data
  - data, plotting / Plotting data
  - classifier, implementing / Step 5 – implementing the classifier
  - optimizer, selecting / Selecting an optimizer
  - model, training / Training the model
  - observations, classifying / Classifying observations
  - model, evaluating / Step 6 – evaluating the model
- single page applications
- singular value decomposition / Ordinary least squares regression
- Singular Value Decomposition (SVD)
- singular value decomposition (SVD) / PCA
- slicing / Advanced indexing and slicing
- Slick
- smoothing factor for counters
- smoothing kernels
- soft margin / The nonseparable case – the soft margin
- source code
- spam filtering
- Spark
- Spark, applications
- SPARK-3703
- Spark applications
- Spark ecosystem
- Sparkling Water
- Spark Master
- Spark Notebook
- Spark notebooks
- Spark Notebooks
- SparkR
- Spark RDDs
- Spark SQL
- spectral density estimation
- SQL statements
- stackable trait injection / Composing mixins to build a workflow
- stand-alone programs
- Standalone
- standalone programs
- Stanford NLP toolkit
- stateful actors / Stateful actors
- state space estimation, discrete Kalman filter
- steepest descent
- stemming / Basics of information retrieval
- stimuli / The biological background
- stochastic gradient descent / Ordinary least squares regression
- Stochastic Gradient Descent (SGD)
- Stochastic Gradient Descent (SGD) algorithm
- stratified sampling
- streaming k-means
- StreamSets
- StringColumnExtensionMethods class
- strongly connected components
- structs
- substructures
- sum of squared errors (SSE) / One-variate linear regression
- supervised learning
- supervised machine learning algorithms
- support vector machines (SVMs)
- SVC
- SVD++
- SVM
- SVM dual problem
- SVMLight
- SVMWithSGD
- SVR
- Syslog
- system monitoring
T
- Tachyon
- tagging model / Basics of information retrieval
- task scheduling
- TaskSupport
- taxonomy, machine learning algorithms
- technical analysis
- technical analysis, terminology
- temporal difference
- Term Frequency Inverse Document Frequency (TF-IDF)
- terminology, LCS
- terminology, reinforcement learning
- test case, evaluation
- test case, trading strategy
- testing, Naïve Bayes
- tests, genetic algorithms
- text analysis pipeline
- text analytics, conditional random field (CRF)
- text mining
- text mining methodology
- ThreadPoolTaskSupport
- Thrift
- time series, in Scala
- tokenization
- tokens
- tools
- trading signal / Trading signals and strategy
- trading strategies
- training, hidden Markov model (HMM)
- training, Naïve Bayes classifiers implementation
- training and classification, multilayer perceptron
- training epoch, multilayer perceptron
- training workflow, logistic regression
- traits
- transformations
- transformers
- trending / Test case 1 – trending
- triangle counting algorithm
- triangle inequality
- try/catch statements
- Try type
- tuning memory usage
- Turkey paradox
- two-step lag smoothing algorithm / Experimentation
- type classes
- Typesafe Activator
- Typesafe activators
U
V
W
- web-jars
- web APIs
- web application
- web frameworks
- web services
- weighted graph
- weighted moving average
- word2vec
- WordNet / Basics of information retrieval
- workflow computational model
X
Y
Z
- zero-frequency problem
- ZeroMQ