4

Considering the Threat Environment

Chapter 2 considered threats to your data, while Chapter 3 considered threats to your application and model. This chapter considers the threats to your environment as a whole and divides environments into two parts: business and consumer. Business threats focus on the ability to earn money, serve clients, provide a useful infrastructure, and address business requirements (such as accounting and meeting legal needs). Consumer threats focus on communication between individuals, entertainment, buying products, and addressing personal needs (such as interacting with government entities or making appointments with your doctor).

(And, yes, you need to worry about the consumer element because your users will always incorporate consumer elements into the business environment.)

The previous chapters examined parts of the whole to make the threats easier to see and understand. This chapter is an introduction to the whole picture, of how things work together to create a particular kind of threat. You’ve seen the trees; now it’s time to see the forest. With these issues in mind, this chapter discusses the following topics:

  • Defining an environment
  • Understanding business threats
  • Considering social threats
  • Employing Machine Learning (ML) in security in the real world

Technical requirements

This chapter requires that you have access to either Google Colab or Jupyter Notebook to work with the example code. The Requirements to use this book section of Chapter 1, Defining Machine Learning Security, provides additional details on how to set up and configure your programming environment.

The Accessing GitHub using OAuth-type authentication section requires that you have a GitHub account, which you can create at https://github.com/join. When testing the code, use a test site, test data, and test APIs to avoid damaging production setups and to improve the reliability of the testing process.

Using the downloadable source code is always highly recommended. You can find the downloadable source on the Packt GitHub site at https://github.com/PacktPublishing/Machine-Learning-Security-Principles or my website at http://www.johnmuellerbooks.com/source-code/.

Defining an environment

An environment is the sum of the interactions an entity has with the world. Whether it's an application running on a network (with the network or the internet as its environment), a robot running an assembly line (with the building housing the assembly line as its environment), or a human working in an office (with the real world as the environment) is immaterial. An environment defines the surroundings in which an entity operates and therefore interacts with other entities. Each environment is unique but contains common elements that make it possible to secure the environment. An ML environment includes the following elements, which are used as the basis for discussion as the chapter progresses:

  • Data of any type and from any source
  • An application model
  • Ancillary code, such as libraries
  • Interfaces to third-party code such as services
  • An Application Programming Interface (API)
  • Third-party applications that interact directly (such as applications that augment an organization’s offerings) or indirectly (such as the shopping site that users surreptitiously use during work hours) with the environment
  • Users (those who use the application, but don’t control it)
  • Managers (those who define organizational, environmental, or application policies)
  • Developers (those who create any application code, including data scientists, computer scientists, researchers, database administrators, and so on)
  • Security professionals (those who control application access)

Many of the tactics currently available to secure applications (including biometric and physical security) are equally applicable to any environment, but you develop and interact with them in different ways. Any environment can benefit from authentication and filtering, but it's hardly likely that you'll find biometric authentication used to access a consumer product site, such as Amazon.com. On the other hand, a site devoted to governmental research will likely include several layers of authentication, including biometric authentication and guards at the door. This chapter doesn't include a discussion of physical security in the form of locked-down server rooms and guards at the door, but it does cover a considerable range of application-specific security types, such as implementing passwords, locking down resources, looking for odd data patterns, and removing potentially malicious data.

Understanding business threats

Business software solutions have become more complex over the years and so have the security threats facing them. Many businesses run a hybrid setup today where part of the business software resides locally on a network (some of which forms a private cloud-based configuration) and the other part is hosted online as one of the “as a service” options, such as Platform as a Service (PaaS). Consequently, security often comes in layers for businesses.

Traditional security is a starting point for the local part of the infrastructure and service-level security is part of the cloud-based component. The Cloud Adoption Statistics for 2021 article at https://hostingtribunal.com/blog/cloud-adoption-statistics/ is enlightening because it shows that, even considering only the cloud component of an organization, 69 percent of organizations rely on a hybrid solution for their cloud presence, and some leverage up to five different hosting solutions. It's unlikely that your ML application will be able to rely on a single data source or reside on a single setup when your organization is large enough. Consequently, you need to be ready to work with security professionals to secure your application and keep the data safe. Oddly enough, communicating your needs to security professionals who are used to dealing with monolithic applications and microservices is difficult unless you speak their lingo.

Unfortunately, a starting point isn’t a solution. For example, when hosting your cloud-based solutions on Amazon Web Services (AWS), you have a choice of 26 different security-related services (as of the time of writing, you can be sure there will be more soon). Most of them are oriented toward protecting the cloud part of your software, so you still need other layers for the local part of your solution. Amazon does provide help for organizations using its services.

The security picture may seem overwhelming unless you break it down into manageable pieces and review those pieces without necessarily looking at a particular solution until you need to get into the details of configuration and code writing. For example, it's important to know at the outset that you need to encrypt your data (even the open source material, because you don't want anyone modifying it after you begin to use it), especially when that data is in a place where a hacker can reach it. However, you don't necessarily need to think about using AWS encrypted Simple Storage Service (S3) buckets until you choose to implement a part of your solution on AWS (see https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-encryption.html if you'd like to see what's involved). Becoming mired in details at the outset is problematic because you begin to see individual trees and miss the forest entirely.
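As a concrete illustration of that encryption requirement, here's a minimal sketch of protecting data before it lands anywhere a hacker can reach. It assumes the third-party cryptography package (installed with pip install cryptography); in production, the key would come from a key management service rather than being generated inline:

from cryptography.fernet import Fernet
# In production, obtain the key from a key management service,
# never from source code or an unprotected file.
key = Fernet.generate_key()
cipher = Fernet(key)
record = b'open source training data you do not want modified'
encrypted = cipher.encrypt(record)    # Store this value, not the plain text.
restored = cipher.decrypt(encrypted)  # Decrypt only at processing time.
print(restored == record)             # Outputs: True

Because Fernet tokens are authenticated, a modified ciphertext simply fails to decrypt, which addresses tampering as well as simple disclosure.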

Protecting consumer sites

The majority of attacks against businesses that host consumer sites come through websites or APIs. A website offers the flexibility a hacker needs to experiment and look for potential holes in coverage. It's possible for a hacker to attack through an API, but an API allows better tracking of incoming requests, so the hacker's activities are easier to expose. By contrast, an online store is unlikely to notice a hacker making small changes to an order through the website interface (as an example) while looking for an exploitable hole.

There are a considerable number of threats against consumer sites, many of which you can handle using traditional methods but which are better handled using ML solutions. For example, as shown in the source code examples in the Manipulating filtered data and Creating an email filter sections later in this chapter, you can use ML applications to monitor the inflow of data for unusual data patterns or unwanted data. Backend ML applications can detect and remove bad data from the database using a technique similar to that shown in the Starting with a simple removal section. Figure 4.1 shows the most common attacks against ML applications through a website and provides some ideas on how to protect against them:

Figure 4.1 – Common threats against consumer sites

You can also find other kinds of attacks that aren’t ML-specific but could have an effect on ML activities by affecting the environment in which the application executes. These attacks include the following:

  • Session hijacking: This form of attack is also known as cookie poisoning. The hacker intercepts and modifies a cookie using a man-in-the-middle (MITM) attack to steal data, corrupt security measures, or both (the sketch following this list shows one way to detect tampered cookies).
  • Scraping: An activity that some view as a legitimate process for obtaining the huge quantities of data needed for certain types of ML analysis, while others see the damage that the process can cause. For many websites, scraping is a serious threat that steals private data, obtains information for membership inference attacks, reverse engineers the site for the purpose of illegitimate replication, discovers application operations, and performs other malicious acts, as discussed in Chapter 2 and Chapter 3. Consequently, you also see articles that discuss the opposite side of the coin, such as Web scraping protection: How to protect your website against crawler and scraper bots at https://datadome.co/bot-management-protection/scraper-crawler-bots-how-to-protect-your-website-against-intensive-scraping/. The fact is that this kind of attack is more of a human nature issue and deals with the mistruth of perspective as described in the Defining the human element section in Chapter 1, Defining Machine Learning Security.
  • Carding: Hackers gain access to lists of credit or gift cards in some manner, usually on the dark web. They usually start by making small purchases with each card to determine whether the card is still active. Once the hacker knows a card is legitimate, it's used to make a huge purchase of some type. Vendors are currently fighting back by using ML-based services to detect this kind of attack. Because this is such a huge topic, you will find it covered in detail in Chapter 8.
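Here's a minimal sketch of that cookie-tampering defense: the server signs each cookie value with a keyed hash, so a value modified in transit no longer matches its signature. The helper names are hypothetical, and a real deployment would also rely on TLS plus the Secure and HttpOnly cookie flags:

import hmac
import hashlib
# Server-side secret; never sent to the client.
SECRET_KEY = b'replace with a randomly generated key'
def sign_cookie(value):
    # Append a keyed SHA-256 digest to the cookie value.
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return value + '|' + digest
def verify_cookie(signed_value):
    # Recompute the digest and compare in constant time.
    value, _, digest = signed_value.rpartition('|')
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(digest, expected) else None
cookie = sign_cookie('user_id=1138')
print(verify_cookie(cookie))                      # user_id=1138
print(verify_cookie('user_id=0001|' + '0' * 64))  # None: tampering detected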

The point of all these threat sources is that consumer sites are security sieves. There are so many holes that it may seem impossible to plug them all, but with diligence, administrators of all stripes and developers can work together to plug most holes and detect intrusions through those that remain. The essential element in all these possible intrusions is to know what threats are currently in use and then guard against them using techniques such as those found in Figure 4.1.

Understanding malware

The term malware refers to any software installed on a host system that causes harm, through any means, to the host or the client systems attached to it. The software might damage the systems physically, steal data (including company secrets), corrupt data, encrypt the data for ransom demands, or perform a very wide range of other malicious tasks. By damaging the system, the malware can also put people's lives at risk, such as in a medical facility. Mind you, sometimes giants play this game, such as the cyberwar brewing between the US and Russia; see https://www.nytimes.com/2019/06/15/us/politics/trump-cyber-russia-grid.html and https://www.bloomberg.com/news/features/2022-01-26/what-happens-when-russian-hackers-cyberattack-the-u-s-electric-power-grid.

ML is adept at detecting, preventing, and fixing certain types of malware attacks, but you must write the application in a very flexible way. Fortunately, you can now find websites, such as VirusTotal (https://support.virustotal.com/hc/en-us/categories/360000160117-About-us), that assist with both static and dynamic identification of various kinds of malware on local systems and, to a lesser extent, online sites. Consequently, it may be less necessary to build the low-level skills needed to disassemble and identify malware executables, and more valuable to develop the research skills needed to quickly locate the information that others have found. You can read more about this kind of attack in Chapter 7.
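As an illustration of that research-first approach, here's a minimal sketch of checking a file's hash against VirusTotal's v3 REST API. The endpoint and header reflect VirusTotal's published documentation, but treat them as assumptions to verify against the current docs; the filename and API key are placeholders you must supply:

import hashlib
import requests
# Compute the SHA-256 hash of a suspect file.
with open('suspect_file.exe', 'rb') as f:
    file_hash = hashlib.sha256(f.read()).hexdigest()
# Ask VirusTotal for any existing analysis of that hash.
response = requests.get(
    'https://www.virustotal.com/api/v3/files/' + file_hash,
    headers={'x-apikey': 'Your VirusTotal API Key'})
if response.status_code == 200:
    stats = response.json()['data']['attributes']['last_analysis_stats']
    print('Engines flagging the file as malicious:', stats['malicious'])
else:
    print('No report found (status code %d)' % response.status_code)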

Understanding network attacks

Most people think about Denial of Service (DoS) or Distributed Denial of Service (DDoS) attacks when hearing about network attacks. In fact, you see DDoS listed in Figure 4.1, discussed later in this chapter, and explored fully in Chapter 5. However, network attacks are often subtle, as discussed in the Eyeing the small stuff section in this chapter. Hackers might only sniff data from your network in a manner that's nearly impossible to detect (see http://blog.johnmuellerbooks.com/2011/06/07/sniffing-telnet-using-wireshark/ for one of thousands of exploits). Network attacks are extremely prevalent and hit the big players too. If your network isn't safe, customers will hear about it (usually pasted in big letters at the beginning of an article about security), and your business will suffer.

Eyeing the small stuff

Before proceeding further with the big issues that most people think about, it’s important to look at the small things as well. It’s easy to miss the small stuff in an organization. For example, the Internet of Things (IoT) makes an appearance almost everywhere today in devices that most people take for granted. These devices just seem to work, so it’s easy to forget about them. However, these devices are connected to the internet and they are becoming more complex as people want them to do more. A thermostat may not seem like a big thing, but consider the fact that you can now find thermostats with an ML application controlling them, as described in Swiss Researchers Create Machine Learning Thermostat at https://www.rtinsights.com/empa-machine-learning/.

Of course, the first question is what a thermostat, even a really smart thermostat, has to do with ML application security. It turns out that some researchers have found a way to hack a thermostat, as described in the article #DefCon: Thermostat Control Hacked to Host Ransomware at https://www.infosecurity-magazine.com/news/defcon-thermostat-control-hacked/. Now, imagine what would happen if the thermostat somehow connected to someone's network, perhaps for the purpose of recording statistics, and you see that a thermostat really can be a security threat.

In the thermostat scenario, a hacker gains access to the thermostat, adjusts the software to emit corrupted data to the logs, and waits for someone to read the log data. Once on a host system, the corrupted data allows the hacker to gain access to the host system, perhaps using a Trojan. At this point, the hacker can use the access to perform identity theft or hold the system for ransom. To prevent such an attack, you must layer defenses, employing the following:

  • Standard forms of security such as passwords (see the Understanding the kinds of application security section in this chapter) on the thermostat and host system
  • Ensembles to detect errant data input streams (see the Using ensemble learning section in Chapter 3) as part of the input to the ML application; the sketch after this list shows a simplified stand-in
  • Trojan detection (see the Understanding trojan attacks section in Chapter 3) within the ML application
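Building a full ensemble is beyond the scope of this section, but the following minimal sketch shows the idea behind the second layer: flagging log readings that deviate too far from the expected pattern before they reach the ML application. The simulated readings and the threshold are assumptions for illustration:

import numpy as np
# Simulated thermostat log readings in degrees Fahrenheit;
# the last two entries are corrupted values planted by an attacker.
readings = np.array([68.1, 68.4, 67.9, 68.2, 68.0, 68.3, 212.0, -40.0])
# Use the median absolute deviation (MAD) because outliers can't skew
# it the way they skew the mean and standard deviation.
center = np.median(readings)
mad = np.median(np.abs(readings - center))
suspect = np.abs(readings - center) > 10 * mad
print('Suspect readings:', readings[suspect])  # [212. -40.]
print('Clean readings:', readings[~suspect])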

One of the biggest small holes in security today falls into the category of Supervisory Control and Data Acquisition (SCADA) systems. They run a great many things, including essentials such as pacemakers and water supply systems, not to mention electrical plants. The article Understanding the Relative Insecurity of SCADA Systems at http://blog.johnmuellerbooks.com/2011/11/28/understanding-the-relative-insecurity-of-scada-systems/ seems outdated, but unfortunately, no one has bothered to secure these small systems, as described in the article Biggest threats to ICS/SCADA systems at https://resources.infosecinstitute.com/topic/biggest-threats-to-ics-scada-systems/ (among many others). Your ML application may connect to these systems and these small holes are a real threat that you need to consider fixing.

Dealing with web APIs

You see a lot of coverage about APIs in this book because modern businesses can’t do without them. Chapter 6 provides a special focus on the use of anomalies to create holes in APIs that hackers use to gain particular kinds of access. In addition, you see them mentioned as part of Figure 4.1. However, this section discusses a prevention and mitigation strategy called confidential computing, which is the use of a specially configured CPU to keep data encrypted until actual processing.

Desktop systems have a Trusted Platform Module (TPM) that the operating system can use to make working with specially designed applications considerably more secure (see https://docs.microsoft.com/windows/security/information-protection/tpm/trusted-platform-module-overview as an example). The TPM makes Windows 11 considerably harder to infect with malware, as described in the Tom's Guide article at https://www.tomsguide.com/news/what-is-a-tpm-and-heres-why-you-need-it-for-windows-11 (which is one of many reasons that this book spends less time on local systems and more time on the cloud, IoT, and networks). Fortunately, confidential computing doesn't necessarily require a TPM because cloud providers also make it available as a service.

The Confidential Computing Consortium (https://confidentialcomputing.io/) was formed by companies such as Alibaba, Arm, Baidu, IBM, Intel, Google Cloud, Microsoft, and Red Hat to make data protection in the cloud easier. Of course, there is no free lunch. While your data is a lot more secure using confidential computing, the services cost you considerably more and you also need to think about the performance loss of using them.

Dealing with the hype cycle

When considering which security strategies to employ, you need to consider the hype cycle as described in Hype Cycle for Security Operations, 2020 at https://www.gartner.com/en/documents/3986721. What this article tells you is that security follows a cycle:

  • Innovation: Some event or new technology triggers new security strategies.
  • Inflated expectations: Everyone gets sold a technology that is never going to work as anticipated.
  • Disillusionment: People drop a perfectly good technology because it failed as a marketable item, rather than as a technology. Being able to make a profit on a technology within a specific timeframe is what keeps it alive through infusions of investment capital.
  • Enlightenment: The early adopters begin to experience realistic expectations for the technology.
  • Productivity: The technology is now in common use.

At least you now understand the security cycle to some extent. It applies to every new innovation in technology, not just ML. Unfortunately, developing and fully implementing a new security strategy can take up to ten years, during which time hackers can wreak real havoc if they come up with a zero-day exploit. To ensure that your ML application remains secure, you need to invest, at a minimum, in the security technologies and strategies that have made it beyond the disillusionment phase.

It may seem as if you could keep business and social concerns separate, but people are social beings, so social threats will creep into your business-related security setting as well. The next section describes social threats from a business perspective. Yes, many of these issues also affect people’s personal lives, but the focus is on how these social threats will affect your business.

Considering social threats

Social threats affect individuals the most. A social threat is something that entices the user to perform a risky behavior or compromises the individual in some way. Here are a few ideas to consider:

  • Social media: A user is enticed to do things such as discuss company policies or strategies in the interest of being social.
  • Ads: Someone presents an ad that discusses some swanky new product, but the ad ends up compromising the individual in some way, such as providing access to a social media account, a shopping site, or even your local network. ML makes it possible to create convincing ads based on actual buyer shopping habits.
  • Utilities: A special tool allows the individual to do something interesting, such as changing the color of their Facebook site. You find utilities all over the place because people naturally want to fiddle with whatever it is that they think requires an update or change. A utility can plant a Trojan on the individual’s machine or grab the individual’s information.
  • Videos: Have you seen enough cat videos yet? Well, try this video of someone surfing on a shark in Australia instead! Individuals become unresponsive while watching videos, giving a hacker the opportunity to steal them blind without them even noticing.
  • Followers: Some interesting person needs more followers. ML makes it possible to infer who might be so interesting that the individual needs to follow them. Click the link and you can become one of those people who followed them into hacker heaven.
  • Terror: Deep learning makes it possible to create a fake of anything using any media. So, someone sends the individual a link to a faked image of the individual running naked down the middle of the main street. A link is supplied so that the outraged individual can complain and, when it's clicked, the hacker gains access.
  • Social engineering: Hackers use ML to create a social engineering attack based on individual interests, associates, work environment, and the like. The hacker can pose as a salesperson, a colleague from another company, or whatever else it takes to gain the person’s trust.
  • Blackmail: Someone gains access to sensitive information that the individual thought was secure. The blackmail doesn’t ever end. The individual will continue giving up information, resources, or whatever else the blackmailer requires until there is nothing left.

No matter how the entry is gained, the idea that a hacker has compromised your personal data or that a website is stalking you is frightening. However, social threats affect businesses as well. People run businesses, and when an employee encounters a new threat, the threat applies to the organization too. ML actually makes the hacker's job a lot easier, as if it weren't incredibly easy already. Chapters 7 through 9 discuss how ML has a big part to play in making social threats considerably more effective.

In some cases, a hacker will use social threats to gain information about an organization or individuals in order to gain a foothold in the organization itself. For example, profiling makes it possible for a hacker to perform social engineering attacks with greater ease. Tracking a user's activity also provides useful information. If the hacker really wants to create problems, identity theft makes it possible for the hacker to pose as a user of the organization from a remote location. In short, social threats are just as important to the security environment as business threats are, but in a different way.

Spam

Spam is a major avenue of attack for most hackers (see https://www.howtogeek.com/412316/how-email-bombing-uses-spam-to-hide-an-attack/ for some examples). A hacker could use a spam attack to hide error messages from your network, intrusion messages from an account, or just about anything else. Spam can also include subtleties such as capturing unintended clicks when a user accidentally opens a message. Even the best spam filters provide only 99.9 percent coverage, which means that if you get 100 emails a day, one piece of spam is likely to make it through every 10-day period (0.1 percent of the 1,000 emails received over those 10 days is one message), and most spam checkers just aren't that good. Hackers constantly modify the approach used to create spam to keep any new detection techniques off balance. The Developing a simple spam filter example section of this chapter shows one ML method for detecting spam, but even it isn't perfect. Consequently, it's likely that users will encounter spam and that the spam will eventually provide a vector for social engineering, phishing (see https://www.kaspersky.com/resource-center/threats/spam-phishing), or other attacks against the user's machine, the network, and your ML applications. Chapters 8 and 9 detail how you can effectively guard against the fraud and hacker aspects of spam.

Identity theft

Some people see identity theft as a user issue. It's true that the user will spend a great deal of time and money overcoming the effects of identity theft. However, depending on how the identity theft is perpetrated, the effects on your business could go well beyond the loss of money from purchases that no one will pay for after the merchandise is delivered. Even though identity theft is normally associated with credit or other personal issues, it can also affect your ML application in the following ways:

  • The data in your database is corrupted when the identity thief poses as a legitimate user. It’s hard to tell the real user’s data from that of the identity thief.
  • A hacker gains entry to your network using a stolen identity.
  • Services are misdirected to the holder of the stolen identity, rather than the legitimate user.
  • Analysis of social or other identity-based statistics becomes impossible. For example, which person in which area of town do you use for a profile?
  • Top employees can lose security credentials or be compromised in other ways, causing harm to your business by making them unavailable for various tasks.

This is just the tip of the iceberg. Society has always depended to some degree on being able to positively identify individuals. However, that dependency is growing as more technology is added and a positive ID becomes essential.

Unwanted tracking

Many users don’t want businesses or other entities to know every detail of their lives, even if the business is legitimate. The popularity of articles such as Here’s how to avoid unwanted tracking online at https://www.techradar.com/news/avoiding-unwanted-tracking-online and 4 Ways to Protect Your Phone’s Data From Unwanted Tracking at https://preyproject.com/blog/en/4-ways-to-protect-phone-data-unwanted-tracking/ indicate that the desire for privacy is real. However, when a hacker begins tracking a person, things can get really interesting because, now, the loss of privacy affects more than just the user. For example, a hacker can use tracking to begin a social engineering attack or profile an organization for other kinds of attacks.

Remote storage data loss or corruption

Employees typically store some amount of business data on their local hard drive (assuming their device has one). If you have remote access to that hard drive, then you can move the data or at least back it up. However, if the employee stores data on a remote server to which you lack access, the data now becomes a problem. You can’t back the data up and a hacker could compromise the data, including any company secrets that the user left in plain sight. Even if the data isn’t compromised, the fact that the user has it in an undisclosed location means that any corruption will also go unnoticed, which can ultimately affect your ML application in a number of ways (see Chapter 2, Mitigating Risk at Training by Validating and Maintaining Datasets). The two best ways to mitigate this threat are through employee monitoring and training.

Account takeover

According to a number of online sources, users typically have 150 or more personal online accounts, each of which requires a password. However, users are unlikely to create a unique password for all of those accounts. For one thing, few users could memorize all of those passwords, and making each of those passwords strong is nearly impossible. While you may think that users would rely on a password manager (password wallet), the Password Manager survey results at https://www.passwordmanager.com/password-manager-trust-survey/ point out that 65 percent of users don’t trust them at all and that 48 percent won’t use one. Interestingly enough, only 10 percent of users see Multi-Factor Authentication (MFA) as a viable alternative to using a password manager. What many users do is create an acceptably strong password and then use the same password everywhere. Consequently, when a hacker takes over a user’s account, the hacker also gains insights into the user and possibly finds methods to discover the user’s entire list of passwords, including the password for your ML application.

One of the best ways to detect this sort of attack is through behavioral analysis, as described in Eliminating Account Takeovers with Machine Learning and Behavioural Analysis at https://www.brighttalk.com/webcast/17009/326415/eliminating-account-takeovers-with-machine-learning-and-behavioural-analysis (you need to sign up for the free account). However, behavioral analysis can be time-consuming and requires intimate knowledge of the user.
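Full behavioral analysis requires rich profiles, but the core idea fits in a few lines: build a baseline of each user's normal behavior and flag activity that deviates from it. This minimal sketch uses login hours as the only behavioral signal, with made-up data for illustration (a production system would handle the midnight wrap-around and combine many more signals):

import numpy as np
# Hours (0-23) at which this user historically logs in.
login_history = np.array([8, 9, 9, 10, 8, 9, 17, 9, 8, 10])
def is_suspicious(login_hour, history, tolerance=3):
    # Flag a login that falls well outside the user's usual hours.
    return abs(login_hour - np.median(history)) > tolerance
print(is_suspicious(9, login_history))   # False: a typical morning login
print(is_suspicious(3, login_history))   # True: 3 a.m. is out of pattern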

So far, you have discovered both business and social threats, gained some ideas on how to detect them, and obtained a few tips on either preventing or mitigating them. All this material assumes that you have a stable environment and that the hardening you perform on your systems and applications remains effective. Unfortunately, nothing is stable and hackers have a habit of overcoming obstacles. The next section discusses how to make your setup flexible enough to adapt.

Employing ML in security in the real world

The real world is ever-changing and quite messy. You may think that there is a straightforward simple solution to a problem, but it’s unlikely that the solution to any given security problem is either straightforward or simple. What you often end up with is a layering of solutions that match the requirements of your environment. Consequently, you might find that an ML application designed to detect threats is part of a solution, the flexible part that learns and makes a successful attack less likely. However, you likely need to rely on traditional security and service-based security as well. It’s also important to keep user training in mind and not neglect those small things.

The reality of ML is that it’s a tool like any other tool and not somehow a magic wand that will remove all of your security problems. If Chapter 3 shows you anything, it demonstrates that ML security exploits exist in great quantities and that users are often the worst enemies of ML-based solutions. However, with layering, it becomes possible to protect a network in a number of ways, including relying on ensembles to combine the best models for your particular environment. Two of the more common security-specific approaches to protecting a network are as follows:

  • Ensuring user authentication (the validation that the user’s identity is real) and authorization (giving the user the correct rights) go as planned
  • Filtering out potentially hazardous data before the user even gets to see it

There are a number of ways to perform either of these tasks, but this chapter focuses on simple ML examples. The reason you want to use an ML application to perform these tasks is that the application has the potential to learn about new threats without waiting for signature updates or reprogramming. As the ML application becomes more aware of the techniques that a hacker employs to get through, the use of reinforcement learning can augment the training originally provided to the neural network and keep the hacker at bay (at least for a while).

Understanding the kinds of application security

Banish any thought that there is just one type of security. Application security comes in many forms and each form works best in a particular scenario. Chapter 5 considers the issue of keeping your network clean, which means using some type of security, but security must extend to the environment as a whole. Security comes down to a matter of control, but the biggest problem is determining what sort of control to use. Useful control must consider both the needs of the individual and the requirements of the organization. Making security measures too onerous will make adherence to policies less likely. Security that isn’t robust enough leaves an organization open to attack. Consequently, you see mixes of security measures in the following forms in most environments:

  • Role-based: Depends on the role that a user is performing at any given time, so that the same user may have more privileges in some situations than in others. For example, the user may have more privileges when accessing a resource locally than when accessing the same resource off-site from a mobile application (the sketch following this list shows the idea). This is a flexible form of security, but also the most confusing for users. It works well for critical resources that contain sensitive information.
  • Attribute-based: Used as an alternative to role-based security where the characteristics of a resource determine who can access it or what actions are acceptable. The focus is on the specifics of the resource, rather than on the role of the user.
  • Resource-based: Depends on the resource that the user wants to access, with consistent access in all situations. This form of security is useful for less critical resources that users may need to access continually, so consistency is more important than other considerations.
  • Group-based: Defines security measures based on the needs of a group, such as a workgroup or a department. Every individual in the group has the same access. This form of security is most useful for teams or people who perform the same task on common resources. It tends to reduce training costs. The criticality of the resource is dependent on the trust potential of the group as a whole.
  • Identity-based: Focuses on the needs of an individual to provide access to somewhat critical resources. Because it provides equal access to the resource at all times no matter what role the user performs at the time, this form of security could potentially lead to leaks.
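Here's a minimal sketch of a role-based check in which privileges depend on both the user's role and the access context, mirroring the local-versus-remote example in the first bullet. The roles, locations, and policy table are hypothetical:

# Hypothetical policy: each (role, location) pair maps to allowed actions.
POLICY = {
    ('analyst', 'local'):  {'read', 'write'},
    ('analyst', 'remote'): {'read'},           # Fewer privileges off-site
    ('admin',   'local'):  {'read', 'write', 'delete'},
    ('admin',   'remote'): {'read', 'write'},
}
def is_allowed(role, location, action):
    # Default to no access when a role/location pair isn't listed.
    return action in POLICY.get((role, location), set())
print(is_allowed('analyst', 'local', 'write'))   # True
print(is_allowed('analyst', 'remote', 'write'))  # False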

Many ML applications currently lack any of these forms of security, making them wide open to attack by any user who can gain access to them. Locking down an application means taking the following steps:

  1. Requesting authorization
  2. Authenticating the individual
  3. Monitoring and logging their access
  4. Verifying that each action is allowed by the application security profile

Following these steps will help you begin the process of ensuring that your ML application remains safe. Keeping track of what is and isn’t effective is important because these steps will require augmentation depending on the particulars of your application.
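To make the four steps concrete, here's a minimal sketch of how they might fit together around an ML prediction call. Every name in it (the user store, the permission set, and the predict() stub) is hypothetical:

import hashlib
import logging
logging.basicConfig(level=logging.INFO)
# Hypothetical user store: a password hash plus allowed actions.
USERS = {'mary': {'pwd_hash': hashlib.sha256(b'secret').hexdigest(),
                  'allowed': {'predict'}}}
def predict(data):
    return 'spam' if 'bad' in data else 'ham'  # Stand-in for a real model
def guarded_predict(username, password, data):
    user = USERS.get(username)
    # Steps 1 and 2: request authorization and authenticate the individual.
    if user is None or \
       hashlib.sha256(password.encode()).hexdigest() != user['pwd_hash']:
        logging.warning('Failed login for %s', username)
        return None
    # Step 3: monitor and log the access.
    logging.info('%s accessed the application', username)
    # Step 4: verify the action against the application security profile.
    if 'predict' not in user['allowed']:
        logging.warning('%s is not authorized to predict', username)
        return None
    return predict(data)
print(guarded_predict('mary', 'secret', 'a bad line'))  # spam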

Considering the realities of the machine

An ML application can’t think, isn’t creative, and has extremely limited flexibility. This fact is often brought home to anyone who tries to process textual data to prevent the use of derogatory terms or to prevent the resulting corpus from becoming unfair in some way. The terms in this section are offensive, but I used discretion to try to avoid even more derogatory terms found on the internet. Obviously, I didn’t want to include these terms in the book.

A modern form of derogatory comment takes the form of people’s names, such as calling someone a Karen, Stacy, Becky, Kyle, Troy, Chad, or any of a number of other names. If you’re interested, the definition at https://www.dictionary.com/e/slang/karen/ provides some insights into the use of the term Karen.

Obtaining useful results from ML applications means removing the derogatory terms, all of them, from the data. Yet, in the sentence, "Karen gave the salesperson a hard time about the price marked on the item.", the ML application has no way to tell whether Karen is a derogatory term or simply someone's name, so the sentence remains in place. If left in place, a large enough selection of data with unfortunate terms poses a security risk because it can skew the results of an analysis or cause an ML application to act in a disastrous manner.

Some terms aren’t even offensive depending on where they’re used. If you use the name Wally in the US, it’s just someone’s name. However, the same name in some other English-speaking countries could mean that the person is stupid or foolish, which is something that you definitely want to remove from your data (see https://www.phrases.org.uk/bulletin_board/46/messages/636.html for details). That’s why the technique in the Developing a simple spam filter example section might prove so helpful. It will at least move suspect data out of the dataset so a human can interpret it when a machine can’t.

Adding human intervention

Humans differ from each other considerably, which is a good thing because being different has helped the human race survive over the years. However, when considering security, being different isn’t always a good thing because the security expert who plans a security strategy has no idea of how other humans in an organization will react to it. Humans can wreck any security strategy, sometimes without much thought, and usually, without ill intent. Simply entering data into a form in a manner never envisioned by the form’s designer can create a problem. Failing to follow procedures or getting bored can cause all sorts of failures and ML applications aren’t exempt from their effects. Users sometimes play games of what will happen if they do something unexpected, possibly hoping to see a software Easter egg.

When creating any security solution, it pays to employ all the stakeholders in an organization in some manner, especially the users who interact with the application and its attendant security on a daily basis. If a user can break your security, it’s not the user’s fault; it’s how the security is implemented that is to blame. In fact, users who break your security are actually providing a service because if they can break your security, a hacker surely will, and it’s unlikely that a hacker will tell you about it.

Developing a simple authentication example

Online ML examples never incorporate any sort of access detection because the author is focusing on showing a programming technique. When creating a production application or an experimental application that uses sensitive data, you need some way to determine, using authentication, the identity of any entities accessing your ML application. When you authenticate a user, you only determine the user's identity and nothing else. Before the user can do anything, you must also authorize the user's activities. You can find the code for the following examples in the MLSec_04_Authentication and Authorization.ipynb file of the downloadable source code.

Working with basic and digest authentication

There are many ways to accomplish authentication and the techniques used are defined by the following points:

  • The kind of access
  • The type of server
  • The server security setup
  • The application security setup

Here is an easy local application-level security access technique:

import getpass
# Obtain the current user's login name from the environment.
user = getpass.getuser()
# Prompt for a password without echoing it to the screen.
pwd = getpass.getpass("User Name : %s" % user)
if not pwd == 'secret':
    print('Illegal Entry!')
else:
    print('Welcome In!')

The code obtains the user’s name and then asks for a password for that name. Of course, this kind of access only works for a local application. You wouldn’t use it for a web-based application. This version is also simplified because you wouldn’t store the passwords in plain text within the application itself. The password would appear in an encrypted database as a hash value and you’d turn whatever the user types into a hash using the technique shown in the Relying on traditional methods example section of Chapter 2. After you’ve hashed the user’s password, you’d locate the username in the external database, obtain the hash from the database, and compare the user’s hashed password to the hash found in the database.
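Here's a minimal sketch of that hash-compare approach using Python's standard hashlib. The in-memory dictionary stands in for the encrypted external database, and a production system would also add a per-user salt and a deliberately slow hash, such as hashlib.pbkdf2_hmac():

import getpass
import hashlib
def hash_password(password):
    # Hash the text so the plain password is never stored or compared.
    return hashlib.sha256(password.encode()).hexdigest()
# Stand-in for the encrypted external database of username/hash pairs.
password_db = {'john': hash_password('secret')}
user = getpass.getuser()
pwd = getpass.getpass("User Name : %s" % user)
if password_db.get(user) == hash_password(pwd):
    print('Welcome In!')
else:
    print('Illegal Entry!')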

Online authentication can also follow a simple strategy. Here’s an example of this sort of access:

import requests
from requests.auth import HTTPDigestAuth
# The protected resource; here, a file served by a local Jupyter server.
resource = 'http://localhost:8888/files/MLSec/Chapter04/TestAccess.txt'
# Package the credentials using digest authentication.
authenticate = HTTPDigestAuth('user', 'pass')
# Request the resource, supplying the credentials.
response = requests.get(resource, auth = authenticate)
print(response)

In this case, you use a basic technique to verify access to a particular resource, this one on the local machine through localhost. You build an authentication object consisting of the username and password, and then use it to obtain access to a resource. A response code of 200 indicates success. Most sites use a response code of 401 for a failed authentication, but some sites, such as GitHub, use a 404 response code instead.
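A minimal sketch of acting on those response codes follows; the branching logic is simply one reasonable way your application might respond:

# Act on the response codes discussed above.
if response.status_code == 200:
    print('Access granted to the resource')
elif response.status_code in (401, 404):
    print('Authentication failed; check the credentials')
else:
    print('Unexpected response: %d' % response.status_code)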

Note that this example uses HTTPDigestAuth, which sends a hash of the credentials over the network rather than the credentials themselves. It's not the most secure method because it's vulnerable to a MITM attack, but it's much better than using HTTPBasicAuth for a public API because basic authentication sends everything as Base64-encoded text. Some security professionals recommend basic authentication for private networks where you can use SSL security, as described at https://mark-kirby.co.uk/2013/how-to-authenticate-apis-http-basic-vs-http-digest/. The requests library also supports Open Authorization (OAuth) (see https://pypi.org/project/requests-oauthlib/ for details), Kerberos (see https://github.com/requests/requests-kerberos for details), and Windows NT Lan Manager (NTLM) (see https://github.com/requests/requests-ntlm for details) methodologies.

Accessing GitHub using OAuth-type authentication

Let’s look at the specific example of accessing GitHub, which relies on an OAuth-type access strategy:

To use this example, you must first create an API access token by signing in to your GitHub account and then accessing the https://github.com/settings/tokens page.

After you click Generate New Token, you see a New Personal Access Token page where you provide a token name and decide what access rights the token should provide. For this example, all you really need is repo, package, and user access.

When you are finished with the configuration, click Generate New Token. Make sure you copy the token down immediately because you won’t be able to access it later. (If you make a mistake, you can always delete the old token and create a new one.)

This simple example shows what you need to do to obtain a list of repositories for a given account. However, that's not really the point of the example. What you're really looking at is the authentication technique used to access specific resources; the use of GitHub isn't that pertinent because it could be any API securing a protected resource. Use these steps to create the example:

  1. Import the required libraries:
    import requests
    import json
  2. Obtain the sign-in information. Note that you must replace Your User Name with your actual username and The Token You Generated with the token you created earlier:
    resource = 'https://api.github.com/user/repos'
    username = 'Your User Name'
    token = 'The Token You Generated'
  3. Create the reusable session object:
    session = requests.Session()
    session.auth = (username, token)
  4. Request the list of repos for this user:
    repos = json.loads(session.get(resource).text)
  5. Output the repo names:
    for repo in repos:
        print(repo['name'])

This code is really an extension of the examples in the previous section. Note that you must supply your username and token (not your GitHub password) before running this example or you’ll see an error. In this case, the code creates a GitHub session, then uses it to obtain a list of repositories owned or accessible by the user from https://api.github.com/user/repos. The example loads the repository information, which includes everything about the repository, not just the name, in JSON format. It then prints a list of names. The names you get depend on the repositories you have set up or shared with other GitHub users. The session object will also allow you to perform tasks such as creating new repositories. The tasks you can perform are limited by the token you generate. You can find extensive documentation about the GitHub REST API at https://docs.github.com/en/rest/overview.

This example demonstrates something else, authorization. Once you authenticate the user, you authorize certain actions by that user, such as by using the get(resource) session call. When generating the GitHub token, you define the actions that the token will allow. One user might be authorized to do everything, while another user might only be able to list the repository content and download files.

Developing a simple spam filter example

Most people associate spam with email and text messages, and you do see ML applications keeping spam away from people all the time. However, for an ML application, spam is any data in any form that you don't want the application to see. Spam is the sort of information that will cause biased or unusable results because, unlike a human, an ML application doesn't know to ignore the spam. In most cases, spam is an annoyance rather than a purposeful attempt to attack your model. You can find the code for the following examples in the MLSec_04_Remove Unwanted Text.ipynb file of the downloadable source code.

Starting with a simple removal

When creating a secure input stream for your ML application, you need to think about layers of protection because a hacker is certainly going to pile on layers of attacks. Even if you’ve limited access to your application and its data sources, and provided an ensemble to predictively remove any data source that is most definitely bad, hackers can still try to get data through seemingly useful datasets. Consider the simple text file shown here (also found in TestAccess.txt):

You've gained access to this file.
This is a bad line.
This is another bad line.
This line is good.
And, this line is just sort of OK.
This is yet another bad line for good measure.
You don't want this bad line either.
Finally, this line is great!

Imagine that every line that has the word bad in it really is bad. Perhaps the data includes a script or unwanted values. In fact, perhaps the data just isn’t useful. It’s not necessarily bad, but if you include it in your analysis, the result is biased or perhaps skewed in some way. In short, the line with bad in it is some type of limited spam. It’s not selling you a home in outer whatsit, but it’s not helping your application either. When this sort of issue occurs, you can remove the bad lines and keep the good lines using code similar to that shown in the following steps:

  1. Import the required libraries. When you perform these imports, the Integrated Development Environment (IDE) will tell you that it has downloaded the stopwords needed for the example:
    import numpy as np
    import os
    import matplotlib.pyplot as plt
    import nltk
    nltk.download('stopwords')
    from nltk.corpus import stopwords
    nltk.download('punkt')
    nltk.download('wordnet')
    from collections import Counter
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import plot_confusion_matrix
  2. Create a function that accepts a filename and a target to remove unwanted lines. This function opens the file and keeps processing it line by line until there are no more lines:
    def Remove_Lines(filename, target_word):
        useful_lines = []
        with open(filename) as entries:
            while True:
                line = entries.readline()
                if not line:
                    break
                if not target_word.upper() in line.upper():
                    useful_lines += [line.rstrip()]
        return useful_lines
  3. Define the file and target data to search, then create a list of good entries in the dataset and print them out:
    filename = 'TestAccess.txt'
    target = 'bad'
    good_data = Remove_Lines(filename, target)
    for entry in good_data:
        print(entry)

There is nothing magic about this code—you’ve used something like it before to process other text files. The difference is that you’re now using a file-processing technique to add security to your data. Notice that you must set both the current word and the target word to uppercase (or lowercase as you like) to ensure the comparison works correctly. Here’s the output from this example:

You've gained access to this file.
This line is good.
And, this line is just sort of OK.
Finally, this line is great!

Notice that all of the lines with the word bad in them are now gone.

Manipulating filtered data

Most people who work with data understand the need to manipulate it in various ways to make the data better suited for analysis. For example, when performing text analysis, one of the first steps is to remove the stop words because they don’t add anything useful to the dataset. Some of these same techniques can help you find patterns in input data so that it becomes harder for a hacker to sneak something in even after you remove the bad elements. For example, you might find odd repetitions of words, number sets, or other data that might normally appear infrequently, if at all, in a dataset that will alert you to potential hacker activity. The following steps show how to create a simple filter that helps you see unusual data or patterns. This code relies on the same libraries you imported in the previous section:

  1. Define a function to remove small words such as “to,” “my,” and “so” from the text:
    def Remove_Stop_Words(data):
        stop_words = set(stopwords.words('english'))
        new_lines = []
        for line in data:
            words = line.split()
            filtered = [word for word in words
                        if word.lower() not in stop_words]
            new_lines += [' '.join(filtered)]
        return new_lines
  2. Define a function that will list each word individually, along with the count for that word:
    def Create_Dictionary(data):
        all_words = []
        for line in data:
            words = line.split()
            all_words += words
        dictionary = Counter(all_words)
        return dictionary
  3. Define a function that creates a matrix showing word usage:
    def Extract_Features(data, dictionary):
        features_matrix = np.zeros(
            (len(data),len(dictionary)))
        lineID = 0
        for line in data:
            words = line.split()
            for word in words:
              wordID = 0
              for i,d in enumerate(dictionary):
                if d == word:
                  wordID = i
                  features_matrix[lineID, wordID] += 1
            lineID += 1
        return features_matrix
  4. Create a filtered list of text strings from the original text that has the stop words removed:
    filtered = Remove_Stop_Words(good_data)
    print(filtered)
  5. Create a dictionary of words from the filtered list:
    word_dict = Create_Dictionary(filtered)
    print(word_dict)
  6. Create a matrix showing which words are used and when in each dataset row:
    word_matrix = Extract_Features(filtered, word_dict)
    print(word_matrix)

Each of the functions in this example shows a progression:

  1. Remove the stop words from each line in the dataset that was created from the original file.
  2. Create a dictionary of important words based on the filtered dataset.
  3. Define a matrix that shows each line of the dataset as rows and the words within that row as columns. A value of 1 indicates that the word appears in the specified row (the code actually accumulates counts, so a word repeated within a row would produce a higher value).

There are some interesting bits of code in the example. For example, Remove_Stop_Words() relies on a list comprehension to perform the actual processing. You could also use a for loop if desired. You must also use join() to join the individual words back together and place them in a list to perform additional processing. The output looks like this:

["You've gained access file.", 'line good.', 'And, line
 sort OK.', 'Finally, line great!']

A dictionary is essential for many types of processing. Create_Dictionary() makes use of the Counter() function found in the collections library to make short work of creating the dictionary in a form that will make defining the matrix easy. Here’s the output from this step:

Counter({'line': 3, "You've": 1, 'gained': 1, 'access': 1,
 'file.': 1, 'good.': 1, 'And,': 1, 'sort': 1, 'OK.': 1,
 'Finally,': 1, 'great!': 1})

The output doesn’t appear in any particular order and it’s not necessary that it does. Each unique word in the dataset appears as an individual dictionary key. The values show the number of times that the word appears. Consequently, you could use this output to perform tasks such as determining word frequency. In this case, the example simply creates a matrix to show where the words appear within the dataset. There are possibly shorter ways to perform this task, but the example uses a straightforward approach that processes each word in turn and finds its position in the matrix by enumerating the dictionary. Here’s the output from this step:

[[1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1.]]

If you look at the first row, the first four entries have a 1 in them for You've, gained, access, and file. None of these words appear in the other rows, so the entries are 0 in the other rows. However, line does appear in three of the rows, so there is a 1 for that entry in each of the rows. The next section takes these techniques and shows how to apply them to multiple files in an email dataset.

Creating an email filter

Emails can contain a great deal of useless or harmful information. At one time, email filters worked similarly to the example in the previous section (a simple filter). However, trying to keep track of all of the words that hackers use to get past the filter became impossible. Even though the simple filtering technique is still useful for certain needs, email filtering requires something better—an approach that is flexible enough to change with the techniques that hackers use to attempt to get past the filter. One such approach is to use an ML application to discover which emails are useful and which are spam.

The example in this section performs a simple analysis of the useful (ham) versus spam orientation of each email in the Ling-Spam email corpus described at http://www2.aueb.gr/users/ion/docs/ir_memory_based_antispam_filtering.pdf and available for download at http://www.aueb.gr/users/ion/data/lingspam_public.tar.gz. The original dataset is relatively complex and somewhat unwieldy, so the example actually uses a subset of the messages split into two folders: Email_Train and Email_Test. To save some time and processing, the example relies on the content of the lingspam_public/lemm_stop folder, which provides the messages with the stop words already processed and the words normalized using lemmatization (see the Choosing between stemming and lemmatization section for details). The messages in the Email_Train folder come from the part1, part2, and part3 folders (867 messages in total with 144 spam messages), while the messages in the Email_Test folder come from the part4 folder (289 messages in total with 48 spam messages). You can tell which messages contain spam because they start with the letters spmsg (for spam message).

Recognizing the benefits of targeted training and testing data

Even though this example uses a generic database, it’s always better to use your organization’s email to train and test any model you create. Doing so will greatly decrease the number of false positives and negatives in the production environment because the data will reflect what the users actually receive. For example, an engineering firm specializing in fluid dynamics can expect to receive a lot more emails about valves than a financial firm will. This same principle holds true for all sorts of other filtering needs. The data from your organization will always provide better results than generic data will. Of course, you need to make sure that any organizational data you use meets privacy requirements and is properly sanitized before you use it, as described in Chapter 13.

Each of the text files contains three lines. The first line is the email subject, the second line is blank, and the third line contains the message. In processing the emails, you look at just the third line for content, and you can label a training message as spam when the filename begins with spmsg or as ham when the filename begins with something else. With this in mind, the following code shows a spam filter you can create using techniques similar to those used in the previous section, but using multiple files in this case. This code relies on the same libraries you imported in the Starting with a simple removal section (make sure you use the 1.0.x version, originally version 0.23.x, of scikit-learn, as described in Chapter 1, for this part of the chapter or you may encounter errors):

  1. Set the paths for the training and testing messages:
    train_path = "Email_Train"
    # Sorting ensures the ham files (digit-prefixed names) come before
    # the spmsg* spam files, matching the label layout created in step 4.
    train_emails = [os.path.join(train_path, f)
                    for f in sorted(os.listdir(train_path))]
    test_path = "Email_Test"
    test_emails = [os.path.join(test_path, f)
                   for f in sorted(os.listdir(test_path))]
  2. Create a dictionary function to build the required dictionary. Then, remove non-word items that include numbers, special characters, end-of-line characters, and so on:
    def Create_Mail_Dictionary(emails):    cvec = CountVectorizer(
            stop_words='english',
            token_pattern=r'[a-zA-Z]{2,}',
            max_features=2000)
        corpus = [open(email).read() for email in emails]
        cvec.fit(corpus)
        return cvec
    train_cvec = Create_Mail_Dictionary(train_emails)
  3. Create a features matrix function. Instead of lines and words, this code uses documents and words for the matrix:
    def Extract_Mail_Features(emails, cvec):
        corpus = [open(email).read() for email in emails]
        return cvec.transform(corpus)
    train_feat = Extract_Mail_Features(train_emails, train_cvec)
    test_feat = Extract_Mail_Features(test_emails, train_cvec)
  4. Create labels showing which messages are ham (0) and spam (1):
    train_labels = np.zeros(867)
    train_labels[723:867] = 1    # the last 144 training messages are spam
    test_labels = np.zeros(289)
    test_labels[241:289] = 1     # the last 48 test messages are spam
  5. Train the Multinomial Naïve Bayes model:
    MNB = MultinomialNB()
    MNB.fit(train_feat, train_labels)
  6. Predict which of the messages in the test group are ham or spam and output the correctness of the prediction as a confusion matrix:
    result = MNB.predict(test_feat)
    print(confusion_matrix(test_labels, result))
  7. Display the confusion matrix in a nicely plotted form (plot_confusion_matrix() is deprecated after scikit-learn 1.0; later versions replace it with ConfusionMatrixDisplay.from_estimator()):
    matrix = plot_confusion_matrix(MNB,
                                   X=test_feat,
                                   y_true=test_labels,
                                   cmap=plt.cm.Blues)
    plt.title('Confusion matrix for spam classifier')
    plt.show()

The listing shows that simple techniques often provide the basis for more complex processing. The Create_Mail_Dictionary() and Extract_Mail_Features() functions make it possible to work with multiple files and perform additional data cleaning. Notice that this example builds the dictionary more efficiently by relying on scikit-learn’s CountVectorizer(). The concept and the result are the same as in the previous section, but the approach is shorter and more efficient. The Extract_Mail_Features() function is likewise compact: a list comprehension builds the corpus, and cvec.transform() converts it into a features matrix. Again, the output is the same and the process under the covers is the same; you’re simply using a more streamlined approach.

The Multinomial Naïve Bayes model will vary in its ability to correctly predict ham or spam messages after you fit it to the training data. In this case, the confusion matrix shows how the model handles the 241 ham messages and 48 spam messages in the test dataset. A larger test dataset is likely to show a less impressive result, but according to Machine learning for email spam filtering: review, approaches and open research problems at https://www.sciencedirect.com/science/article/pii/S2405844018353404, some companies, such as Google, have achieved detection rates as high as 99.9 percent. Those companies, however, use advanced ML strategies rather than the more basic Multinomial Naïve Bayes model, and their strategies rely on ensembles of learners, as suggested in the Using ensemble learning section of Chapter 3.
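If you want a single figure of merit in addition to the confusion matrix, a short sketch such as the following (not part of the original listing, but using only the variables it already defines) computes the overall accuracy of the predictions:

from sklearn.metrics import accuracy_score

# Fraction of test messages the model classified correctly.
print("Accuracy: {:.3f}".format(accuracy_score(test_labels, result)))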

Choosing between stemming and lemmatization

There are two common techniques for normalizing words within documents: stemming and lemmatization, and each has its uses. Stemming simply removes the prefixes and suffixes of words to reduce each one to its root. For example, player, plays, and playing would all be stemmed to the root play. This technique is mostly used for word analysis, such as determining how often particular words appear in one or more documents. Lemmatization processes the words in context, so that the words running, runs, and ran all appear as the root word run. You use this technique most often for text analysis, such as determining the relationships of words in a spam message versus a usable (ham) message. Here is an example of stemming:

from nltk.stem import LancasterStemmer
from nltk.tokenize import word_tokenize
# word_tokenize() requires the punkt tokenizer data; run
# nltk.download('punkt') first if you see a lookup error.
LS = LancasterStemmer()
print(LS.stem("player"))
print(LS.stem("plays"))
print(LS.stem("playing"))
# Stem every token in a complete sentence.
tokens = word_tokenize("Gary played the player piano while playing cards.")
stemmed = [LS.stem(word) for word in tokens]
print(" ".join(stemmed))

The example imports the required libraries, creates an instance of LancasterStemmer(), and then uses the instance to stem three words with the same root. It then does the same thing for a sentence containing the three words. The output shows that context isn’t taken into account and that it’s possible to end up with non-words such as whil:

play
play
play
gary play the play piano whil play card .

Lemmatization takes a different route, as shown in this example (note that you may have to add an nltk.download('omw-1.4') statement after the import statements if you see an error message when running this code):

from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
WNL = WordNetLemmatizer()
print(WNL.lemmatize("player", pos="v"))
print(WNL.lemmatize("plays", pos="v"))
print(WNL.lemmatize("playing", pos="v"))
# Lemmatize every token in a complete sentence, treating each as a verb.
tokens = word_tokenize("Gary played the player piano while playing cards.")
lemmatized = [WNL.lemmatize(word, pos="v") for word in tokens]
print(" ".join(lemmatized))

Notice the pos argument in the lemmatize() calls. This argument provides the context for performing the task and can be any of these values: adjective (a), satellite adjective (s), adverb (r), noun (n), or verb (v). Because the example specifies verbs, it produces this output, which you can contrast with the stemming output:

player
play
play
Gary play the player piano while play card .
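Supplying pos by hand doesn’t scale to real documents, where each token has its own part of speech. Here is a sketch (not part of the original example) of deriving the pos argument from nltk.pos_tag() output; it assumes you’ve downloaded the averaged_perceptron_tagger data in addition to the wordnet data:

import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

def wordnet_pos(treebank_tag):
    # Map Penn Treebank tags to the codes that lemmatize() expects.
    if treebank_tag.startswith('J'):
        return 'a'    # adjective
    if treebank_tag.startswith('V'):
        return 'v'    # verb
    if treebank_tag.startswith('R'):
        return 'r'    # adverb
    return 'n'        # everything else defaults to noun

WNL = WordNetLemmatizer()
tokens = word_tokenize("Gary played the player piano while playing cards.")
tagged = nltk.pos_tag(tokens)
print(" ".join(WNL.lemmatize(word, pos=wordnet_pos(tag))
               for word, tag in tagged))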

The point is that you must choose carefully between stemming and lemmatization when creating filters for your ML application. Choosing the right process produces significantly better results in most cases.

Summary

This chapter helped you understand both business and social threats to your ML application: what to look for, how to mitigate attacks when they occur, and how to keep them from happening in the first place. The goal is to provide a flexible setup that makes the hacker work so hard that going somewhere else becomes attractive. Never assume that a hacker can’t break your security. In fact, presenting any sort of challenge will keep a hacker interested until your security does break, so always assume that a sufficiently determined attacker can gain access.

Layering is an essential part of any security solution. Using layers adds complexity, which is a double-edged sword. On the one hand, it makes the hacker’s job harder by putting up barriers that change over time as administrators learn and correct misconceptions about how security should work. On the other hand, as anyone who performs reliability studies will tell you, more parts mean more things to break, which reduces the reliability of the setup being protected. Consequently, layers are good, but more layers than you actually need only make your system unreliable.

With complexity in mind, the next chapter zooms in on the network itself. Most hackers are after your network, not an individual machine. Given that users generally have at least two systems they use to access ML applications, infecting just one machine likely isn’t enough to give the hacker carte blanche to enter your application. Keeping your network clean is a requirement if you want to keep your ML application safe, and you need to consider both the local network and the network in the cloud.

Further reading

The following links provide you with some additional reading that you may find useful to further understand the materials in this chapter:

  • This link helps you discover more about the ML component of a SageMaker application: Building secure machine learning environments with Amazon SageMaker: https://aws.amazon.com/blogs/machine-learning/building-secure-machine-learning-environments-with-amazon-sagemaker/
  • https://kth.diva-portal.org/smash/get/diva2:1117695/FULLTEXT01.pdf
  • Learn some additional detail on the carding attack type: How to Use AI and Machine Learning in Fraud Detection: https://spd.group/machine-learning/fraud-detection-with-machine-learning/
  • https://www.ibm.com/cloud/learn/confidential-computing
  • https://www.hindawi.com/journals/je/2020/5267564/

Attack Type: File-path traversal

Major Consideration: A hacker gains access to sensitive data (such as data used to train a model) by using specially configured paths. These paths rely either on the inherent weaknesses of relative paths that use the ../../ notation or on known absolute paths. Once the hacker gains access to the directory, it’s possible to look at the config file settings, corrupt data, or make other malicious permanent storage modifications.

Possible Remedy: Ensure that every resource always has the required protection (see the Understanding the kinds of application security section in this chapter) and that any user access relies on the principle of least privilege. It’s also possible to rely on special filtering of input data and API requests using a technique similar to that shown in the Manipulating filtered data section and pattern detection using the technique in the Creating an email filter section.

Resources: https://www.geeksforgeeks.org/path-traversal-attack-prevention/ and https://www.trendmicro.com/en_us/research/20/j/contentprovider-path-traveral-flaw-on-esc-app-reveals-info.html

Whitepaper/Example: https://jisajournal.springeropen.com/articles/10.1186/s13174-019-0115-x

Attack Type: Distributed Denial-of-Service (DDoS)

Major Consideration: Packets of useless data and commands are sent from a group of systems under the hacker’s control to overwhelm the victim’s system and cause it to fail. Given that ML applications often require large amounts of network bandwidth, this class of application is inordinately affected by a DDoS attack.

Possible Remedy: Most methods today rely on detecting the attack and dealing with it on the victim’s system. One proposed solution is to detect the attack from the source (such as the hacker’s control machine or the various bots) using ML techniques and then cut off those attackers from the inputs to the victim.

Resources: https://ieeexplore.ieee.org/document/7013133 and https://www.mdpi.com/2504-3900/63/1/51/pdf

Whitepaper/Example: http://palms.princeton.edu/system/files/Machine_Learning_Based_DDoS_Attack_Detection_From_Source_Side_in_Cloud_camera_ready.pdf
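As an illustration of the input filtering that the file-path traversal remedy describes, here is a minimal sketch (the safe_data directory and safe_open() function are hypothetical, not from the chapter) that resolves a requested path and rejects anything that escapes an allowed base directory:

import os

BASE_DIR = os.path.abspath("safe_data")   # hypothetical data root

def safe_open(requested):
    # Resolve the path, then confirm it still lives under BASE_DIR.
    full = os.path.abspath(os.path.join(BASE_DIR, requested))
    if not full.startswith(BASE_DIR + os.sep):
        raise ValueError(f"Path traversal attempt: {requested}")
    return open(full)

# A request such as safe_open("../../etc/passwd") now raises an error.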
