Assigning advertisements to keywords using the Adwords balance algorithm

Advertising has become a major revenue source on the Web. It is a billion-dollar business and the source of most of Google's revenue. Further, it has made it possible for companies such as Google to offer their main services free of charge, collecting their revenue through advertisements instead.

Let us explore how we can implement a simple "Adwords" style algorithm using MapReduce.

Adwords lets people bid on keywords. For example, advertiser "A" might bid $2 for the keyword "Hadoop Support" with a maximum budget of $100, while advertiser "B" bids $1.50 for the same keyword with a maximum budget of $200. When a user searches for the given keywords, the system chooses one or more advertisements from among the bids for those keywords. Advertisers pay only if a user clicks on the advertisement.

The Adwords problem is to show advertisements so as to maximize revenue. There are several factors at play when designing such a solution:

  • Only user clicks, not impressions of the advertisement, earn us money. Hence, we want to show advertisements that are likely to be clicked often. We measure this as the fraction of times an advertisement was clicked as opposed to the number of times it was shown, and we call this the "click-through rate" of a keyword.
  • We want to favor advertisers that have large unspent budgets, as large budgets are harder to spend than smaller ones.

In this recipe, we will implement a simplified version of the Adwords balance algorithm that can be used in such situations. For simplicity, we will assume that advertisers bid on single words only. Also, since we cannot find a real bid dataset, we will generate a sample bid dataset.


Assume that you are to support keyword-based advertising using the Amazon dataset. The recipe works as follows:

  • The first MapReduce task approximates the click-through rate of each keyword using the Amazon sales rank. Here, we assume that the keywords found in the titles of products with a higher sales rank will have a better click-through rate.
  • Then we will run a Java task to generate a bid dataset.
  • Then the second MapReduce task will group the bids for the same keyword together and create an output that is suitable for the advertisement assignment program.
  • Finally, we will use an advertisement assignment program to assign advertisements to keywords. We will use the Adwords balance algorithm, which assigns priority based on the fraction of unspent budget held by each advertiser, the bid value, and the click-through rate, using the following formula:

    Measure = bid value * click-through rate * (1 - e^(-current budget / initial budget))

Getting ready

The following steps describe how to prepare to run the Adwords sample:

  1. This recipe assumes that you have followed Chapter 1, Getting Hadoop up and running in a Cluster, and have installed Hadoop. We will use HADOOP_HOME to refer to the Hadoop installation directory.
  2. Start Hadoop by following the instructions in Chapter 1, Getting Hadoop up and running in a Cluster.
  3. This recipe assumes you are aware of how Hadoop processing works. If you have not already done so, you should follow the Writing the WordCount MapReduce sample, bundling it and running it using standalone Hadoop recipe from Chapter 1, Getting Hadoop up and running in a Cluster.

How to do it...

The following steps describe how to run the Adwords sample:

  1. Download the dataset from the Amazon product co-purchasing network metadata page, http://snap.stanford.edu/data/amazon-meta.html, and unzip it. We will call this directory DATA_DIR.
  2. Upload the data to HDFS by running the following commands from HADOOP_HOME. If the /data directory already exists, clean it up. This dataset is large and might take a long time to process if you run it on a single computer; if you need the sample to run quickly, you might want to upload only the first 50,000 lines or so of the dataset, as shown after the upload commands.
    $ bin/hadoop dfs -mkdir /data
    $ bin/hadoop dfs -mkdir /data/input1
    $ bin/hadoop dfs -put <DATA_DIR>/amazon-meta.txt /data/input1
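    # Optional: to run with a smaller sample, extract the first 50,000
    # lines and upload that file instead (assumes a Unix-like shell)
    $ head -n 50000 <DATA_DIR>/amazon-meta.txt > <DATA_DIR>/amazon-meta-small.txt
    $ bin/hadoop dfs -put <DATA_DIR>/amazon-meta-small.txt /data/input1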
    
  3. Unzip the source code for Chapter 8 (chapter8.zip). We will call that folder CHAPTER_8_SRC.
  4. Change the hadoop.home property in the CHAPTER_8_SRC/build.xml file to point to your Hadoop installation directory.
  5. Compile the source by running the ant build command from the CHAPTER_8_SRC directory.
  6. Copy the build/lib/hadoop-cookbook-chapter8.jar to your HADOOP_HOME.
  7. Run the MapReduce job through the following command from HADOOP_HOME:
    $ bin/hadoop jar hadoop-cookbook-chapter8.jar chapter8.adwords.ClickRateApproximator /data/input1 /data/output6
    
  8. Download the results to your computer by running the following command:
    $ bin/hadoop dfs -get /data/output6/part-r-00000 clickrate.data
    
  9. You will see that the file contains the approximated click-through rate of each keyword, which the bid dataset generator in the next step uses as input:
    keyword:(Annals 74
    keyword:(Audio  153
    keyword:(BET    95
    keyword:(Beat   98
    keyword:(Beginners)     429
    keyword:(Beginning      110
  10. Generate a bid dataset by running the following command from HADOOP_HOME. You can find the results in a biddata.data file.
    $ java -cp build/lib/hadoop-cookbook-chapter8.jar chapter8.adwords.AdwordsBidGenerator clickrate.data
    
  11. Create a directory called /data/input2 and upload the bid dataset and the results of the earlier MapReduce task to the /data/input2 directory of HDFS:
    $ bin/hadoop dfs -mkdir /data/input2
    $ bin/hadoop dfs -put clickrate.data /data/input2
    $ bin/hadoop dfs -put biddata.data /data/input2
    
  12. Generate the data to be used by the Adwords assigner by running the second MapReduce job:
    $ bin/hadoop jar hadoop-cookbook-chapter8.jar chapter8.adwords.AdwordsBalanceAlgorithmDataGenerator /data/input2 /data/output7
    
  13. Download the results to your computer by running the following command:
    $ bin/hadoop dfs -get /data/output7/part-r-00000 adwords.data
    
  14. You will see that it prints the results as follows:
    (Animated       client23,773.0,5.0,97.0|
    (Animated)      client33,310.0,8.0,90.0|
    (Annals         client76,443.0,13.0,74.0|client51,1951.0,4.0,74.0|
    (Beginners)     client86,210.0,6.0,429.0|client6,236.0,5.0,429.0|
    (Beginning      client31,23.0,10.0,110.0|
    
  15. Perform the matches for a random set of keywords by running the following command:
    $ java -cp hadoop-cookbook-chapter8.jar chapter8.adwords.AdwordsAssigner adwords.data
    

How it works...

As we discussed in the How to do it... section, the recipe consists of two MapReduce tasks. You can find the source code for the first MapReduce task in src/chapter8/adwords/ClickRateApproximator.java.

The mapper function looks like the following. It parses the Amazon dataset using the Amazon data format, and for each word in a product's title, it emits that word together with the product's sales rank.

public void map(Object key, Text value, Context context) 
  throws IOException, InterruptedException 
{
  // Parse one product entry of the Amazon dataset
  List<AmazonCustomer> customerList = 
    AmazonCustomer.parseAItemLine(value.toString());
  if (customerList.size() == 0)
  {
    return;
  }

  // Every customer in the entry refers to the same item, so it is 
  // enough to take the item data from the first customer
  ItemData itemData = 
    customerList.get(0).itemsBrought.iterator().next();

  // Emit each word of the product title that is longer than three 
  // characters, together with the product's sales rank
  String[] tokens = itemData.title.split("\\s+");
  for (String token : tokens)
  {
    if (token.length() > 3)
    {
      context.write(new Text(token), new IntWritable(itemData.salesrank));
    }
  }
}

Hadoop then sorts the emitted key-value pairs by key and invokes the reducer once for each key, passing all the values emitted against that key. As shown in the following code, the reducer calculates an approximation of the click-through rate from the sales ranks emitted against the key.

public void reduce(Text key, Iterable<IntWritable> values, Context context) 
  throws IOException, InterruptedException 
{
  // Each product with a better (numerically smaller) sales rank 
  // contributes more to the approximated click-through rate
  double clickrate = 0; 
  for (IntWritable val : values)
  {
    if (val.get() > 1)
    {
      clickrate = clickrate + 1000 / Math.log(val.get());
    }
    else
    {
      clickrate = clickrate + 1000;
    }
  }
  context.write(new Text("keyword:" + key.toString()), 
    new IntWritable((int) clickrate)); 
}
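
The recipe does not list the driver that configures and submits this job. As a rough sketch, and assuming the mapper and reducer above are inner classes of ClickRateApproximator named AMapper and AReducer (hypothetical names; check the chapter's source for the actual ones), the driver could look like the following:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public static void main(String[] args) throws Exception 
{
  Configuration conf = new Configuration();
  Job job = new Job(conf, "click-rate-approximator");
  job.setJarByClass(ClickRateApproximator.class);
  job.setMapperClass(AMapper.class);
  job.setReducerClass(AReducer.class);
  // Both the map output and the final output are (Text, IntWritable) pairs
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}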

There is no publicly available bid dataset. Therefore, we generate a random bid dataset for our recipe using AdwordsBidGenerator. It reads the keywords generated by the earlier MapReduce task and generates a random bid dataset.
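
The generator's actual source ships with the chapter; the following is only a minimal sketch of what such a generator might look like. The record layout is inferred from the mapper shown next, and the class name, client count, and value ranges are all illustrative:

import java.io.FileWriter;
import java.io.PrintWriter;
import java.util.List;
import java.util.Random;

public class BidGeneratorSketch 
{
  // Writes one line per client, for example:
  // client1 budget=500.0,bid=Hadoop|2.0,bid=Support|1.5
  public static void generate(List<String> keywords, int numClients, 
    String outputFile) throws Exception 
  {
    Random random = new Random();
    PrintWriter out = new PrintWriter(new FileWriter(outputFile));
    for (int i = 0; i < numClients; i++) 
    {
      StringBuffer line = new StringBuffer("client" + i + " ");
      line.append("budget=").append(100 + random.nextInt(1900)).append(".0");
      int bidCount = 1 + random.nextInt(5);
      for (int b = 0; b < bidCount; b++) 
      {
        // Pick a random keyword and a random bid value for it
        String keyword = keywords.get(random.nextInt(keywords.size()));
        line.append(",bid=").append(keyword).append("|")
          .append(1 + random.nextInt(10)).append(".0");
      }
      out.println(line.toString());
    }
    out.close();
  }
}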

Then we use the second MapReduce task to merge the bid dataset with the click-through rates and generate a dataset that lists the bid information against each keyword. You can find the source for the second MapReduce task in src/chapter8/adwords/AdwordsBalanceAlgorithmDataGenerator.java. The mapper function looks like the following:

public void map(Object key, Text value, Context context) 
  throws IOException, InterruptedException 
{
  String[] keyVal = value.toString().split("\\s+");
  if (keyVal[0].startsWith("keyword:")) 
  {
    // A click-through rate record: emit the rate against the keyword
    context.write(
      new Text(keyVal[0].replace("keyword:", "")), 
      new Text(keyVal[1]));
  }
  else if (keyVal[0].startsWith("client")) 
  {
    // A bid record: parse out the client's budget and its bids
    List<String[]> bids = new ArrayList<String[]>();
    double budget = 0;
    String clientid = keyVal[0];
    String[] tokens = keyVal[1].split(",");
    for (String token : tokens) 
    {
      String[] kp = token.split("=");
      if (kp[0].equals("budget")) 
      {
        budget = Double.parseDouble(kp[1]);
      }
      else if (kp[0].equals("bid")) 
      {
        String[] bidData = kp[1].split("\\|");
        bids.add(bidData);
      }
    }

    // Emit each bid, together with the client ID and budget, 
    // against the keyword the bid is for
    for (String[] bid : bids) 
    {
      String keyword = bid[0];
      String bidValue = bid[1];
      context.write(new Text(keyword), 
        new Text(new StringBuffer()
          .append(clientid).append(",")
          .append(budget).append(",")
          .append(bidValue).toString()));
    }
  }
}

The mapper function reads both the bid dataset and the click-through rate dataset and emits both types of records against the keyword. Each reducer then receives all the bids and the associated click-through rate for one keyword, merges that data, and emits the list of bids against the keyword.

public void reduce(Text key, Iterable<Text> values, 
  Context context) throws IOException, InterruptedException 
{
  String clientid = null;
  String budget = null;
  String bid = null;
  String clickRate = null;

  // Separate the bid records (which contain commas) from the 
  // click-through rate record
  List<String> bids = new ArrayList<String>();
  for (Text val : values) 
  {
    if (val.toString().indexOf(",") > 0) 
    {
      bids.add(val.toString());
    }
    else
    {
      clickRate = val.toString();
    }
  }

  // Concatenate all the bids for this keyword, appending the 
  // click-through rate to each bid
  StringBuffer buf = new StringBuffer();
  for (String bidData : bids) 
  {
    String[] vals = bidData.split(",");
    clientid = vals[0];
    budget = vals[1];
    bid = vals[2];
    buf.append(clientid).append(",")
      .append(budget).append(",")
      .append(Double.valueOf(bid)).append(",")
      .append(Math.max(1, Double.valueOf(clickRate)));
    buf.append("|");
  }
  if (bids.size() > 0) 
  {
    context.write(key, new Text(buf.toString()));
  }
}

Finally, the Adwords assigner loads the bid data and stores it in memory, indexed by keyword. Given a keyword, the Adwords assigner selects, from among all the bids for that keyword, the bid that maximizes the following measure:

Measure = bid value * click-through rate * (1 - e^(-current budget / initial budget))
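
As an illustration, the selection step could look like the following sketch. The Bid record and its field names are hypothetical, not taken from the chapter's source:

// Hypothetical in-memory record for one bid; field names are illustrative
class Bid 
{
  String clientId;
  double bidValue, clickRate, currentBudget, initialBudget;
}

// Returns the bid with the highest measure among all bids for a keyword
static Bid selectBid(java.util.List<Bid> bidsForKeyword) 
{
  Bid best = null;
  double bestMeasure = Double.NEGATIVE_INFINITY;
  for (Bid bid : bidsForKeyword) 
  {
    // The measure favors high bids, high click-through rates, and 
    // advertisers with a large fraction of their budget unspent
    double measure = bid.bidValue * bid.clickRate 
      * (1 - Math.exp(-bid.currentBudget / bid.initialBudget));
    if (measure > bestMeasure) 
    {
      bestMeasure = measure;
      best = bid;
    }
  }
  return best;
}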

There's more...

The preceding recipe assumes that the Adwords assigner can load all the data into memory to make advertisement assignment decisions. However, if the dataset is big, we can partition it among multiple computers by keyword (for example, assign the keywords that start with the letters "A" to "D" to the first computer, and so on).
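
For example, a minimal sketch of such a routing function, assuming a simple hash-based partitioning scheme rather than explicit letter ranges:

// Maps a keyword to one of numMachines partitions; any deterministic 
// function of the keyword works, as long as every lookup uses the same one
static int machineFor(String keyword, int numMachines) 
{
  return (keyword.hashCode() & Integer.MAX_VALUE) % numMachines;
}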

This recipe assumes that users bid on single words only. However, to support multi-word bids, we would need to combine the click-through rates of the individual words; the rest of the algorithm can then proceed as before.
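
For instance, the following minimal sketch assumes (our assumption, not the book's) that a phrase's click-through rate can be approximated by the minimum rate among its words:

// A hypothetical combination rule: a phrase is at most as clickable 
// as its least-clicked word; unseen words default to a rate of 1.0
static double phraseClickRate(java.util.Map<String, Double> wordRates, 
  String[] words) 
{
  double rate = Double.MAX_VALUE;
  for (String word : words) 
  {
    Double wordRate = wordRates.get(word);
    rate = Math.min(rate, wordRate == null ? 1.0 : wordRate);
  }
  return rate;
}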

More information about online advertising can be found in the book Mining of Massive Datasets, by Anand Rajaraman and Jeffrey D. Ullman, available at http://infolab.stanford.edu/~ullman/mmds.html.
