Exploring the GitHub world

To get a better understanding of how to work with the GitHub API, we will go through the following steps:

  1. Install the GitHub Python library.
  2. Access the API by using the token provided when we registered on the developer website.
  3. Retrieve some key facts about the Apache Software Foundation, which hosts the Spark repository.

Let's go through the process step-by-step:

  1. Install the PyGithub Python library with pip from the command line:
    pip install PyGithub
  2. Programmatically create a client for the GitHub API:
    from github import Github
    
    # Get your own access token
    
    ACCESS_TOKEN = 'Get_Your_Own_Access_Token'
    
    # We are focusing our attention on User = apache and Repo = spark
    
    USER = 'apache'
    REPO = 'spark'
    
    g = Github(ACCESS_TOKEN, per_page=100)
    user = g.get_user(USER)
    repo = user.get_repo(REPO)
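  3. Retrieve key facts about the Apache user. There are 640 active Apache repositories on GitHub:
    # Use a separate loop variable: in Python 2 a list comprehension leaks its
    # variable, so reusing `repo` here would overwrite the Spark repository.
    repos_apache = [r.name for r in user.get_repos()]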
    len(repos_apache)
    640
  4. Retrieve key facts about the Spark repository. The programming languages used in the Spark repo, with the number of bytes of code in each, are listed here:
    from pprint import pprint as pp
    
    pp(repo.get_languages())
    
    {u'C': 1493,
     u'CSS': 4472,
     u'Groff': 5379,
     u'Java': 1054894,
     u'JavaScript': 21569,
     u'Makefile': 7771,
     u'Python': 1091048,
     u'R': 339201,
     u'Scala': 10249122,
     u'Shell': 172244}
  5. Retrieve a few key participants in the wider Spark GitHub network. At the time of writing, the Apache Spark repository has 3,738 stargazers, so the network is large. The first stargazer is Matei Zaharia, who cofounded the Spark project while doing his PhD at Berkeley.
    stargazers = list(repo.get_stargazers())
    print "Number of stargazers", len(stargazers)
    Number of stargazers 3738
    
    [s.login for s in stargazers[:20]]
    [u'mateiz',
     u'beyang',
     u'abo',
     u'CodingCat',
     u'andy327',
     u'CrazyJvm',
     u'jyotiska',
     u'BaiGang',
     u'sundstei',
     u'dianacarroll',
     u'ybotco',
     u'xelax',
     u'prabeesh',
     u'invkrh',
     u'bedla',
     u'nadesai',
     u'pcpratts',
     u'narkisr',
     u'Honghe',
     u'Jacke']
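
The byte counts returned by get_languages() in step 4 are easier to compare as percentages. The following is a minimal sketch that works on a plain dictionary, so it runs without network access or a token. The helper name language_shares is our own and is not part of PyGithub, and the input is simply the output shown above copied by hand.

```python
def language_shares(byte_counts):
    """Return (language, percent) pairs sorted by share, largest first."""
    total = sum(byte_counts.values())
    if total == 0:
        return []
    shares = [(lang, 100.0 * n / total) for lang, n in byte_counts.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Byte counts copied from the repo.get_languages() output shown in step 4.
spark_languages = {
    'C': 1493, 'CSS': 4472, 'Groff': 5379, 'Java': 1054894,
    'JavaScript': 21569, 'Makefile': 7771, 'Python': 1091048,
    'R': 339201, 'Scala': 10249122, 'Shell': 172244,
}

# Print the three languages with the largest share of the code base.
for lang, pct in language_shares(spark_languages)[:3]:
    print('{0:<8} {1:5.1f}%'.format(lang, pct))
```

With a live client, the same helper could be called as language_shares(repo.get_languages()). Scala accounts for roughly four fifths of the bytes, followed by Python and Java.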

Understanding the community through Meetup

To get a better understanding of how to work with the Meetup API, we will go through the following steps:

  1. Create a Python program to call the Meetup API using an authentication token.
  2. Retrieve information about past events for Meetup groups such as London Data Science.
  3. Retrieve the profiles of Meetup members in order to analyze their participation in similar Meetup groups.

Let's go through the process step-by-step:

  1. As there is no reliable Python library for the Meetup API, we will write our own client for it:
    import json
    import mimeparse
    import requests
    import urllib
    from pprint import pprint as pp
    
    MEETUP_API_HOST = 'https://api.meetup.com'
    EVENTS_URL = MEETUP_API_HOST + '/2/events.json'
    MEMBERS_URL = MEETUP_API_HOST + '/2/members.json'
    GROUPS_URL = MEETUP_API_HOST + '/2/groups.json'
    RSVPS_URL = MEETUP_API_HOST + '/2/rsvps.json'
    PHOTOS_URL = MEETUP_API_HOST + '/2/photos.json'
    GROUP_URLNAME = 'London-Machine-Learning-Meetup'
    # Alternative group: GROUP_URLNAME = 'Data-Science-London'
    
    class MeetupAPI(object):
        """
        Retrieves information about meetup.com
        """
        def __init__(self, api_key, num_past_events=10, http_timeout=1,
                     http_retries=2):
            """
            Create a new instance of MeetupAPI
            """
            self._api_key = api_key
            self._http_timeout = http_timeout
            self._http_retries = http_retries
            self._num_past_events = num_past_events
    
        def get_past_events(self):
            """
            Get past meetup events for a given meetup group
            """
            params = {'key': self._api_key,
                      'group_urlname': GROUP_URLNAME,
                      'status': 'past',
                      'desc': 'true'}
            if self._num_past_events:
                params['page'] = str(self._num_past_events)
    
            query = urllib.urlencode(params)
            url = '{0}?{1}'.format(EVENTS_URL, query)
            response = requests.get(url, timeout=self._http_timeout)
            data = response.json()['results']
            return data
    
        def get_members(self):
            """
            Get meetup members for a given meetup group
            """
            params = {'key': self._api_key,
                      'group_urlname': GROUP_URLNAME,
                      'offset': '0',
                      'format': 'json',
                      'page': '100',
                      'order': 'name'}
            query = urllib.urlencode(params)
            url = '{0}?{1}'.format(MEMBERS_URL, query)
            response = requests.get(url, timeout=self._http_timeout)
            data = response.json()['results']
            return data
    
        def get_groups_by_member(self, member_id='38680722'):
            """
            Get meetup groups for a given meetup member
            """
            params = {'key': self._api_key,
                      'member_id': member_id,
                      'offset': '0',
                      'format': 'json',
                      'page': '100',
                      'order': 'id'}
            query = urllib.urlencode(params)
            url = '{0}?{1}'.format(GROUPS_URL, query)
            response = requests.get(url, timeout=self._http_timeout)
            data = response.json()['results']
            return data
  2. Then, we will retrieve past events from a given Meetup group:
    m = MeetupAPI(api_key='Get_Your_Own_Key')
    last_meetups = m.get_past_events()
    pp(last_meetups[5])
    
    {u'created': 1401809093000,
     u'description': u"<p>We are hosting a joint meetup between Spark London and Machine Learning London. Given the excitement in the machine learning community around Spark at the moment a joint meetup is in order!</p> <p>Michael Armbrust from the Apache Spark core team will be flying over from the States to give us a talk in person.\xa0Thanks to our sponsors, Cloudera, MapR and Databricks for helping make this happen.</p> <p>The first part of the talk will be about MLlib, the machine learning library for Spark,\xa0and the second part, on\xa0Spark SQL.</p> <p>Don't sign up if you have already signed up on the Spark London page though!</p> <p>
    
    
    Abstract for part one:</p> <p>In this talk, we\u2019ll introduce Spark and show how to use it to build fast, end-to-end machine learning workflows. Using Spark\u2019s high-level API, we can process raw data with familiar libraries in Java, Scala or Python (e.g. NumPy) to extract the features for machine learning. Then, using MLlib, its built-in machine learning library, we can run scalable versions of popular algorithms. We\u2019ll also cover upcoming development work including new built-in algorithms and R bindings.</p> <p>
    
    
    
    Abstract for part two:\xa0</p> <p>In this talk, we'll examine Spark SQL, a new Alpha component that is part of the Apache Spark 1.0 release. Spark SQL lets developers natively query data stored in both existing RDDs and external sources such as Apache Hive. A key feature of Spark SQL is the ability to blur the lines between relational tables and RDDs, making it easy for developers to intermix SQL commands that query external data with complex analytics. In addition to Spark SQL, we'll explore the Catalyst optimizer framework, which allows Spark SQL to automatically rewrite query plans to execute more efficiently.</p>",
     u'event_url': u'http://www.meetup.com/London-Machine-Learning-Meetup/events/186883262/',
     u'group': {u'created': 1322826414000,
                u'group_lat': 51.52000045776367,
                u'group_lon': -0.18000000715255737,
                u'id': 2894492,
                u'join_mode': u'open',
                u'name': u'London Machine Learning Meetup',
                u'urlname': u'London-Machine-Learning-Meetup',
                u'who': u'Machine Learning Enthusiasts'},
     u'headcount': 0,
     u'id': u'186883262',
     u'maybe_rsvp_count': 0,
     u'name': u'Joint Spark London and Machine Learning Meetup',
     u'rating': {u'average': 4.800000190734863, u'count': 5},
     u'rsvp_limit': 70,
     u'status': u'past',
     u'time': 1403200800000,
     u'updated': 1403450844000,
     u'utc_offset': 3600000,
     u'venue': {u'address_1': u'12 Errol St, London',
                u'city': u'EC1Y 8LX',
                u'country': u'gb',
                u'id': 19504802,
                u'lat': 51.522533,
                u'lon': -0.090934,
                u'name': u'Royal Statistical Society',
                u'repinned': False},
     u'visibility': u'public',
     u'waitlist_count': 84,
     u'yes_rsvp_count': 70}
  3. Get information about the Meetup members:
    members = m.get_members()
    
    {u'city': u'London',
      u'country': u'gb',
      u'hometown': u'London',
      u'id': 11337881,
      u'joined': 1421418896000,
      u'lat': 51.53,
      u'link': u'http://www.meetup.com/members/11337881',
      u'lon': -0.09,
      u'name': u'Abhishek Shivkumar',
      u'other_services': {u'twitter': {u'identifier': u'@abhisemweb'}},
      u'photo': {u'highres_link': u'http://photos3.meetupstatic.com/photos/member/9/6/f/3/highres_10898643.jpeg',
                 u'photo_id': 10898643,
                 u'photo_link': u'http://photos3.meetupstatic.com/photos/member/9/6/f/3/member_10898643.jpeg',
                 u'thumb_link': u'http://photos3.meetupstatic.com/photos/member/9/6/f/3/thumb_10898643.jpeg'},
      u'self': {u'common': {}},
      u'state': u'17',
      u'status': u'active',
      u'topics': [{u'id': 1372, u'name': u'Semantic Web', u'urlkey': u'semweb'},
                  {u'id': 1512, u'name': u'XML', u'urlkey': u'xml'},
                  {u'id': 49585,
                   u'name': u'Semantic Social Networks',
                   u'urlkey': u'semantic-social-networks'},
                  {u'id': 24553,
                   u'name': u'Natural Language Processing',
    ...(snip)...
                   u'name': u'Android Development',
                   u'urlkey': u'android-developers'}],
      u'visited': 1429281599000}
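
The created, time, updated, joined, and visited fields in the records above are timestamps in milliseconds since the Unix epoch, and utc_offset is likewise expressed in milliseconds. As a rough sketch, a small helper can convert an event record into local wall-clock time. The name event_local_time is hypothetical and not part of the Meetup API, and the sample record contains only the two fields copied from the event shown in step 2.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def event_local_time(event):
    """Convert a Meetup event's millisecond timestamp to local wall-clock time.

    Assumes 'time' is milliseconds since the Unix epoch (UTC) and
    'utc_offset' is the venue's offset from UTC, also in milliseconds.
    """
    utc = EPOCH + timedelta(milliseconds=event['time'])
    return utc + timedelta(milliseconds=event.get('utc_offset', 0))


# Fields copied from the past event printed in step 2.
sample_event = {'time': 1403200800000, 'utc_offset': 3600000}
print(event_local_time(sample_event).isoformat())
```

For the sample record this gives the evening of 19 June 2014, London time. Building the value from a fixed epoch rather than datetime.fromtimestamp keeps the result independent of the time zone of the machine running the code.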