Summary

In recent years, GitHub has emerged as the best-known coding platform. With 20 million users and 57 million repositories, it is the most widely used hosting service for version-controlled code. The social networking features the platform provides around code repositories are often termed Social Coding. The GitHub API makes repository data available for analysis, and in this chapter we used it to combine textual descriptions with numerical data in order to identify the latest trending technologies. The numerical data we extracted included watchers, open issues, forks, and repository size. Text analytics of the descriptions using bi-grams produced a list of the most popular technologies: mainly Artificial Intelligence technologies such as deep learning and TensorFlow, along with a mix of open source projects and diverse repositories related to software engineering. Analysis of the programming languages used in these repositories showed that Python and Swift have been growing in popularity over the last few years. Finally, we used scatterplots to look for correlations between the numerical variables (size, forks, open issues, and watchers). The conclusion was that these variables are not strongly correlated and vary from project to project.
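
As a brief reminder of the two techniques summarized above, the following is a minimal sketch rather than the chapter's actual code. It assumes the repository descriptions and numerical fields have already been fetched; the function names `bigram_counts` and `pearson` and the sample descriptions are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from math import sqrt


def bigram_counts(descriptions):
    """Count adjacent word pairs (bi-grams) across repository descriptions."""
    counts = Counter()
    for text in descriptions:
        # Lowercase and keep alphanumeric tokens only (a simplifying assumption).
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        # Pair each token with its successor to form bi-grams.
        counts.update(zip(tokens, tokens[1:]))
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one variable is constant.
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample descriptions, not real data from the chapter.
sample = [
    "Deep learning with TensorFlow",
    "Deep learning models in Python",
]
print(bigram_counts(sample).most_common(1))
```

A coefficient from `pearson` that is close to zero for a pair such as forks and open issues would be consistent with the scatterplot finding that these variables are not strongly correlated, although a scatterplot can also reveal non-linear patterns that a single coefficient hides.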

In the next chapter, we will explore the world of online forums, a rich and engaged source of conversations on all kinds of topics. The goal of that chapter will be to apply topic modeling, an advanced machine learning technique, to automatically extract and summarize topics from large amounts of conversational text data.
