Let's begin by looking at the number of images included with each story. We'll run a value count and then plot the numbers:
dfc['img_count'].value_counts().to_frame('count')
This should display an output similar to the following:
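The actual counts depend on the scraped dataset. As a minimal sketch, assuming a small made-up `img_count` column (the `toy` frame and its values are hypothetical, not the data used in this chapter), the same call can be tried on its own:

```python
import pandas as pd

# Hypothetical toy data standing in for dfc; not the real dataset
toy = pd.DataFrame({'img_count': [5, 5, 5, 3, 1, 0, 5, 3]})

# Tally how many stories have each image count, most frequent first,
# and name the resulting column 'count'
counts = toy['img_count'].value_counts().to_frame('count')
print(counts)
```

The index of the resulting frame holds the distinct image counts, and the single `count` column holds how often each one occurs.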
Now, let's plot that same information:
fig, ax = plt.subplots(figsize=(8,6))
y = dfc['img_count'].value_counts().sort_index()
x = y.index
plt.bar(x, y, color='k', align='center')
plt.title('Image Count Frequency', fontsize=16, y=1.01)
ax.set_xlim(-.5,5.5)
ax.set_ylabel('Count')
ax.set_xlabel('Number of Images')
This code generates the following output:
Already, I'm surprised by the numbers. The vast majority of stories have five pictures in them, while those stories that have either one or no pictures at all are quite rare.
Hence, it appears that people tend to share content with lots of images. Now, let's take a look at the most common colors in those images:
mci = dfc['main_hex'].value_counts().to_frame('count')
mci
This code generates the following output:
I don't know about you, but this isn't extremely helpful given that I don't see hex values as colors. We can, however, use a new feature in pandas called conditional formatting to help us out:
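mci['color'] = ' '
def color_cells(x):
    # x is the 'color' column; its index holds the hex codes
    return 'background-color: ' + x.index
# Leave the Styler as the last expression in the cell so that the notebook renders it
mci.style.apply(color_cells, subset=['color'], axis=0)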
The preceding code generates the following output: