Exploring image data

Let's begin by looking at the number of images included with each story. We'll run a value count and then plot the numbers:

dfc['img_count'].value_counts().to_frame('count')

This should display an output similar to the following:

Now, let's plot that same information:

fig, ax = plt.subplots(figsize=(8, 6))
# Count stories per image count, ordered by number of images
y = dfc['img_count'].value_counts().sort_index()
x = y.index
ax.bar(x, y, color='k', align='center')
ax.set_title('Image Count Frequency', fontsize=16, y=1.01)
ax.set_xlim(-.5, 5.5)
ax.set_ylabel('Count')
ax.set_xlabel('Number of Images')

This code generates the following output:

Already, I'm surprised by the numbers. The vast majority of stories have five pictures in them, while those stories that have either one or no pictures at all are quite rare.
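
To put a number on "vast majority", the same value count can be expressed as a share of all stories. The following is a minimal sketch using a small made-up DataFrame (the `toy` frame and its values are illustrative assumptions, not the real `dfc` data); `normalize=True` is a standard `value_counts` option:

```python
import pandas as pd

# Hypothetical stand-in for dfc; the real data is not reproduced here
toy = pd.DataFrame({'img_count': [5, 5, 5, 5, 3, 5, 1, 5, 0, 5]})

# Raw counts per image count, ordered by number of images
counts = toy['img_count'].value_counts().sort_index()

# The same information as a fraction of all stories
shares = toy['img_count'].value_counts(normalize=True).sort_index()

summary = pd.DataFrame({'count': counts, 'share': shares})
print(summary)
```

On the real data, replacing `toy` with `dfc` would give the share of stories at each image count.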

This suggests that people tend to share content with lots of images. Next, let's take a look at the most common colors in those images:

mci = dfc['main_hex'].value_counts().to_frame('count') 
 
mci

This code generates the following output:

I don't know about you, but I don't find this very helpful, since I can't read hex values as colors. We can, however, use a new feature in pandas called conditional formatting to help us out:

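mci['color'] = ' '

def color_cells(x):
    # One CSS rule per row, built from the hex value stored in the index
    return ['background-color: ' + str(h) for h in x.index]

# Display the returned Styler; mci itself is left unstyled
mci.style.apply(color_cells, subset=['color'], axis=0)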

The preceding code generates the following output:
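
To make the styling step less opaque, here is a small sketch of what `Styler.apply` passes to `color_cells` when `axis=0`: each selected column arrives as a Series, and the function should return one CSS string per row. The `toy` frame and its hex values (written here with a leading `#`) are made up for illustration, and returning a plain list rather than an Index is an assumption about what is most portable across pandas versions:

```python
import pandas as pd

# Hypothetical hex counts standing in for mci
toy = pd.DataFrame({'count': [3, 2, 1]},
                   index=['#ffffff', '#000000', '#ff0000'])
toy['color'] = ' '

def color_cells(col):
    # col is one column (a Series); build one CSS rule per row from the index
    return ['background-color: ' + str(h) for h in col.index]

# Calling the function directly shows the CSS strings the Styler would apply
css = color_cells(toy['color'])
print(css)
```

In a notebook, the colored table only appears when the object returned by `toy.style.apply(color_cells, subset=['color'], axis=0)` is the last expression in the cell or is displayed explicitly; the DataFrame itself is never modified.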
