In R as well as in pandas, there is more than one way to perform logical subsetting. Suppose that we wish to display all players whose goals-per-game ratio is greater than or equal to 0.5; that is, players who average at least one goal every two games.
Here's how we can do this in R:
> goal_stats[goal_stats$GoalsPerGame>=0.5,]
              Club            Player Goals GamesPlayed GoalsPerGame
1  Atletico Madrid       Diego Costa     8           9    0.8888889
6      Real Madrid Cristiano Ronaldo    17          11    1.5454545
7      Real Madrid       Gareth Bale     6          12    0.5000000
17         Chelsea          Demba Ba     3           6    0.5000000
The same result can be obtained with the subset() function:
> subset(goal_stats, GoalsPerGame>=0.5)
              Club            Player Goals GamesPlayed GoalsPerGame
1  Atletico Madrid       Diego Costa     8           9    0.8888889
6      Real Madrid Cristiano Ronaldo    17          11    1.5454545
7      Real Madrid       Gareth Bale     6          12    0.5000000
17         Chelsea          Demba Ba     3           6    0.5000000
In pandas, we also do something similar:
In [33]: goal_stats_df[goal_stats_df['GoalsPerGame']>=0.5]
Out[33]:
               Club             Player  Goals  GamesPlayed  GoalsPerGame
0   Atletico Madrid        Diego Costa      8            9      0.888889
5       Real Madrid  Cristiano Ronaldo     17           11      1.545455
6       Real Madrid        Gareth Bale      6           12      0.500000
16          Chelsea           Demba Ba      3            6      0.500000
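To make the mechanics of the pandas expression more visible, the following minimal sketch splits it into two steps: the comparison first produces a Boolean Series (called mask here), and that Series is then used to select rows. The data frame is rebuilt by hand from the four rows shown in the output above; the fifth row ("Example FC", "Player X") is hypothetical and is included only so that the filter has a row to exclude.

```python
import pandas as pd

# Four rows copied from the output above, plus one HYPOTHETICAL row
# ("Example FC" / "Player X") added only so the filter has something to drop.
goal_stats_df = pd.DataFrame({
    "Club": ["Atletico Madrid", "Real Madrid", "Real Madrid", "Chelsea", "Example FC"],
    "Player": ["Diego Costa", "Cristiano Ronaldo", "Gareth Bale", "Demba Ba", "Player X"],
    "Goals": [8, 17, 6, 3, 1],
    "GamesPlayed": [9, 11, 12, 6, 10],
})
goal_stats_df["GoalsPerGame"] = goal_stats_df["Goals"] / goal_stats_df["GamesPlayed"]

# Step 1: the comparison is evaluated element-wise and yields a Boolean Series.
mask = goal_stats_df["GoalsPerGame"] >= 0.5
print(mask.tolist())  # [True, True, True, True, False]

# Step 2: indexing with the Boolean Series keeps only the rows marked True.
high_scorers = goal_stats_df[mask]
print(high_scorers)
```

Note that the selected rows keep their original index labels, which is why the index in the output above is not renumbered from zero.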
Alternatively, we can use the DataFrame.query() method:
In [36]: goal_stats_df.query('GoalsPerGame >= 0.5')
Out[36]:
               Club             Player  Goals  GamesPlayed  GoalsPerGame
0   Atletico Madrid        Diego Costa      8            9      0.888889
5       Real Madrid  Cristiano Ronaldo     17           11      1.545455
6       Real Madrid        Gareth Bale      6           12      0.500000
16          Chelsea           Demba Ba      3            6      0.500000
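As a brief illustration of how the two pandas approaches relate, the sketch below checks that query() and Boolean indexing return the same rows, and shows that a local Python variable can be referenced inside the query string with the @ prefix. The variable name threshold and the "Example FC" row are assumptions introduced for this example; the other rows are taken from the output above.

```python
import pandas as pd

# Rows from the output above, plus one HYPOTHETICAL row ("Example FC")
# that falls below the cut-off so the filter has an effect.
goal_stats_df = pd.DataFrame({
    "Club": ["Atletico Madrid", "Real Madrid", "Real Madrid", "Chelsea", "Example FC"],
    "Player": ["Diego Costa", "Cristiano Ronaldo", "Gareth Bale", "Demba Ba", "Player X"],
    "Goals": [8, 17, 6, 3, 1],
    "GamesPlayed": [9, 11, 12, 6, 10],
})
goal_stats_df["GoalsPerGame"] = goal_stats_df["Goals"] / goal_stats_df["GamesPlayed"]

threshold = 0.5  # hypothetical name; any local variable can be used with '@'

# query() takes the condition as a string; '@threshold' refers to the
# local Python variable defined above.
by_query = goal_stats_df.query("GoalsPerGame >= @threshold")

# The equivalent Boolean-indexing form.
by_mask = goal_stats_df[goal_stats_df["GoalsPerGame"] >= threshold]

print(by_query.equals(by_mask))  # True
```

Which form to use is largely a matter of taste: query() tends to read more cleanly when several conditions are combined, while Boolean indexing keeps the condition as ordinary Python code.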