We had briefly mentioned select N+1 problem in the previous chapter. Select N+1 is bad from both performance and memory consumption point of view. We are going to discuss how we can avoid this problem. Before we get our hands dirty, let's spend some time trying to understand what is select N+1 problem.
It is easier to understand select N+1 problem if we read it in reverse – 1+N select. That is right. Let's see how.
Suppose you want to load all employees living in London and then for every employee iterate through the benefits that they are getting. Following is one way of doing that using a LINQ query:
[Test] public void WithSelectNPlusOneIssue() { using (var transaction = Database.Session.BeginTransaction()) { var employees = Database.Session.Query<Employee>() .Where(e => e.ResidentialAddress.City == "London"); foreach (var employee in employees) { foreach (var benefit in employee.Benefits) { Assert.That(benefit.Employee, Is.Not.Null); } } transaction.Commit(); } }
Let me state one thing about the preceding code before we continue with our discussion. I have inserted a not so meaningful assert in there because I needed some work to be done on each benefit
instance in the Benefits
collection. In real life, this could be some business operation that needs to be performed on the loaded benefit
object. With that clear, let's get back to our discussion.
If you notice, the Benefits
collection is lazily loaded as we iterate through the loaded employee instances. When the previous code is run, we get the following SQL. This is a stripped down version only showing the parts relevant to this discussion.
SELECT employee0_.id AS id0_, //...other columns...// FROM employee employee0_ INNER JOIN address address1_ ON employee0_.id=address1_.employee_id WHERE address1_.city=@p0; @p0 = 'London' [Type: String (0)] SELECT benefits0_.employee_id AS employee4_1_, benefits0_.id AS id1_, //...other columns...// FROM benefit benefits0_ LEFT OUTER JOIN leave benefits0_1_ ON benefits0_.id=benefits0_1_.id LEFT OUTER JOIN skillsenhancementallowance benefits0_2_ ON benefits0_.id=benefits0_2_.id LEFT OUTER JOIN seasonticketloan benefits0_3_ ON benefits0_.id=benefits0_3_.id WHERE benefits0_.employee_id=@p0; @p0 = 11 [Type: Int32 (0)] SELECT benefits0_.employee_id AS employee4_1_, benefits0_.id AS id1_, //...other columns...// FROM benefit benefits0_ LEFT OUTER JOIN leave benefits0_1_ ON benefits0_.id=benefits0_1_.id LEFT OUTER JOIN skillsenhancementallowance benefits0_2_ ON benefits0_.id=benefits0_2_.id LEFT OUTER JOIN seasonticketloan benefits0_3_ ON benefits0_.id=benefits0_3_.id WHERE benefits0_.employee_id=@p0; @p0 = 12 [Type: Int32 (0)] SELECT benefits0_.employee_id AS employee4_1_, benefits0_.id AS id1_, //...other columns...// FROM benefit benefits0_ LEFT OUTER JOIN leave benefits0_1_ ON benefits0_.id=benefits0_1_.id LEFT OUTER JOIN skillsenhancementallowance benefits0_2_ ON benefits0_.id=benefits0_2_.id LEFT OUTER JOIN seasonticketloan benefits0_3_ ON benefits0_.id=benefits0_3_.id WHERE benefits0_.employee_id=@p0; @p0 = 13 [Type: Int32 (0)]
There are a total of four SELECT
statements in the preceding SQL. First one is to retrieve employees living in London. We got three instances of such employees having ID 11
, 12
, and 13
. The next three SELECT
statements are to fetch the Benefits
collection for each employee
instance. Do you see the pattern here? We have got one SELECT
that returns N records and then for each of those N records we get an additional SELECT
. So we end up getting 1 + N SELECT
statements. That is select N+1 problem for us. Or read in the reverse order 1 + N SELECTs problem.
Following are the three reasons why select N+1 is not a good thing:
SELECT
statements to database but as N increases, then sending hundreds of SELECT
statements to database could have significant impact on application's behavior.SELECT
statements are bad, what is worse is that each of those SELECT
statements are sent to database in a separate roundtrip. Imagine what would happen to network traffic for a large number of N.SELECT
statements are not the best in class. They are querying on foreign key that is present on the table. If this key is not indexed then you are doing a full table scan to get your results. For a large value of N, doing full table scan hundreds of times in succession is not a good idea.Select N+1 has multiple problems and there are multiple solutions to it. Not every solution addresses every problem of select N+1 but you can choose the right solution depending on severity of issue at hand. For example, you can use the following batching solution to bundle up number of SELECT
statements in a batch and have them sent to database in one roundtrip, thus reducing cost of remote calls. Let's see what each solution offers and in which situation it can be used:
SELECT
statements generated out of lazy loading of collections can be batched together. This solution does not reduce the number of SELECT
statements that are generated but lets you reduce the number of database roundtrips, thus saving the cost of remote calls.As with fetching strategies, there is no right solution to the select N+1 problem. My best bet would be to use lazy loading with batching turned on. If I need a particular collection to be eagerly loaded then I will use future queries. This is what we discuss next.