Jun 05 2015

Laravel’s Eloquent::has() and Eloquent::whereHas() performance problems

Category: Laravel,PHPFractalizeR @ 12:52 am

I had a task in my project where I needed to fetch all orders from the database, for which some user has posted an offer. The first intention was to use nice and shiny whereHas method on Eloquent builder like this:

Order::whereHas('offers', function($q) {
    $q->where('user_id', 1112485)
 });

 

But something stopped me and I went to investigate. It turned out, that this instruction generates the following SQL statement:

SELECT *
FROM orders
WHERE (SELECT count(*)
       FROM offers
       WHERE offers.order_id = orders.id AND "user_id" = 1112485) >= 1;

 

The plan for this statement is:

Seq Scan on orders (cost=0.00..249278.84 rows=9525 width=912) (actual time=271.528..465.900 rows=2 loops=1)
 Filter: ((SubPlan 1) >= 1)
 Rows Removed by Filter: 28574
 SubPlan 1
 -> Aggregate (cost=8.44..8.45 rows=1 width=0) (actual time=0.009..0.011 rows=1 loops=28576)
 -> Index Only Scan using offers_order_id_user_id_idx on offers (cost=0.42..8.44 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=28576)
 Index Cond: ((order_id = orders.id) AND (user_id = 1112485))
 Heap Fetches: 2
Planning time: 0.133 ms
Execution time: 465.953 ms

Pay attention to query cost and execution time. The same task may be achieved by  the following query:

SELECT *
FROM orders
WHERE id IN (SELECT DISTINCT order_id
             FROM offers
             WHERE user_id = 1112485)

 

And this one has much better plan:

Nested Loop (cost=0.71..6280.39 rows=5 width=912) (actual time=0.043..5.260 rows=2 loops=1)
 -> Unique (cost=0.42..6238.76 rows=5 width=4) (actual time=0.024..5.217 rows=2 loops=1)
 -> Index Only Scan using offers_order_id_user_id_idx on offers (cost=0.42..6238.62 rows=53 width=4) (actual time=0.019..5.201 rows=2 loops=1)
 Index Cond: (user_id = 1112485)
 Heap Fetches: 2
 -> Index Scan using orders_pkey on orders (cost=0.29..8.31 rows=1 width=912) (actual time=0.007..0.009 rows=1 loops=2)
 Index Cond: (id = offers.order_id)
Planning time: 0.191 ms
Execution time: 5.312 ms

Well, about 100x difference, right?

The same goes for simple

Order::has('offers')

 

Generated SQL for it is

SELECT *
FROM "orders"
WHERE (SELECT count(*)
       FROM "offers"
       WHERE "offers"."order_id" = "orders"."id") >= 1

 

With the query plan

Seq Scan on orders (cost=0.00..856965.30 rows=9525 width=912) (actual time=11.044..2987.939 rows=25603 loops=1)
 Filter: ((SubPlan 1) >= 1)
 Rows Removed by Filter: 2973
 SubPlan 1
 -> Aggregate (cost=29.71..29.72 rows=1 width=0) (actual time=0.095..0.097 rows=1 loops=28576)
 -> Index Only Scan using offers_order_id_user_id_idx on offers (cost=0.42..29.68 rows=13 width=0) (actual time=0.019..0.072 rows=9 loops=28576)
 Index Cond: (order_id = orders.id)
 Heap Fetches: 247734
Planning time: 0.260 ms
Execution time: 3039.399 ms

If you rewrite query like this

EXPLAIN ANALYZE
SELECT *
FROM "orders" O
WHERE id IN (SELECT DISTINCT order_id
             FROM offers);

 

You get the following plan:

Hash Join (cost=14061.99..21990.07 rows=20785 width=849) (actual time=940.164..1093.802 rows=28694 loops=1)
 Hash Cond: (o.id = offers.order_id)
 -> Seq Scan on orders o (cost=0.00..7439.19 rows=32119 width=849) (actual time=0.006..51.002 rows=32119 loops=1)
 -> Hash (cost=13802.18..13802.18 rows=20785 width=4) (actual time=939.942..939.942 rows=28694 loops=1)
 Buckets: 4096 Batches: 1 Memory Usage: 1009kB
 -> HashAggregate (cost=13594.33..13802.18 rows=20785 width=4) (actual time=852.582..898.002 rows=28694 loops=1)
 Group Key: offers.order_id
 -> Seq Scan on offers (cost=0.00..12887.66 rows=282666 width=4) (actual time=0.003..422.209 rows=282666 loops=1)
Planning time: 0.364 ms
Execution time: 1133.665 ms

3x improvement. Which is slightly better 🙂

So, the conclusion is always to check what is generated for you by ORM.

Leave a Reply

You must be logged in to post a comment. Login now.