Putting aside any opinions on performance, I’ve been trying to test a notion about whether a couple queries would output the same data (ordering doesn’t matter).

SELECT *
FROM articles
WHERE (
  last_updated >= %s
  OR id IN (1, 2, 3)
  )
  AND created_at IS NOT NULL
SELECT *
FROM articles
WHERE last_updated >= %s
  AND created_at IS NOT NULL
UNION
SELECT *
FROM articles
WHERE id IN (1, 2, 3)
  AND created_at IS NOT NULL

I think they’re equivalent, but I can’t prove it to myself.

Edit: Aye, looking at the replies, I’m becoming aware that I left out a couple key assumptions I’ve made. Assuming:

a) id is a PRIMARY KEY (or otherwise UNIQUE)

b) I mean equivalent insofar as “the rows returned will contain equivalent data same (though maybe ordered differently)”

  • RobertTableaux@programming.dev
    link
    fedilink
    English
    arrow-up
    9
    ·
    26 days ago

    To answer the question asked, yes, these will “out the same data”. Anyone telling you they’re not the same isn’t looking at what you care about being the same - the rows output (regardless of order). From that perspective, and disregarding performance (as you also mentioned in your question) these queries will give you the same results.

  • rollin@piefed.social
    link
    fedilink
    English
    arrow-up
    5
    ·
    26 days ago

    I don’t think they’re technically the same because UNION implicitly removes duplicates.

    In the case of your specific data, the queries are probably functionality the same as you probably wouldn’t have duplicates in the first query because each row most likely has a unique ID column. Even if it didn’t, last-updated and created-at are probably timestamps which would in practice make them unique, not to mention other fields such as headline and article body - unless there had been a glitch causing a row to be inserted twice.

    If you were to use UNION ALL in place of UNION, duplicates would no longer be removed from the second query. In that case, even if you had duplicate rows in the first query, the second query would return the same rows unless any rows with ID 1, 2 or 3 also had been updated in the given timespan (as those will now be duplicated by the second query)

    Pretty sure that’s how UNION works, so in practice, I think you’d get the same rows 99.9% of the time.

      • rollin@piefed.social
        link
        fedilink
        English
        arrow-up
        2
        ·
        26 days ago

        Ah but that’s true ONLY IF the table doesn’t itself contain duplicates. Quick example:

        CREATE TEMPORARY TABLE animal (species VARCHAR(255) NOT NULL, colour VARCHAR(255) NOT NULL);
        INSERT INTO animal VALUES ('monkey', 'green'), ('rabbit', 'orange'), ('elephant', 'pink'),('monkey','blue'),('rabbit','orange'),('monkey','green'),('monkey','green');
        
        SELECT * FROM animal WHERE species = 'monkey' OR colour = 'green';
        +---------+--------+
        | species | colour |
        +---------+--------+
        | monkey  | green  |
        | monkey  | blue   |
        | monkey  | green  |
        | monkey  | green  |
        +---------+--------+
        SELECT * FROM animal WHERE species = 'monkey' UNION SELECT * FROM animal WHERE colour = 'green';
        +---------+--------+
        | species | colour |
        +---------+--------+
        | monkey  | green  |
        | monkey  | blue   |
        +---------+--------+
        

        So we could change the query to use UNION ALL, which does include duplicates. In that case, the returned rows are the same ONLY IF the rows returned by the left side of the UNION do not overlap those returned by the right side, otherwise it will return more rows.

        SELECT * FROM animal WHERE species = 'monkey' UNION ALL SELECT * FROM animal WHERE colour = 'green';
        +---------+--------+
        | species | colour |
        +---------+--------+
        | monkey  | green  |
        | monkey  | blue   |
        | monkey  | green  |
        | monkey  | green  |
        | monkey  | green  |
        | monkey  | green  |
        | monkey  | green  |
        +---------+--------+
        

        For completeness, here’s an example where the two queries in the UNION do not return any of the same rows:

        SELECT * FROM animal WHERE species = 'monkey' OR colour = 'orange';
        +---------+--------+
        | species | colour |
        +---------+--------+
        | monkey  | green  |
        | rabbit  | orange |
        | monkey  | blue   |
        | rabbit  | orange |
        | monkey  | green  |
        | monkey  | green  |
        +---------+--------+
        SELECT * FROM animal WHERE species = 'monkey' UNION ALL SELECT * FROM animal WHERE colour = 'orange';
        +---------+--------+
        | species | colour |
        +---------+--------+
        | monkey  | green  |
        | monkey  | blue   |
        | monkey  | green  |
        | monkey  | green  |
        | rabbit  | orange |
        | rabbit  | orange |
        +---------+--------+
        
    • roadrunner_ex@lemmy.caOP
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      26 days ago

      Aye, looking at the replies, I’m becoming aware that I left out a couple key assumptions I’ve made. Assuming:

      a) id is a PRIMARY KEY (or otherwise UNIQUE)

      b) I mean equivalent insofar as “the rows returned will contain equivalent data same (though maybe ordered differently)”

  • boatswain@infosec.pub
    link
    fedilink
    arrow-up
    4
    ·
    26 days ago

    UNION is used to append the result of one query to the result of another: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql?view=sql-server-ver17

    A UNION concatenates result sets from two queries. But a UNION does not create individual rows from columns gathered from two tables. A JOIN compares columns from two tables, to create result rows composed of columns from two tables.

    Your two queries are not equivalent.

    • Deebster@programming.dev
      link
      fedilink
      arrow-up
      2
      ·
      edit-2
      26 days ago

      There aren’t any joins in either query (and only one table involved), so that quoted bit of documentation isn’t relevant.

      Order aside, the two results would be identical In the same way that 2(4 + 3) = 2×4 + 2×3

      • boatswain@infosec.pub
        link
        fedilink
        arrow-up
        1
        ·
        26 days ago

        I really didn’t think that’s correct–though it’s been a few years since I did SQL regularly.

        SELECT *
        FROM articles
        WHERE last_updated >= %s
          AND created_at IS NOT NULL
        UNION
        SELECT *
        FROM articles
        WHERE id IN (1, 2, 3)
          AND created_at IS NOT NULL
        

        That should give a list of all articles updated after whatever date (regardless of ID), appended to a list of all articles where the ID is 1, 2, or 3 (regardless of when they were last updated). I would expect to see extra articles that only fit one criteria or the other, and also duplicate articles.

        I included the join quote because an inner join would be the way to do this, rather than a union–though it would likely be less efficient than just filtering on the required parameters.

        If I’m wrong here, I’d love an explanation of why.

        • Deebster@programming.dev
          link
          fedilink
          English
          arrow-up
          1
          ·
          26 days ago

          It’s equivalent because UNION removes duplicates; the behaviour you’re describing happens with UNION ALL. Since both queries are article.*, both halves will have the same columns and the dedupe will be successful.

          UNION is less efficient because of this deduplication, but it’s the default since that’s what most people want. If that matters then you’d be correct that a JOIN version will be more efficient (possibly depending on indexes present and sql engine).

  • hex123456@sh.itjust.works
    link
    fedilink
    arrow-up
    3
    ·
    edit-2
    26 days ago

    You can prove equivalence for yourself using the minus operator. Replace “select 1” and “select 2” with your queries.

    Select * from (
    (Select 1)
    Minus
    (Select 2)
    )
    
    ——
    
    Select * from (
    (Select 2)
    Minus
    (Select 1)
    )
    
  • afk_strats@lemmy.world
    link
    fedilink
    English
    arrow-up
    3
    ·
    edit-2
    26 days ago

    Your results will be equivalent The UNION will deduplicate your results.

    AND depending on your SQL engine, indexing, optimizer strategy, and data, you might have wildly different performance characteristics between the two queries… If that matters to you

    You can test this out yourself on the code below using SQLFiddle

    --Create table
    CREATE TABLE employees (
        id INT
        name VARCHAR(50),
        department VARCHAR(30),
        hire_date DATE
    );
    
    -- Insert 5 sample rows. NOTE the duplicate.
    INSERT INTO employees (id, name, department, hire_date) VALUES
    (1,'Alice Johnson', 'Engineering', '2022-01-15'),
    (2,'Bob Smith', 'Marketing', '2021-03-22"),
    (3,'Carol Davis', 'Sales', '2023-06-10"),
    (3,"Carol Davis', 'Sales', '2023-06-10'),
    (4,'David Wilson', "HR', "2020-11-05'),
    (5,'Eva Brown', 'Finance', '2022-09-18');
    ;
    
    -- Returns 4 Rows
        SELECT
        FROM employees
        WHERE id IN (1,2,3)
    UNION
        SELECT
        FROM mployees
        WHERE hire_date > '2022-01-01.
    ;
    
    -- Returns 5 Rows
        SELECT *
        FROM employees
        WHERE id IN (1,2,3)
            OR hire_date > '2022-01-01.
    ;
    
    -- Cleanup
    DROP TABLE employees
    
    ;
    
    • roadrunner_ex@lemmy.caOP
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      26 days ago

      Aye, looking at the replies, I’m becoming aware that I left out a couple key assumptions I’ve made. Assuming:

      a) id is a PRIMARY KEY (or otherwise UNIQUE)

      b) I mean equivalent insofar as “the rows returned will contain equivalent data same (though maybe ordered differently)”

    • Deebster@programming.dev
      link
      fedilink
      arrow-up
      1
      ·
      26 days ago

      That won’t show if results are equivalent, only if the query plans are matching, which they won’t be (at least before the SQL engine’s optimisations).