Home » SQL & PL/SQL » SQL & PL/SQL » sql query (19c)
sql query [message #687540] Wed, 29 March 2023 17:34 Go to next message
ora9a
Messages: 42
Registered: June 2010
Member
Hi

I have this data set in table ‘students’

Id term module
123 2023 2100
123 2023 2101
123 2023 1001
999 2022 2400
999 2022 2401
999 2022 2402

I want a query that flags up the modules (Y or N) where the first digit is different to the others in that term. There will only be 1 module which is different. This may occur in some terms but not all. So in this case it would be 1(001) module because the other two modules in 2023 start with a 2.

So the result would be:
Id term module mod_flag
123 2023 2100 N
123 2023 2101 N
123 2023 1001 Y
999 2022 2400 N
999 2022 2401 N
999 2022 2402 N

Re: sql query [message #687542 is a reply to message #687540] Wed, 29 March 2023 19:55 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
SCOTT@orcl_12.1.0.2.0> COLUMN mod_flag FORMAT A8
SCOTT@orcl_12.1.0.2.0> -- test data you provided:
SCOTT@orcl_12.1.0.2.0> WITH
  2    students (id, term, module) AS
  3  	 (SELECT 123, 2023, 2100 FROM DUAL UNION ALL
  4  	  SELECT 123, 2023, 2101 FROM DUAL UNION ALL
  5  	  SELECT 123, 2023, 1001 FROM DUAL UNION ALL
  6  	  SELECT 999, 2022, 2400 FROM DUAL UNION ALL
  7  	  SELECT 999, 2022, 2401 FROM DUAL UNION ALL
  8  	  SELECT 999, 2022, 2402 FROM DUAL)
  9  -- query:
 10  SELECT id, term, module,
 11  	    CASE
 12  	      WHEN COUNT (*) OVER (PARTITION BY id, term, SUBSTR (module, 1, 1)) = 1
 13  	      AND  COUNT (*) OVER (PARTITION BY id, term, SUBSTR (module, 1, 1)) <
 14  		   COUNT (*) OVER (PARTITION BY id, term)
 15  	      THEN 'Y'
 16  	      ELSE 'N'
 17  	    END mod_flag
 18  FROM   students
 19  ORDER BY id, term, mod_flag, module
 20  /

        ID       TERM     MODULE MOD_FLAG
---------- ---------- ---------- --------
       123       2023       2100 N
       123       2023       2101 N
       123       2023       1001 Y
       999       2022       2400 N
       999       2022       2401 N
       999       2022       2402 N

6 rows selected.

[Updated on: Wed, 29 March 2023 19:57]

Report message to a moderator

Re: sql query [message #687543 is a reply to message #687542] Thu, 30 March 2023 06:12 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3269
Registered: January 2010
Location: Connecticut, USA
Senior Member
Question is can there be only two rows per ID, TERM. If it can, then we can't tell which to flag Y and which to flag N. So all we cans do if flag them arbitrarily - one N other one Y. Barbara's solution will not work in case of two rows per ID. It will mark both Y:

WITH students(id, term, module)
  AS (
--      SELECT 123, 2023, 2100 FROM DUAL UNION ALL
      SELECT 123, 2023, 2101 FROM DUAL UNION ALL
      SELECT 123, 2023, 1001 FROM DUAL UNION ALL
      SELECT 999, 2022, 2400 FROM DUAL UNION ALL
      SELECT 999, 2022, 2401 FROM DUAL UNION ALL
      SELECT 999, 2022, 2402 FROM DUAL
     )
SELECT  id,
        term,
        module,
        CASE
          WHEN COUNT(*) OVER(PARTITION BY id,term,SUBSTR(module,1,1)) = 1
               AND COUNT(*) OVER(PARTITION BY id, term, SUBSTR(module,1,1)) < COUNT(*) OVER(PARTITION BY id,term) THEN 'Y'
          ELSE 'N'
        END mod_flag
  FROM  students
  ORDER BY id,
           term,
           mod_flag,
           module
/

        ID       TERM     MODULE M
---------- ---------- ---------- -
       123       2023       1001 Y
       123       2023       2101 Y
       999       2022       2400 N
       999       2022       2401 N
       999       2022       2402 N

SQL>
So if there can be only two rows per ID, TERM:

WITH STUDENTS(ID, TERM, MODULE)
  AS (
--      SELECT 123, 2023, 2100 FROM DUAL UNION ALL
      SELECT 123, 2023, 2101 FROM DUAL UNION ALL
      SELECT 123, 2023, 1001 FROM DUAL UNION ALL
      SELECT 999, 2022, 2400 FROM DUAL UNION ALL
      SELECT 999, 2022, 2401 FROM DUAL UNION ALL
      SELECT 999, 2022, 2402 FROM DUAL
     )
SELECT  ID,
        TERM,
        MODULE,
        CASE
          WHEN COUNT(*) OVER(PARTITION BY ID,TERM) <= 2
            THEN CASE ROW_NUMBER() OVER(PARTITION BY ID,TERM ORDER BY  SUBSTR(MODULE,1,1))
                   WHEN 1 THEN 'N'
                   ELSE 'Y'
                 END
          WHEN COUNT(*) OVER(PARTITION BY ID,TERM,SUBSTR(MODULE,1,1)) > 1 THEN 'N'
          ELSE 'Y'
        END MOD_FLAG
  FROM  STUDENTS
  ORDER BY ID,
           TERM,
           MOD_FLAG,
           MODULE
/

        ID       TERM     MODULE M
---------- ---------- ---------- -
       123       2023       1001 N
       123       2023       2101 Y
       999       2022       2400 N
       999       2022       2401 N
       999       2022       2402 N

SQL>
And if there can't be only two rows per ID:

WITH STUDENTS(ID, TERM, MODULE)
  AS (
      SELECT 123, 2023, 2100 FROM DUAL UNION ALL
      SELECT 123, 2023, 2101 FROM DUAL UNION ALL
      SELECT 123, 2023, 1001 FROM DUAL UNION ALL
      SELECT 999, 2022, 2400 FROM DUAL UNION ALL
      SELECT 999, 2022, 2401 FROM DUAL UNION ALL
      SELECT 999, 2022, 2402 FROM DUAL
     )
SELECT  ID,
        TERM,
        MODULE,
        CASE
          WHEN COUNT(*) OVER(PARTITION BY ID,TERM,SUBSTR(MODULE,1,1)) > 1 THEN 'N'
          ELSE 'Y'
        END MOD_FLAG
  FROM  STUDENTS
  ORDER BY ID,
           TERM,
           MOD_FLAG,
           MODULE
/

        ID       TERM     MODULE M
---------- ---------- ---------- -
       123       2023       2100 N
       123       2023       2101 N
       123       2023       1001 Y
       999       2022       2400 N
       999       2022       2401 N
       999       2022       2402 N

6 rows selected.

SQL>
SY.

[Updated on: Thu, 30 March 2023 06:16]

Report message to a moderator

Re: sql query [message #687544 is a reply to message #687543] Thu, 30 March 2023 09:56 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Thanks for the responses.

Asked the business, and yes - even though very rare - it is possible to have only one or two modules. In the case of two, the lowest number would be flagged as 'Y'. In the case of one then that defaults to N.
Re: sql query [message #687545 is a reply to message #687544] Thu, 30 March 2023 10:45 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
In fact, it is always the lowest number, regardless of the number of records. e.g.

1 record only:

2001 N
-------------------
2 records:

2001 N
1002 Y
---------

2 +

3001 N
3002 N
3003 N
2001 Y
Re: sql query [message #687546 is a reply to message #687544] Thu, 30 March 2023 10:51 Go to previous messageGo to next message
mathguy
Messages: 106
Registered: January 2023
Senior Member
OK, so this sounds like it can be reformulated as follows: For a group of rows as classified by (ID, TERM), inspect the first digit of MODULE on each row, and also note the maximum of all these first-digits in the group. If the first digit on a particular row is < than the max, mark the flag as 'Y', otherwise as 'N'.

Then the query is trivial to write:

with
  students (id, term, module) as (
    select 123, 2023, 2100 from dual union all
    select 123, 2023, 2101 from dual union all
    select 123, 2023, 1001 from dual union all
    select 999, 2022, 2400 from dual union all
    select 999, 2022, 2401 from dual union all
    select 999, 2022, 2402 from dual union all
    select 200, 2023, 1000 from dual union all
    select 550, 2021, 1833 from dual union all
    select 550, 2021, 2340 from dual union all
    select 550, 2022, 3230 from dual union all
    select 550, 2022, 3231 from dual
  )
select id, term, module,
       case when substr(module, 1, 1) < max(substr(module, 1, 1)) over (partition by id, term)
            then 'Y' else 'N' end as flag
from   students
order  by id, term, module  -- or whatever is needed
;


        ID       TERM     MODULE FLAG
---------- ---------- ---------- ----
       123       2023       1001 Y
       123       2023       2100 N
       123       2023       2101 N
       200       2023       1000 N
       550       2021       1833 Y
       550       2021       2340 N
       550       2022       3230 N
       550       2022       3231 N
       999       2022       2400 N
       999       2022       2401 N
       999       2022       2402 N

[Updated on: Thu, 30 March 2023 10:51]

Report message to a moderator

Re: sql query [message #687547 is a reply to message #687546] Thu, 30 March 2023 13:25 Go to previous messageGo to next message
mathguy
Messages: 106
Registered: January 2023
Senior Member
Or ... actually my reformulation is not equivalent to your problem statement, because a group by (ID, TERM) may have three or more rows, and in that case you said that at most one MODULE has a different first digit from the other MODULEs in the same group; you didn't say that in that case (which is the most common), the 'Y' flag will have the first digit lower than the other MODULEs in the group. It will only be different. Correct?

If so, then the query must be modified - perhaps something like this:

with
  students (id, term, module) as (
    select 123, 2023, 2100 from dual union all
    select 123, 2023, 2101 from dual union all
    select 123, 2023, 1001 from dual union all
    select 123, 2024, 1100 from dual union all
    select 123, 2024, 1200 from dual union all
    select 123, 2024, 2304 from dual union all
    select 999, 2022, 2400 from dual union all
    select 999, 2022, 2401 from dual union all
    select 999, 2022, 2402 from dual union all
    select 200, 2023, 1000 from dual union all
    select 550, 2021, 1833 from dual union all
    select 550, 2021, 2340 from dual union all
    select 550, 2022, 3230 from dual union all
    select 550, 2022, 3231 from dual
  )
select id, term, module,
       case when   count(*) over (partition by id, term) >= 3
               and count(*) over (partition by id, term, substr(module, 1, 1)) = 1
            then 'Y'
            when   count(*) over (partition by id, term) = 2
               and substr(module, 1, 1) < max(substr(module, 1, 1)) over (partition by id, term)
            then 'Y'
            else 'N' end as flag
from   students
order  by id, term, module  -- or whatever is needed
;

   ID  TERM MODULE FLAG
----- ----- ------ ----
  123  2023   1001 Y   
  123  2023   2100 N   
  123  2023   2101 N   
  123  2024   1100 N   
  123  2024   1200 N   
  123  2024   2304 Y   
  200  2023   1000 N   
  550  2021   1833 Y   
  550  2021   2340 N   
  550  2022   3230 N   
  550  2022   3231 N   
  999  2022   2400 N   
  999  2022   2401 N   
  999  2022   2402 N
Re: sql query [message #687549 is a reply to message #687547] Fri, 31 March 2023 09:40 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Hi, the first one was ok - i.e. the lowest digit. However, there are some other anomalies currently discussing with the business..
Re: sql query [message #687550 is a reply to message #687549] Fri, 31 March 2023 10:11 Go to previous messageGo to next message
mathguy
Messages: 106
Registered: January 2023
Senior Member
Yep - re-reading the thread later, I saw again where you stated that the "exception" will always have the lower digit (so the first answer was OK). When I looked at my answer later, I couldn't remember why I thought it was OK, so I wrote the more general answer. We are on the same page!
Re: sql query [message #687600 is a reply to message #687550] Wed, 12 April 2023 11:54 Go to previous messageGo to next message
ora9a
Messages: 42
Registered: June 2010
Member
Hi again,

So, have this scenario where there is no one distinct first digit, so result is:

ID TERM MODULE FLAG
----- ----- ------ ----
123 2023 3001 N
123 2023 2100 Y
123 2023 2101 Y
123 2023 3100 N

In this type of situation, all should be N.

The original scenario remains the same:

ID TERM MODULE FLAG
----- ----- ------ ----
123 2023 3001 N
123 2023 3100 N
123 2023 2101 Y
123 2023 3200 N



Thanks



Re: sql query [message #687603 is a reply to message #687600] Wed, 12 April 2023 13:18 Go to previous message
mathguy
Messages: 106
Registered: January 2023
Senior Member
You can add a condition, like I was doing in my "other" answer (where I didn't assume the 'Y' row is necessarily the one with the lowest first digit of MODULE value), something like this:

select id, term, module,
       case when substr(module, 1, 1) < max(substr(module, 1, 1)) over (partition by id, term)
             and count(*) over (partition by id, term, substr(module, 1, 1)) = 1
            then 'Y' else 'N' end as flag
from   students
order  by id, term, module  -- or whatever is needed
;
If this becomes more and more complicated (for example, if there may be three different first-digits, etc.), you could write a more compact solution using MATCH_RECOGNIZE. If you aren't familiar with it (it was new in Oracle 12.1), it is absolutely worth the effort to learn it, since it can do a lot of things that simply can't be done any other way in SQL (and it can do a lot of things in a simple way, that can be done in other ways but only in a very complex manner).

select id, term, module, cls
from   students
match_recognize (
  partition by id, term
  order     by module
  measures  classifier() as cls
  all rows per match
  pattern   ( ^ (Y|N) N* )
  define    Y as module < trunc(next(module), -3)
)
;
In the DEFINE clause of MATCH_RECOGNIZE, notice the different way I wrote the condition on the "first digit". This assumes (like all the earlier discussion) that all "modules" are exactly four-digit integers; there are no "modules" like 320 or 114303.

The PATTERN clause says that only the first row in a partition (the minimum value of MODULE) can be classified as Y, while everything else is N (also the first row, if it doesn't satisfy the condition for Y). DEFINE says that the condition for Y (which again, remember, can only be for the first row, having the minimum value of MODULE) is that the MODULE value be strictly less than the next MODULE value, rounded down to the thousand; TRUNC(2843, -3), for example, is 2000. The numerical inequality comparing to that TRUNC is equivalent to the condition on first digits (if all values are four-digit integers).
Previous Topic: Subtrair valores
Next Topic: Format Dollar and Date Fields
Goto Forum:
  


Current Time: Fri Mar 29 10:57:20 CDT 2024