RMOUG Training Days 2014

training_days

I’ll be presenting on February 6th. Here are the details of my session:

SQL Pattern Matching in Oracle 12c

Room 501, 11:15 – 12:15

Last year I had a great time, so I can’t wait to be back in Denver.

I hope to see you there!

Updated on 02/17/2014: The presentation is available on Slideshare


Recursive Subquery Factoring to the Rescue

Recursive subquery factoring (AKA Recursive Common Table Expressions) has been available in Oracle since version 11g R2 was launched sometime around 2009. I always thought of it as a replacement for the connect by clause and just another way to write hierarchical queries, so I never really took the time to play with it.

During Kscope13, I attended Dominic Delmolino’s session on this SQL feature and he mentioned and showed that recursive subquery factoring can be used for more than hierarchical queries. One use case for recursive subquery factoring is when you have to “generate” a value and then “carry over” that value to the next row.

Not too long ago I had to work on a query with a similar requirement. The basic idea is to have a variable number of records per client with some columns that are evaluated to establish a new column (let’s call it VAL). The caveat is that for some records VAL has to be determined using VAL from the previous record.

First I’m going to create a table and insert some data to play with.


SQL> CREATE TABLE RSF_TEST
 2 (
 3 ID_CLIENT NUMBER NOT NULL,
 4 DT_ELIG DATE NOT NULL,
 5 TP NUMBER NOT NULL,
 6 SRV_GRP NUMBER NOT NULL,
 7 SRV_CODE NUMBER NOT NULL,
 8 DT_ADMIT DATE NOT NULL,
 9 DT_DISCHARGE DATE NOT NULL
 10 );

Table created

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-11-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-11-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-10-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-10-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-09-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-09-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-08-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-08-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-07-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-07-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-06-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-06-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-05-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-05-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-04-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-04-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-03-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-03-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-02-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-01-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-12-2012', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-03-2012', 'dd-mm-yyyy'), 13, 1, 1, to_date('18-01-2012', 'dd-mm-yyyy'), to_date('29-03-2012', 'dd-mm-yyyy'));
1 row inserted

SQL> commit;

Commit complete

SQL>

And now a simplified version of the query. The real query has more predicates and the CASE statements are a little bit more complex to cover different srv_grp, srv_code combinations. I removed as much as I could just to show how VAL is determined for the particular clients that I have in my table.


WITH
clients AS
(
 SELECT
   id_client,
   dt_elig,
   tp,
   srv_grp,
   srv_code,
   dt_admit,
   dt_discharge,
   row_number() OVER (PARTITION BY id_client ORDER BY id_client, dt_elig) rn
 FROM rsf_test
),
results (id_client, dt_elig, tp, srv_grp, srv_code, dt_admit, dt_discharge, rn, val) AS
(
 SELECT
   id_client,
   dt_elig,
   tp,
   srv_grp,
   srv_code,
   dt_admit,
   dt_discharge,
   rn,
   CASE
     WHEN tp <> 14 AND srv_grp = 19 AND srv_code = 12 AND dt_elig BETWEEN dt_admit AND dt_discharge THEN 3
     WHEN tp <> 14 AND srv_grp = 19 AND srv_code = 12 AND dt_elig NOT BETWEEN dt_admit AND dt_discharge THEN 0
     WHEN tp <> 14 AND srv_grp IN (1,10) AND srv_code IN (1,3) THEN 0
   END val
 FROM clients
 WHERE
   rn = 1
 UNION ALL
 SELECT
   c.id_client,
   c.dt_elig,
   c.tp,
   c.srv_grp,
   c.srv_code,
   c.dt_admit,
   c.dt_discharge,
   c.rn,
   CASE
     WHEN c.tp <> 14 AND c.srv_grp = 19 AND c.srv_code = 12 AND p.val = 3 AND c.dt_elig BETWEEN c.dt_admit AND c.dt_discharge THEN 3
     WHEN c.tp <> 14 AND c.srv_grp = 19 AND c.srv_code = 12 AND p.val < 3 AND c.dt_elig BETWEEN c.dt_admit AND c.dt_discharge THEN p.val + 1
     WHEN c.tp <> 14 AND c.srv_grp IN (1,10) AND c.srv_code IN (1,3) AND p.val = 3 THEN 3
     WHEN c.tp <> 14 AND c.srv_grp IN (1,10) AND c.srv_code IN (1,3) AND p.val < 3 THEN p.val
     WHEN c.tp <> 14 AND c.srv_grp = 19 AND c.srv_code = 12 AND p.val = 3 AND c.dt_elig NOT BETWEEN c.dt_admit AND c.dt_discharge THEN 3
     WHEN c.tp <> 14 AND c.srv_grp = 19 AND c.srv_code = 12 AND p.val < 3 AND c.dt_elig NOT BETWEEN c.dt_admit AND c.dt_discharge THEN p.val
   END val
 FROM
   clients c,
   results p
 WHERE
   c.id_client = p.id_client
   AND c.rn = p.rn + 1
)
SELECT
  r.id_client,
  r.dt_elig,
  r.tp,
  r.srv_grp,
  r.srv_code,
  r.dt_admit,
  r.dt_discharge,
  --r.rn,
  r.val
FROM results r
ORDER BY
  r.id_client,
  r.dt_elig;

As you can see, the recursive subquery factoring clause contains two query blocks: The anchor query that generates the first row which is then processed by the recursive query. The anchor query and the recursive query are combined using the UNION ALL operator.

Here is the query’s output:


 ID_CLIENT DT_ELIG           TP    SRV_GRP   SRV_CODE DT_ADMIT  DT_DISCHA        VAL
---------- --------- ---------- ---------- ---------- --------- --------- ----------
 525459313 01-MAR-12         13          1          1 18-JAN-12 29-MAR-12          0
 525459313 01-DEC-12         13         19         12 12-DEC-12 30-APR-13          0
 525459313 01-JAN-13         13         19         12 12-DEC-12 30-APR-13          1
 525459313 01-FEB-13         13         19         12 12-DEC-12 30-APR-13          2
 525459313 01-MAR-13         13         19         12 12-DEC-12 30-APR-13          3
 525459313 01-APR-13         13         19         12 12-DEC-12 30-APR-13          3
 525459313 01-MAY-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-JUN-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-JUL-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-AUG-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-SEP-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-OCT-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-NOV-13         13         19         12 01-MAY-13 31-DEC-13          3
 615086545 01-MAR-13         13         19         12 07-MAR-13 31-MAR-14          0
 615086545 01-APR-13         13         19         12 07-MAR-13 31-MAR-14          1
 615086545 01-MAY-13         13         19         12 07-MAR-13 31-MAR-14          2
 615086545 01-JUN-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-JUL-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-AUG-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-SEP-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-OCT-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-NOV-13         13         19         12 07-MAR-13 31-MAR-14          3

22 rows selected.

In this particular example, the anchor query assigns either a 0 or a 3 to VAL for the first row and after that the recursive query evaluates if VAL should increase or stay the same for every other row. So basically the point is to have access to VAL from a previous execution of the recursive query to determine the new value in the current execution.


An Attempt to Validate Email Addresses with a Regular Expression

After my Kscope13 presentation, one of the attendees approached me and asked if I had used regular expressions to try to validate email addresses. He didn’t know about regular expressions before the presentation and he was writing a PL/SQL function (which was getting very complex…) to accomplish this particular task. I offered to give it a try and blog about it.

After some testing, this is the regular expression that I came up with: ^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,3}$

So basically, I consider an email address to be valid (that’s my definition for this example) if it begins with a sequence of characters that may include any letters, any digits, the dot, the underscore or the dash followed by the @ and then followed by a sequence of characters that may include any letters, any digits, the dot or the dash followed by a dot and ending with a sequence of characters that contains a minimum of 2 and a maximum of 3 letters.

I know there’re more valid characters that can be used to define an email address but I chose the ones that are used most of the time. Another thing to keep in mind is that this regular expression is case sensitive, so in order to match email addresses that contain capital letters I’ll rely on the parameter that can be passed to the regexp SQL functions to make the match case insensitive.

So lets take a look at some sample data stored in the clients table:

SQL> select * from clients;

NAME_CLIENT               PHONE                EMAIL
------------------------- -------------------- ------------------------------
FOUNTAIN,VICKIE L                              Vickie.L.Fountain@mail.com
ORTIZ,GENOVEVA M                               123ortiz.gmail.com
MALDONADO,TERESA                               teresa-maldonado@amazon.store
BARAJAS-RAMIREZ,AMY
SERRANO,GRACE
BALDA GALO                (512) 254-1694       galo@email.mydomain.net

6 rows selected.

As you can see, there’re email addresses that match the pattern (1 and 6) and other that don’t match the pattern (2,3).

Lets find the records that match the pattern first:

SQL> select *
  2 from clients
  3 where regexp_like (email,'^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,3}$','i');

NAME_CLIENT               PHONE                EMAIL
------------------------- -------------------- ------------------------------
FOUNTAIN,VICKIE L                              Vickie.L.Fountain@mail.com
BALDA GALO               (512) 254-1694        galo@email.mydomain.net

2 rows selected.

Note how the first email matches the pattern even thought I’m not defining a range for capital letters. This is happening because I used the ‘i’ parameter that tells the function to perform a case insensitive search.

Now lets find the records that don’t match the pattern:

SQL> select *
  2 from clients
  3 where not regexp_like (email,'^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,3}$','i');

NAME_CLIENT               PHONE                EMAIL
------------------------- -------------------- ------------------------------
ORTIZ,GENOVEVA M                               123ortiz.gmail.com
MALDONADO,TERESA                               teresa-maldonado@amazon.store

2 rows selected.

It’s obvious that the first email doesn’t match the pattern because the @ is missing. The second email is incorrect because I specified that the top level domain is only valid if it’s a sequence of 2 or 3 letters at the most and store has four.

So, this is an example showing very basic validations on email addresses. There’s room for improvement but it will always depend on your use case and how complex you want to make it.

I hope you find it useful.


Something I didn’t know: The REVERSE function

I was playing the pl/sql challenge the other day and the question was to identify valid implementations for a “reverse_string” function taking a varchar2 as the input and returning another varchar2 with the characters in inverted order.

One of the possible answers was this:

FUNCTION reverse_string (in_string VARCHAR2)
RETURN VARCHAR2
AS
  l_return   VARCHAR2 (100);
BEGIN
  SELECT REVERSE (in_string) INTO l_return FROM DUAL;

  RETURN l_return;
END reverse_string;

My immediate reaction was to skip this choice because I have never seen REVERSE as a string manipulation function in Oracle; but then I thought…what if this is a tricky question.

So I ran this small query expecting a syntax error but guess what?

SELECT reverse('hello world')
FROM dual;

It worked! The function displayed dlrow olleh.

So REVERSE turned out to be an undocumented function that has been out there for a while. I did find blog posts from 2006 that talk about it. I guess I have been living under a rock all this time or maybe I never had the need for such functionality.

By the way, I’m not suggesting that you should go and write production code around this function, because it’s unsupported and may not even work properly with multi-byte character sets.


Follow

Get every new post delivered to your Inbox.