My Speaking Schedule for Oracle Open World 2014

A quick post to let you know about the two presentations that I’ll be doing at Oracle Open World 2014.

Session ID:         UGF4482
Session Title:     “Getting Started with SQL Pattern Matching in Oracle Database 12c”
Venue / Room:  Moscone South – 301
Date and Time:  9/28/14, 13:30 – 14:15

Session ID:          CON4493
Session Title:      “Regular Expressions in Oracle Database 101”
Venue / Room:   Moscone South – 303
Date and Time:   10/2/14, 13:15 – 14:00

As usual, you might have to check before the session to make sure the room has not changed.

I hope to see you there.


Recursive Subquery Factoring to the Rescue

Recursive subquery factoring (AKA recursive common table expressions) has been available in Oracle since Database 11g Release 2, which came out in 2009. I always thought of it as a replacement for the CONNECT BY clause and just another way to write hierarchical queries, so I never really took the time to play with it.

During Kscope13, I attended Dominic Delmolino’s session on this SQL feature, where he showed that recursive subquery factoring can be used for more than hierarchical queries. One use case is when you have to “generate” a value on one row and then “carry over” that value to the next row.

Not too long ago I had to work on a query with a similar requirement. The basic idea is to have a variable number of records per client with some columns that are evaluated to establish a new column (let’s call it VAL). The caveat is that for some records VAL has to be determined using VAL from the previous record.

First I’m going to create a table and insert some data to play with.


SQL> CREATE TABLE RSF_TEST
 2 (
 3 ID_CLIENT NUMBER NOT NULL,
 4 DT_ELIG DATE NOT NULL,
 5 TP NUMBER NOT NULL,
 6 SRV_GRP NUMBER NOT NULL,
 7 SRV_CODE NUMBER NOT NULL,
 8 DT_ADMIT DATE NOT NULL,
 9 DT_DISCHARGE DATE NOT NULL
 10 );

Table created

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-11-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-11-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-10-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-10-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-09-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-09-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-08-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-08-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-07-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-07-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-06-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-06-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-05-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('01-05-2013', 'dd-mm-yyyy'), to_date('31-12-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-05-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-04-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-04-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (615086545, to_date('01-03-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('07-03-2013', 'dd-mm-yyyy'), to_date('31-03-2014', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-03-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-02-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-01-2013', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-12-2012', 'dd-mm-yyyy'), 13, 19, 12, to_date('12-12-2012', 'dd-mm-yyyy'), to_date('30-04-2013', 'dd-mm-yyyy'));
1 row inserted

SQL> insert into RSF_TEST (ID_CLIENT, DT_ELIG, TP, SRV_GRP, SRV_CODE, DT_ADMIT, DT_DISCHARGE)
 2 values (525459313, to_date('01-03-2012', 'dd-mm-yyyy'), 13, 1, 1, to_date('18-01-2012', 'dd-mm-yyyy'), to_date('29-03-2012', 'dd-mm-yyyy'));
1 row inserted

SQL> commit;

Commit complete

SQL>

And now a simplified version of the query. The real query has more predicates, and the CASE expressions are a little more complex in order to cover different srv_grp and srv_code combinations. I removed as much as I could just to show how VAL is determined for the particular clients that I have in my table.


WITH
clients AS
(
 SELECT
   id_client,
   dt_elig,
   tp,
   srv_grp,
   srv_code,
   dt_admit,
   dt_discharge,
   row_number() OVER (PARTITION BY id_client ORDER BY dt_elig) rn
 FROM rsf_test
),
results (id_client, dt_elig, tp, srv_grp, srv_code, dt_admit, dt_discharge, rn, val) AS
(
 SELECT
   id_client,
   dt_elig,
   tp,
   srv_grp,
   srv_code,
   dt_admit,
   dt_discharge,
   rn,
   CASE
     WHEN tp <> 14 AND srv_grp = 19 AND srv_code = 12 AND dt_elig BETWEEN dt_admit AND dt_discharge THEN 3
     WHEN tp <> 14 AND srv_grp = 19 AND srv_code = 12 AND dt_elig NOT BETWEEN dt_admit AND dt_discharge THEN 0
     WHEN tp <> 14 AND srv_grp IN (1,10) AND srv_code IN (1,3) THEN 0
   END val
 FROM clients
 WHERE
   rn = 1
 UNION ALL
 SELECT
   c.id_client,
   c.dt_elig,
   c.tp,
   c.srv_grp,
   c.srv_code,
   c.dt_admit,
   c.dt_discharge,
   c.rn,
   CASE
     WHEN c.tp <> 14 AND c.srv_grp = 19 AND c.srv_code = 12 AND p.val = 3 AND c.dt_elig BETWEEN c.dt_admit AND c.dt_discharge THEN 3
     WHEN c.tp <> 14 AND c.srv_grp = 19 AND c.srv_code = 12 AND p.val < 3 AND c.dt_elig BETWEEN c.dt_admit AND c.dt_discharge THEN p.val + 1
     WHEN c.tp <> 14 AND c.srv_grp IN (1,10) AND c.srv_code IN (1,3) AND p.val = 3 THEN 3
     WHEN c.tp <> 14 AND c.srv_grp IN (1,10) AND c.srv_code IN (1,3) AND p.val < 3 THEN p.val
     WHEN c.tp <> 14 AND c.srv_grp = 19 AND c.srv_code = 12 AND p.val = 3 AND c.dt_elig NOT BETWEEN c.dt_admit AND c.dt_discharge THEN 3
     WHEN c.tp <> 14 AND c.srv_grp = 19 AND c.srv_code = 12 AND p.val < 3 AND c.dt_elig NOT BETWEEN c.dt_admit AND c.dt_discharge THEN p.val
   END val
 FROM
   clients c,
   results p
 WHERE
   c.id_client = p.id_client
   AND c.rn = p.rn + 1
)
SELECT
  r.id_client,
  r.dt_elig,
  r.tp,
  r.srv_grp,
  r.srv_code,
  r.dt_admit,
  r.dt_discharge,
  --r.rn,
  r.val
FROM results r
ORDER BY
  r.id_client,
  r.dt_elig;

As you can see, the recursive subquery factoring clause contains two query blocks: the anchor query, which generates the first row for each client, and the recursive query, which processes every following row using the row that came before it. The anchor query and the recursive query are combined using the UNION ALL operator.
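For readers who want to try the anchor/recursive structure without an Oracle instance, here is a minimal sketch in Python. It rests on several assumptions: SQLite (through Python’s sqlite3 module) stands in for Oracle, so the clause starts with WITH RECURSIVE, a keyword that Oracle does not use. The steps table and its in_window flag are hypothetical and replace the date comparison in the real query.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle, hence WITH RECURSIVE.
# The "steps" table and its columns are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (id_client INTEGER, rn INTEGER, in_window INTEGER)")
conn.executemany(
    "INSERT INTO steps VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 2, 1), (1, 3, 1), (1, 4, 1), (1, 5, 1),
     (2, 1, 1), (2, 2, 0)],
)

rows = conn.execute("""
    WITH RECURSIVE results (id_client, rn, val) AS (
        -- anchor query: the first row of each client
        SELECT id_client, rn, CASE WHEN in_window = 1 THEN 3 ELSE 0 END
        FROM steps
        WHERE rn = 1
        UNION ALL
        -- recursive query: the next row, using val from the previous row
        SELECT c.id_client, c.rn,
               CASE WHEN c.in_window = 1 AND p.val < 3 THEN p.val + 1
                    ELSE p.val
               END
        FROM steps c
        JOIN results p ON c.id_client = p.id_client AND c.rn = p.rn + 1
    )
    SELECT id_client, rn, val FROM results ORDER BY id_client, rn
""").fetchall()

for row in rows:
    print(row)
```

The join condition c.rn = p.rn + 1 is what lets each pass of the recursive query see exactly one previous row per client.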

Here is the query’s output:


 ID_CLIENT DT_ELIG           TP    SRV_GRP   SRV_CODE DT_ADMIT  DT_DISCHA        VAL
---------- --------- ---------- ---------- ---------- --------- --------- ----------
 525459313 01-MAR-12         13          1          1 18-JAN-12 29-MAR-12          0
 525459313 01-DEC-12         13         19         12 12-DEC-12 30-APR-13          0
 525459313 01-JAN-13         13         19         12 12-DEC-12 30-APR-13          1
 525459313 01-FEB-13         13         19         12 12-DEC-12 30-APR-13          2
 525459313 01-MAR-13         13         19         12 12-DEC-12 30-APR-13          3
 525459313 01-APR-13         13         19         12 12-DEC-12 30-APR-13          3
 525459313 01-MAY-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-JUN-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-JUL-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-AUG-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-SEP-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-OCT-13         13         19         12 01-MAY-13 31-DEC-13          3
 525459313 01-NOV-13         13         19         12 01-MAY-13 31-DEC-13          3
 615086545 01-MAR-13         13         19         12 07-MAR-13 31-MAR-14          0
 615086545 01-APR-13         13         19         12 07-MAR-13 31-MAR-14          1
 615086545 01-MAY-13         13         19         12 07-MAR-13 31-MAR-14          2
 615086545 01-JUN-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-JUL-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-AUG-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-SEP-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-OCT-13         13         19         12 07-MAR-13 31-MAR-14          3
 615086545 01-NOV-13         13         19         12 07-MAR-13 31-MAR-14          3

22 rows selected.

In this particular example, the anchor query assigns either a 0 or a 3 to VAL for the first row of each client, and after that the recursive query decides whether VAL should increase or stay the same for every other row. So basically the point is to have access to VAL from the previous execution of the recursive query in order to determine the new value in the current execution.
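The carry-over rule can also be written as a plain loop, which makes a handy cross-check for the query’s output. This is only a sketch under stated assumptions: the function name carry_val is hypothetical, the rows belong to a single client and are already sorted by dt_elig, tp is never 14, and only the srv_grp/srv_code combinations handled by the simplified CASE expressions occur.

```python
from datetime import date

def carry_val(rows):
    """Compute VAL for one client's rows, sorted by dt_elig.

    Each row is (dt_elig, srv_grp, srv_code, dt_admit, dt_discharge).
    Assumes tp <> 14 and only the combinations covered by the CASE branches.
    """
    vals = []
    for dt_elig, srv_grp, srv_code, dt_admit, dt_discharge in rows:
        # Only srv_grp 19 / srv_code 12 rows can raise VAL, and only when
        # dt_elig falls between dt_admit and dt_discharge (inclusive).
        in_window = (srv_grp, srv_code) == (19, 12) and dt_admit <= dt_elig <= dt_discharge
        if not vals:
            val = 3 if in_window else 0        # anchor query
        elif in_window:
            val = min(vals[-1] + 1, 3)         # recursive query: increase, capped at 3
        else:
            val = vals[-1]                     # recursive query: carry over unchanged
        vals.append(val)
    return vals

# Client 615086545 from the sample data: dt_elig runs from March to November 2013.
rows = [(date(2013, m, 1), 19, 12, date(2013, 3, 7), date(2014, 3, 31))
        for m in range(3, 12)]
print(carry_val(rows))  # [0, 1, 2, 3, 3, 3, 3, 3, 3]
```

The printed list lines up with the VAL column for client 615086545 in the output above.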


An Attempt to Validate Email Addresses with a Regular Expression

After my Kscope13 presentation, one of the attendees approached me and asked if I had used regular expressions to try to validate email addresses. He didn’t know about regular expressions before the presentation and he was writing a PL/SQL function (which was getting very complex…) to accomplish this particular task. I offered to give it a try and blog about it.

After some testing, this is the regular expression that I came up with: ^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,3}$

So basically, I consider an email address to be valid (that’s my definition for this example) if it has three parts: it begins with one or more letters, digits, dots, underscores or dashes; that is followed by the @ and then by one or more letters, digits, dots or dashes; and it ends with a dot followed by a minimum of 2 and a maximum of 3 letters.

I know there are more valid characters that can be used in an email address, but I chose the ones that are used most of the time. Another thing to keep in mind is that this regular expression only lists lowercase letters, so in order to match email addresses that contain capital letters I’ll rely on the match parameter that can be passed to the regexp SQL functions to make the match case-insensitive.
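The pattern can also be tried outside the database. The sketch below uses Python’s re module, which is an assumption made purely for experimentation: it is not the engine Oracle uses, although as far as I can tell this pattern relies on nothing engine-specific. The re.IGNORECASE flag plays the role of the case-insensitive match parameter, and the helper name looks_valid is hypothetical.

```python
import re

# The pattern from this post, unchanged. re.IGNORECASE stands in for the
# case-insensitive match parameter of the regexp SQL functions.
# Assumption: Python's re module is used here only to experiment with the pattern.
EMAIL_PATTERN = re.compile(r"^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,3}$", re.IGNORECASE)

def looks_valid(email):
    """Return True when the whole string matches the pattern."""
    return EMAIL_PATTERN.fullmatch(email) is not None

samples = [
    "Vickie.L.Fountain@mail.com",
    "123ortiz.gmail.com",
    "teresa-maldonado@amazon.store",
    "galo@email.mydomain.net",
]
for email in samples:
    print(email, looks_valid(email))
```

Dropping re.IGNORECASE makes the first address fail, because the character classes only list lowercase letters.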

So let’s take a look at some sample data stored in the clients table:

SQL> select * from clients;

NAME_CLIENT               PHONE                EMAIL
------------------------- -------------------- ------------------------------
FOUNTAIN,VICKIE L                              Vickie.L.Fountain@mail.com
ORTIZ,GENOVEVA M                               123ortiz.gmail.com
MALDONADO,TERESA                               teresa-maldonado@amazon.store
BARAJAS-RAMIREZ,AMY
SERRANO,GRACE
BALDA GALO                (512) 254-1694       galo@email.mydomain.net

6 rows selected.

As you can see, there are email addresses that match the pattern (rows 1 and 6) and others that don’t (rows 2 and 3). Rows 4 and 5 have no email address at all.

Let’s find the records that match the pattern first:

SQL> select *
  2 from clients
  3 where regexp_like (email,'^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,3}$','i');

NAME_CLIENT               PHONE                EMAIL
------------------------- -------------------- ------------------------------
FOUNTAIN,VICKIE L                              Vickie.L.Fountain@mail.com
BALDA GALO                (512) 254-1694       galo@email.mydomain.net

2 rows selected.

Note how the first email matches the pattern even though I’m not defining a range for capital letters. This happens because I used the ‘i’ parameter, which tells the function to perform a case-insensitive search.

Now let’s find the records that don’t match the pattern:

SQL> select *
  2 from clients
  3 where not regexp_like (email,'^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,3}$','i');

NAME_CLIENT               PHONE                EMAIL
------------------------- -------------------- ------------------------------
ORTIZ,GENOVEVA M                               123ortiz.gmail.com
MALDONADO,TERESA                               teresa-maldonado@amazon.store

2 rows selected.

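It’s obvious that the first email doesn’t match the pattern because the @ is missing. The second email is rejected because I specified that the top-level domain is only valid if it’s a sequence of 2 or 3 letters, and store has five. Note that the two clients without an email address show up in neither result: REGEXP_LIKE evaluates to unknown for a NULL value, so neither the condition nor its negation is true for those rows.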

So, this is an example showing very basic validations on email addresses. There’s room for improvement but it will always depend on your use case and how complex you want to make it.

I hope you find it useful.

