Graph Syntax in SQL 2023

Ryota Yamanaka
7 min read · May 19, 2023

The new developer release, Oracle Database 23c Free, became available on April 3, 2023. This release supports the Property Graph syntax in the latest SQL specification (SQL 2023).

In this article, I will show some example queries based on the data from “Analyzing Bank Transfer Data with Graphs”. That previous article used PGQL, the query language for property graphs; here, I will run the same graph queries using SQL.
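To see how the two languages relate, here is a rough side-by-side sketch of the same pattern, finding a customer's accounts, in both languages. The PGQL form follows the style of the previous article; minor syntax details (such as the graph reference) may differ from your setup.

```sql
-- PGQL (previous article): labels after a colon, projection in SELECT
SELECT c.first_name, a.acc_id
FROM MATCH (c:customer)-[:owns]->(a:account) ON bank_graph
WHERE c.cst_id = 10

-- SQL 2023 (this article): the pattern goes inside GRAPH_TABLE,
-- labels use IS, and output columns are listed in a COLUMNS clause
SELECT *
FROM GRAPH_TABLE (bank_graph
  MATCH (c IS customer)-[IS owns]->(a IS account)
  WHERE c.cst_id = 10
  COLUMNS (c.first_name, a.acc_id)
);
```

The pattern-matching part is nearly identical; the main difference is that SQL wraps the match in the GRAPH_TABLE operator so that the result can be used like any other table expression.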

For more details, see the SQL Property Graphs section on the Property Graph Release 23.2 page of the Oracle Database 23c Free documentation.

Launch Oracle Database 23c

Run the container using Docker or Podman. The container image is about 10 GB after decompression.

docker run -p 1521:1521 --name database-23c \
container-registry.oracle.com/database/free:latest

Once you see this message, the database has started successfully.

...
#########################
DATABASE IS READY TO USE!
#########################
...

If you stopped the container using Ctrl + C or similar, start it again.

docker start database-23c

Set the password for the admin user.

docker exec -it database-23c ./setPassword.sh password123

Log in to the database as the admin user (= sys).

docker exec -it database-23c sqlplus sys/password123@freepdb1 as sysdba

Create a user

Create “graphuser” as a regular database user and grant the CONNECT and RESOURCE roles. Refer to this page for more information on new permissions related to graphs.

CREATE USER graphuser
IDENTIFIED BY password123
DEFAULT TABLESPACE users
TEMPORARY TABLESPACE temp
QUOTA UNLIMITED ON users;
GRANT CONNECT, RESOURCE TO graphuser;

Log out from the SQL client.

exit

Create table dataset

To download the data files into the container, open a shell inside it.

docker exec -it database-23c /bin/bash

Download the files containing INSERT statements for data loading.

curl -O https://raw.githubusercontent.com/ryotayamanaka/oracle-graph/main/bank-transaction/data/scale-100/bank_customer.sql
curl -O https://raw.githubusercontent.com/ryotayamanaka/oracle-graph/main/bank-transaction/data/scale-100/bank_account.sql
curl -O https://raw.githubusercontent.com/ryotayamanaka/oracle-graph/main/bank-transaction/data/scale-100/bank_transaction.sql

Log in to the database as the “graphuser”.

sqlplus graphuser/password123@freepdb1

Create the tables. Constraints such as primary keys and foreign keys are not required to create a graph, but they are useful for maintaining its consistency.

CREATE TABLE bank_customer (
cst_id NUMBER NOT NULL
, first_name VARCHAR2(255)
, last_name VARCHAR2(255)
, CONSTRAINT bank_customer_pk PRIMARY KEY (cst_id)
);

CREATE TABLE bank_account (
acc_id NUMBER NOT NULL
, cst_id NUMBER NOT NULL
, CONSTRAINT bank_account_pk PRIMARY KEY (acc_id)
, CONSTRAINT bank_account_cst_fk FOREIGN KEY (cst_id) REFERENCES bank_customer (cst_id)
);

CREATE TABLE bank_transaction (
acc_id_src NUMBER
, acc_id_dst NUMBER
, txn_id NUMBER
, datetime TIMESTAMP
, amount NUMBER
, CONSTRAINT bank_transaction_pk PRIMARY KEY (txn_id)
, CONSTRAINT bank_transaction_src_fk FOREIGN KEY (acc_id_src) REFERENCES bank_account (acc_id)
, CONSTRAINT bank_transaction_dst_fk FOREIGN KEY (acc_id_dst) REFERENCES bank_account (acc_id)
);

Execute INSERT statements.

set termout off
@bank_customer.sql
@bank_account.sql
@bank_transaction.sql
commit;
Commit complete.

Confirm that the data has been loaded.

set termout on
select count(*) from bank_customer;
select count(*) from bank_account;
select count(*) from bank_transaction;
  COUNT(*)
----------
       160

  COUNT(*)
----------
       200

  COUNT(*)
----------
      6030

Create a graph

Create a graph bank_graph. The CREATE PROPERTY GRAPH syntax used here was introduced in SQL 2023. The statement below defines customer and account as vertices (nodes), and transferred_to and owns as edges.

CREATE PROPERTY GRAPH bank_graph
VERTEX TABLES (
bank_customer
KEY (cst_id)
LABEL customer
PROPERTIES (cst_id, first_name, last_name)
, bank_account
KEY (acc_id)
LABEL account
PROPERTIES (acc_id)
)
EDGE TABLES (
bank_transaction
KEY (txn_id)
SOURCE KEY (acc_id_src) REFERENCES bank_account (acc_id)
DESTINATION KEY (acc_id_dst) REFERENCES bank_account (acc_id)
LABEL transferred_to
PROPERTIES (txn_id, datetime, amount)
, bank_account AS bank_account_owns
KEY (acc_id)
SOURCE KEY (cst_id) REFERENCES bank_customer (cst_id)
DESTINATION KEY (acc_id) REFERENCES bank_account (acc_id)
LABEL owns
);
Property graph created.

Run queries

The commands below are for formatting the query results.

SET LINESIZE 200
SET PAGESIZE 5000
COLUMN first_name FORMAT a10

First, check the accounts held by the customer with CST_ID = 10.

SELECT * FROM GRAPH_TABLE (bank_graph
MATCH (c IS customer)-[e IS owns]->(a IS account)
WHERE c.cst_id = 10
COLUMNS (c.cst_id, c.first_name, a.acc_id)
);

This customer’s name is Laura, and she has two accounts.

    CST_ID FIRST_NAME     ACC_ID
---------- ---------- ----------
        10 Laura              10
        10 Laura              90

Next, let’s query for all destinations of Laura’s transfers.

SELECT DISTINCT * FROM GRAPH_TABLE (bank_graph
MATCH (c IS customer)-[e IS owns]->(a IS account)-[t IS transferred_to]->(a1 IS account)
WHERE c.cst_id = 10
COLUMNS (c.cst_id, c.first_name, a.acc_id, a1.acc_id AS acc_id_1)
);
    CST_ID FIRST_NAME     ACC_ID   ACC_ID_1
---------- ---------- ---------- ----------
        10 Laura              10         97
        10 Laura              10         49
        10 Laura              10         59
        10 Laura              10         75
        10 Laura              10         92
...

Now, focusing only on the transfers between accounts, let’s look for a pattern starting from the account with acc_id = 10 and returning to the same account in 3 hops.

SELECT DISTINCT * FROM GRAPH_TABLE (bank_graph
MATCH (a1)-[t1 IS transferred_to]->(a2)-[t2 IS transferred_to]->(a3)
, (a3)-[t3 IS transferred_to]->(a1)
WHERE a1.acc_id = 10
AND t1.amount > 500 AND t2.amount > 500 AND t3.amount > 500
AND t1.datetime < t2.datetime AND t2.datetime < t3.datetime
COLUMNS (
a1.acc_id AS acc_id_1, t1.amount AS amount_1
, a2.acc_id AS acc_id_2, t2.amount AS amount_2
, a3.acc_id AS acc_id_3, t3.amount AS amount_3
)
);

A matching pattern was found where acc_id goes 10 > 75 > 77 > 10.

  ACC_ID_1   AMOUNT_1   ACC_ID_2   AMOUNT_2   ACC_ID_3   AMOUNT_3
---------- ---------- ---------- ---------- ---------- ----------
        10        600         75        900         77        900
        10        800         75        900         77        900

Similarly, try this for 4 hops.

SELECT DISTINCT * FROM GRAPH_TABLE (bank_graph
MATCH (a1)-[t1 IS transferred_to]->(a2)-[t2 IS transferred_to]->(a3)
, (a3)-[t3 IS transferred_to]->(a4)-[t4 IS transferred_to]->(a1)
WHERE a1.acc_id = 10
AND a1.acc_id != a3.acc_id AND a2.acc_id != a4.acc_id
AND t1.amount > 500 AND t2.amount > 500
AND t3.amount > 500 AND t4.amount > 500
AND t1.datetime < t2.datetime
AND t2.datetime < t3.datetime
AND t3.datetime < t4.datetime
COLUMNS (
a1.acc_id AS acc_id_1, t1.amount AS amount_1
, a2.acc_id AS acc_id_2, t2.amount AS amount_2
, a3.acc_id AS acc_id_3, t3.amount AS amount_3
, a4.acc_id AS acc_id_4, t4.amount AS amount_4
)
);

Only one path was found in this case.

  ACC_ID_1   AMOUNT_1   ACC_ID_2   AMOUNT_2   ACC_ID_3   AMOUNT_3   ACC_ID_4   AMOUNT_4
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
        10        800         49        600         81        700         25        700

Let’s look for accounts that received a large number of small remittances (less than 500) between Oct 1 and Dec 1. Relational databases are also good at aggregation operations.

SELECT t.acc_id, COUNT(acc_id) AS num_of_txn
FROM GRAPH_TABLE (bank_graph
MATCH (a1)-[t IS transferred_to]->(a2)
WHERE t.datetime >= TIMESTAMP '2020-10-01 00:00:00'
AND t.datetime < TIMESTAMP '2020-12-01 00:00:00'
AND t.amount < 500.00
COLUMNS (
a2.acc_id
)
) t
GROUP BY t.acc_id
ORDER BY num_of_txn DESC
FETCH FIRST 10 ROWS ONLY
;
    ACC_ID NUM_OF_TXN
---------- ----------
        23         29
        41         27
        33         26
        30         24
       141         23
       159         22
       134         22
       123         22
       130         21
        34         20

Check execution plans

As with other SQL queries, you can obtain the execution plan with EXPLAIN PLAN.

SET LINESIZE 200
SET PAGESIZE 5000
COLUMN plan_table_output FORMAT a120
EXPLAIN PLAN FOR 
SELECT DISTINCT * FROM GRAPH_TABLE (bank_graph
MATCH (a1)-[t1 IS transferred_to]->(a2)-[t2 IS transferred_to]->(a3)
, (a3)-[t3 IS transferred_to]->(a1)
WHERE a1.acc_id = 10
AND t1.amount > 500 AND t2.amount > 500 AND t3.amount > 500
AND t1.datetime < t2.datetime AND t2.datetime < t3.datetime
COLUMNS (
a1.acc_id AS acc_id_1, t1.amount AS amount_1
, a2.acc_id AS acc_id_2, t2.amount AS amount_2
, a3.acc_id AS acc_id_3, t3.amount AS amount_3
)
);
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(format=>'ALL'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 540988490

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                  |     1 |   156 |    28   (4)| 00:00:01 |
|   1 |  HASH UNIQUE         |                  |     1 |   156 |    28   (4)| 00:00:01 |
|*  2 |   HASH JOIN          |                  |     1 |   156 |    27   (0)| 00:00:01 |
|*  3 |    HASH JOIN         |                  |     7 |   728 |    18   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| BANK_TRANSACTION |     9 |   468 |     9   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| BANK_TRANSACTION |  2943 |  149K |     9   (0)| 00:00:01 |
|*  6 |    TABLE ACCESS FULL | BANK_TRANSACTION |    19 |   988 |     9   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
...

Since we can apply standard SQL tuning techniques, let’s create indexes that may improve table join operations.

CREATE INDEX bank_account_cst_idx ON bank_account (cst_id);
CREATE INDEX bank_transaction_src_idx ON bank_transaction (acc_id_src);
CREATE INDEX bank_transaction_dst_idx ON bank_transaction (acc_id_dst);
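To check the effect of the indexes, regenerate the plan by repeating the same EXPLAIN PLAN and DBMS_XPLAN statements used above for the 3-hop query.

```sql
EXPLAIN PLAN FOR
SELECT DISTINCT * FROM GRAPH_TABLE (bank_graph
MATCH (a1)-[t1 IS transferred_to]->(a2)-[t2 IS transferred_to]->(a3)
, (a3)-[t3 IS transferred_to]->(a1)
WHERE a1.acc_id = 10
AND t1.amount > 500 AND t2.amount > 500 AND t3.amount > 500
AND t1.datetime < t2.datetime AND t2.datetime < t3.datetime
COLUMNS (
a1.acc_id AS acc_id_1, t1.amount AS amount_1
, a2.acc_id AS acc_id_2, t2.amount AS amount_2
, a3.acc_id AS acc_id_3, t3.amount AS amount_3
)
);
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(format=>'ALL'));
```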

From the execution plan, we can confirm that the indexes are used.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4108979867

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                          |     1 |   156 |    19   (6)| 00:00:01 |
|   1 |  HASH UNIQUE                           |                          |     1 |   156 |    19   (6)| 00:00:01 |
|*  2 |   HASH JOIN                            |                          |     1 |   156 |    18   (0)| 00:00:01 |
|*  3 |    HASH JOIN                           |                          |     7 |   728 |    16   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID BATCHED| BANK_TRANSACTION         |     9 |   468 |     7   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | BANK_TRANSACTION_DST_IDX |    18 |       |     1   (0)| 00:00:01 |
|*  6 |     TABLE ACCESS FULL                  | BANK_TRANSACTION         |  2943 |  149K |     9   (0)| 00:00:01 |
|*  7 |    TABLE ACCESS BY INDEX ROWID BATCHED | BANK_TRANSACTION         |    19 |   988 |     2   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN                   | BANK_TRANSACTION_SRC_IDX |    50 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Such graph pattern matching can also be executed on PGX, an in-memory graph processing engine, which may bring significant performance gains (PGX currently uses PGQL). The SQL Property Graphs section of the documentation describes how to load graphs into PGX.
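As a rough illustration, the 3-hop cycle query above could be expressed in PGQL along these lines. This is a sketch following the previous article's style; the exact syntax accepted by your PGX version may differ slightly.

```sql
SELECT DISTINCT a1.acc_id AS acc_id_1, t1.amount AS amount_1
              , a2.acc_id AS acc_id_2, t2.amount AS amount_2
              , a3.acc_id AS acc_id_3, t3.amount AS amount_3
FROM MATCH (a1)-[t1:transferred_to]->(a2)-[t2:transferred_to]->(a3)
   , MATCH (a3)-[t3:transferred_to]->(a1)
WHERE a1.acc_id = 10
  AND t1.amount > 500 AND t2.amount > 500 AND t3.amount > 500
  AND t1.datetime < t2.datetime AND t2.datetime < t3.datetime
```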

We intend to utilize this functionality for various use cases in the future. If you are interested, please follow OracleGraph for more articles.

Please learn more about Oracle Graph from the official documentation and related articles.
