Analyze Bank Transaction Data using Graph (Part 2/2)

Ryota Yamanaka
6 min readJul 14, 2023

--

Photo by Eduardo Soares on Unsplash

In Part 1, we explained that we can use Oracle Database 23c as a graph database to create property graphs from tables and execute graph pattern-matching queries.

In this part, I’ll show you how to utilize the graph defined in Part 2, using Graph Server in addition to Oracle Database. Graph Server consists of a high-performance in-memory graph analysis engine and a database integration component. Graph Server is available in all editions of Oracle Database without an additional license.

Load graphs to Graph Server

First, launch a Python client and connect to Graph Server running on the same host.

from opg4py import graph_server
from pypgx import setloglevel
from getpass import getpass

setloglevel("ROOT", "WARN")

base_url = "http://localhost:7007"
username = "graphuser"
password = getpass("Password: ")

instance = graph_server.get_instance(base_url, username, password)
session = instance.create_session("test")
analyst = session.create_analyst()

print(session)
PgxSession(id: 5eb2cb7d-f00c-40fe-ac1f-97c353492ec7, name: jupyter)

There is a graph in the database called BANK_GRAPH, which we created in Part 1, so we will load this graph and expands it in the memory of the Graph Server.

graph = session.read_graph_by_name("BANK_GRAPH", "pg_sql")
print(graph)
PgxGraph(name: BANK_GRAPH, v: 360, e: 6230, directed: True, memory(Mb): 0)

Let’s try executing a query on this graph. Currently, Graph Server supports PGQL (https://pgql-lang.org/), which is similar yet different from SQL. Before the introduction of graph syntax to SQL, Oracle Database and Graph Server supported PGQL. We plan to support the SQL standard in all environments in the future.

graph.query_pgql("""
SELECT c.first_name, LABEL(e), a.acc_id
FROM MATCH (c:customer)-[e:owns]->(a:account)
WHERE c.cst_id = 10
""").print()
+--------------------------------+
| first_name | LABEL(e) | acc_id |
+--------------------------------+
| Laura | OWNS | 10 |
| Laura | OWNS | 90 |
+--------------------------------+

Now, you can execute PGQL queries on the Graph Server, just as well as the SQL queries on the database.

Run path-finding queries

We can expect some queries to become faster when the graph is deployed on the Graph Server. Path-finding is one example.

The following query uses the TOP K SHORTEST clause to retrieve the shortest K (in this case, 1) paths between two accounts (one with acc_id = 10 and another with accc_id = 50). The condition COUNT(a) = COUNT(DISTINCT a)eliminates paths that go through the same account more than once.

graph.query_pgql("""
SELECT ARRAY_AGG(a.acc_id) AS acc_ids
, ARRAY_AGG(t.txn_id) AS txn_ids
, ARRAY_AGG(t.amount) AS amounts
FROM MATCH TOP 1 SHORTEST (a1) (-[t:transferred_to]->(a))+ (a2)
WHERE a1.acc_id = 10 AND a2.acc_id = 50
AND COUNT(a) = COUNT(DISTINCT a)
""").print()
+--------------------------------------+
| acc_ids | txn_ids | amounts |
+--------------------------------------+
| [59, 50] | [1015, 5916] | [100, 500] |
+--------------------------------------+

You can also include a condition that the amount of all transfers must be greater than 800.

graph2.query_pgql("""
SELECT ARRAY_AGG(a.acc_id) AS acc_ids
, ARRAY_AGG(t.txn_id) AS txn_ids
, ARRAY_AGG(t.amount) AS amounts
FROM MATCH TOP 1 SHORTEST (a1) (-[t:transferred_to]->(a) WHERE t.amount > 800){5,} (a2)
WHERE a1.acc_id = 10 AND a2.acc_id = 50
AND COUNT(a) = COUNT(DISTINCT a)
""").print()
+---------------------------------------------------------------------------------+
| acc_ids | txn_ids | amounts |
+---------------------------------------------------------------------------------+
| [23, 5, 33, 11, 50] | [1031, 2302, 507, 3314, 1121] | [700, 600, 700, 900, 700] |
+---------------------------------------------------------------------------------+

If the start and end nodes of the path are the same node (a1), we can find the circulating paths.

graph2.query_pgql("""
SELECT ARRAY_AGG(a.acc_id) AS acc_ids
, ARRAY_AGG(t.txn_id) AS txn_ids
, ARRAY_AGG(t.amount) AS amounts
FROM MATCH TOP 1 SHORTEST (a1) (-[t:transferred_to]->(a) WHERE t.amount > 800){5,} (a1)
WHERE a1.acc_id = 10
AND COUNT(a) = COUNT(DISTINCT a)
""").print()
+-----------------------------------------------------------------------------------+
| acc_ids | txn_ids | amounts |
+-----------------------------------------------------------------------------------+
| [25, 20, 37, 77, 10] | [1028, 2528, 2005, 3712, 7714] | [900, 900, 900, 900, 900] |
+-----------------------------------------------------------------------------------+

Visualize paths

The built-in Graph Visualization tool can visualize graphs on the Graph Server. Let’s use this to visualize the paths found. Since graphs in Graph Server memory are associated with the user sessions, we will first check the session ID of this Python client session.

print(session)
PgxSession(id: bee563a9-7378-4ec2-a8a0-00dcdd69672e, name: OPGShell)

Use this session ID to log in to Graph Visualization.

The query in the previous section is slightly modified here. ONE ROW PER STEP is for retrieving all intermediate steps of the path. For visualization, all edges e are returned (implicitly, with the connected vertices together) and the target graph is specified with ON keyword.

SELECT e
FROM MATCH TOP 1 SHORTEST (a1) (-[t:transferred_to]->(a) WHERE t.amount > 800){5,} (a1)
ONE ROW PER STEP (src, e, dst)
ON bank_graph
WHERE a1.acc_id = 10
AND COUNT(a) = COUNT(DISTINCT a)

Run graph algorithms

Graph Server has many built-in graph algorithms, so let’s first try the well-known PageRank algorithm. PageRank calculates the importance of nodes in the graph, based on both the number of incoming edges, as well as the importance of their source nodes.

>>> analyst.pagerank(graph)
VertexProperty(name: pagerank, type: double, graph: GRAPH)

This algorithm stores the calculation result as the value of a new vertex property, pagerank, whose type is double.

Let’s also look at degree centrality. This score is simply the total number of edges that are connected. We will also calculate the total number of incoming and outgoing edges separately.

analyst.degree_centrality(graph)
analyst.in_degree_centrality(graph)
analyst.out_degree_centrality(graph)

The results can be easily retrieved with a PGQL query. The ability to operate queries and algorithms from the same interface is an advantage of Graph Server.

graph2.query_pgql("""
SELECT
a.acc_id
, a.pagerank
, a.degree
, a.in_degree
, a.out_degree
FROM MATCH (a)
ORDER BY a.pagerank DESC
LIMIT 10
""").print()
+-----------------------------------------------------------------+
| acc_id | pagerank | degree | in_degree | out_degree |
+-----------------------------------------------------------------+
| 41 | 0.019603664018674367 | 102 | 62 | 40 |
| 33 | 0.015104950170843338 | 108 | 58 | 50 |
| 5 | 0.014983680499273774 | 92 | 42 | 50 |
| 24 | 0.014966074159304933 | 89 | 49 | 40 |
| 36 | 0.014897989873184218 | 71 | 41 | 30 |
| 40 | 0.014860174576443065 | 72 | 32 | 40 |
| 23 | 0.014700205912993914 | 84 | 54 | 30 |
| 73 | 0.0141410094293568 | 77 | 47 | 30 |
| 30 | 0.013816617289907915 | 102 | 52 | 50 |
| 32 | 0.013339752300665857 | 42 | 32 | 10 |
+-----------------------------------------------------------------+

The new properties added by the algorithms can also be seen in Graph Visualization. For example, you can change the size of the vertices depending on the PageRank score.

If you look closely, you can see that some vertices are larger than others.

In this article, we loaded the graph defined on the database into the Graph Server, and then execute pathfinding queries and graph algorithms on the graph deployed in memory.

For Previous Versions

If you are using a version prior to Oracle Database 23c, please note the following points:

  • The SQL 2023 syntax explained in Part 1 is not supported in the previous Oracle Database versions. Instead, by using the Graph Client library, you can create graphs and execute PGQL queries.
  • The Graph Server usage explained in Part 2 is the same, regardless of the backend database. However, when loading a graph created using the Graph Client library, replace pg_sql with pg_view:
graph = session.read_graph_by_name("BANK_GRAPH", "pg_view")

For more details, please refer to the documentation.

Please learn more about Oracle Graph from:

--

--