Powerful graph search for traceability in manufacturing

Ryota Yamanaka
Oracle Developers
Published in
10 min readFeb 4, 2023

--

Photo by carlos aranda on Unsplash

In this article, I will explain the content below:

  1. Created mock datasets of BoM and supply chain network (download)
  2. Tried 9-hop traversal queries for trace-forward / trace-back search
  3. Observed 200 ms response time on a graph with 6 million edges

Traceability in manufacturing

Establishing traceability in supply chains is a challenge not only in manufacturing, where quality improvement is essential, but also in many other fields, such as food, pharmaceuticals, and clothing, due to increasing awareness of safety and environmental concerns. In this article, we will focus on the specific data structures, BoM and supply chain networks of industrial products, while the path-finding technique I will introduce here should be useful in many more fields.

To start, here are definitions of some basic terms in manufacturing. First, a BoM (Bill of Materials) holds information on parts to assemble a product. A BoM is usually in a tree structure. Using a BoM, many useful analyses are possible, such as calculating the total weight of a product based on the number and type of materials required for production. In recent years, many products have been equipped with software, and the BoM also has an important role in managing the licenses of the libraries included in these products.

Next, the term supply chain network is often used to refer to the supply network of raw materials and components needed for a product. In a broader context, however, the supply chain includes the process of processing raw materials and assembling parts within a company, as well as the process after products leave the factory — getting stored in warehouses, transported, placed on shelves at retailers, and picked up by consumers.

Traceability refers to tracking each part and product throughout every step of the manufacturing process, from the time raw materials come into the factory to shipping the final product. It’s important because it provides visibility into each step of manufacturing operations and can identify underlying causes of production issues. It allows manufacturers to prevent and fix issues proactively and improves quality and efficiency.

To achieve high-level traceability, it is necessary to enable processing that traces these series of processes, both from start to end and from end to start (= trace-forward and trace-back, respectively). The graph database technology we’ll discuss in this article is attracting attention for this purpose.

Data modeling with graphs

Now, let’s create a dataset for performing tests. Here we will consider a very simple BoM, a tree structure in which a part is always composed of two child parts. This structure is called a binary tree.

I tried to keep the BoM structure as simple as possible while I added lot numbers to this dataset. BoM and supply chain networks have often been featured as examples of graph database use cases. However, lot information is not taken into account in many cases, and such datasets without lot numbers are not practically useful for traceability searches.

The BoM looks like the figure below. This is how to read the diagram: Lot 001 of product M01 contains lot 011 of part P0001, which contains lot 070 of part P0011.

Also, the supply chain network is assumed to be exactly the same structure, which only considers the processes of transporting products from factories to distributors. The diagram shows that some of the products shipped as lot 001 from factory F01 are stored as lot 023 at warehouse I0001, then shipped again and stored as lot 025 at distributor I0011.

Finally, we integrate the two types of graphs above. Here we assume a condition with a one-to-one correspondence between the roots: production lot numbers and inventory lot numbers. That is, in the figure below: lot 001 (= production lot) of product M01 is shipped as lot 001 (= inventory lot) from factory F01. In the following diagram, we can see that lot 070 of part P0011 (at the top) is included in the products in lot 025 of warehouse I0011 (at the bottom).

Now, let’s increase the levels of both BOM and supply chain to 5. The small dataset has 10 product lots, while the following figure show only 2 product lots (lot 001 and 002 for product M01) and connected entities.

Queries to “trace” graphs

Once we have the graph data ready, let’s run some queries.

First, let’s find out where a specific lot of a specific part has ended up. This is called trace-forward. In this query, we are looking for lot 073 of part P1111. The path pattern (->(p)){1,9} indicates 1 to 9 hops. p1 is the starting point of the path, so we specify the condition for this node.

SELECT v1, e, v2
FROM MATCH ALL (p1) (->(p)){1,9} (p2) ONE ROW PER STEP ( v1, e, v2 ) ON TRACE_ALL
WHERE p1.part_id = 'P1111' AND p1.lot = '073'

The next step is to focus on a specific inventory lot in a distributor and reverse the process to determine which lot numbers are contained for all parts. This is called trace-back. The query below specifies lot 084 of distributor I1111. The path pattern is the same as before, but we put conditions on p2 in this case, which is the endpoint of the path.

SELECT v1, e, v2
FROM MATCH ALL (p1) (->(p)){1,9} (p2) ONE ROW PER STEP ( v1, e, v2 ) ON TRACE_ALL
WHERE p2.place_id = 'I1111' AND p2.lot = '084'

Finally, we can also try a pathfinding search specifying both a starting and ending point. This query can find out if lot 049 of part P1111 is included in the products in lot 084 in distributor I1111 and the paths between them.

SELECT v1, e, v2
FROM MATCH ALL (p1) (->(p)){1,9} (p2) ONE ROW PER STEP ( v1, e, v2 ) ON TRACE_ALL
WHERE p1.part_id = 'P1111' AND p1.lot = '049' AND p2.place_id = 'I1111' AND p2.lot = '084'

More complex cases

We confirmed that the expected results were retrieved from the simple graph data as above. Next, we will check if this data model can be used when there is more complexity in the graph.

For example, the following figure shows a condition where lot 066 of part P0021 is used for both lot 076 and 048 of product M01.

In this case, if an issue is found in lot 066 of part P0211, both lot 002 and 010 of product M01 contain the problematic part, and the scope of impact should become more extensive.

So, let’s try the same trace-forward query from lot 066 of part P0211. Then, as expected, we can find broader shipping destinations passing through the two different lot, 076 and 048, of product M01.

When this complexity is introduced, the trees are connected among the different product lots, resulting in a large graph, even with 10 product lots in the small dataset (= 610 total edges).

(Click to zoom)

Scalability against data size

To check the performance of the queries, we need to run them against sufficiently large size graphs so that we will create simple simulated data. The following is a detailed description of the dataset, so please skip reading it if you don’t need it.

If a product consists of two parts, and each part consists of two parts, and so on, repeating until 4 levels, we will have a tree with 30 (= 2 + 4 + 8 + 16) edges. Another tree with 30 edges with the same structure is created on the supply chain side, and the roots (= the assembled products and the factories) are connected one to one. When there are 100,000 product lots (= roots), the total number of edges will be 100,000 * (30 + 30 + 1) = 6,100,000.

The lot number of each part is a 4-digit number and is selected at random in the data generation. In consequence, the trees will be twisted each other to form a large graph. The same structure is used for the supply chain side. Since the duplicated lot numbers are selected multiple times, the total number of nodes is not uniquely determined, but this time it is 5,909,678.

This dataset is still not large enough, but you should be able to adjust the size and complexity of the graph in the same manner.

Here is the result of running a trace-forward query on this graph from a specific lot of a specific part (= lot 0190295 of part P1111). Even when the entire graph size increases, only the traceable scope from the specified part can be retrieved immediately.

If you only want a list of the inventories and lot numbers, it is more convenient to have the results in a table than graph visualization.

We can also measure the execution time of the query as follows, and it was 203 ms in this case. This response time includes the communication between the notebook and server, so the query execution time on the database should be shorter. This should be quick enough even for designing interactive analytics apps. (I confirmed the server execution time was about 100 ms when I tested with an independent Graph Server.)

How to convert tabular data into graphs

As described above, graph-based data management enables tracing queries, which are remarkably performant. On the other hand, in the actual systems for production and distribution, source data (= transactions and logs) are normally available in table form, which is suitable for data management and aggregation. Therefore, here we look at how to generate graphs from tabular data.

First, here is a definition for creating a graph named trace_bom that manages a table of parts from two source tables, trace_bom_node and trace_bom_edge.

CREATE PROPERTY GRAPH trace_bom
VERTEX TABLES (
trace_bom_node
KEY (id)
LABEL part
PROPERTIES (id, part_id, lot)
)
EDGE TABLES (
trace_bom_edge
KEY (id)
SOURCE KEY(child_id) REFERENCES trace_bom_node (id)
DESTINATION KEY(parent_id) REFERENCES trace_bom_node (id)
LABEL part_of
NO PROPERTIES
)

A graph called trace_scn, which manages the supply chain network, can be defined in the same way.

CREATE PROPERTY GRAPH trace_scn
VERTEX TABLES (
trace_scn_node
KEY (id)
LABEL place
PROPERTIES (id, place_id, lot)
)
EDGE TABLES (
trace_scn_edge
KEY (id)
SOURCE KEY(src_id) REFERENCES trace_scn_node (id)
DESTINATION KEY(dst_id) REFERENCES trace_scn_node (id)
LABEL supplied_to
NO PROPERTIES
)

Finally, here is the graph trace_all, which connects the two graphs above with the bridge edges between the root nodes of the tree (products and factories). As you can see, the ability to integrate multiple data sources into a single graph is another advantage of generating graphs from tables.

CREATE PROPERTY GRAPH trace_all
VERTEX TABLES (
trace_bom_node
KEY (id)
LABEL part
PROPERTIES (id, part_id, lot)
, trace_scn_node
KEY (id)
LABEL place
PROPERTIES (id, place_id, lot)
)
EDGE TABLES (
trace_bom_edge
KEY (id)
SOURCE KEY(child_id) REFERENCES trace_bom_node (id)
DESTINATION KEY(parent_id) REFERENCES trace_bom_node (id)
LABEL part_of
NO PROPERTIES
, trace_scn_edge
KEY (id)
SOURCE KEY(src_id) REFERENCES trace_scn_node (id)
DESTINATION KEY(dst_id) REFERENCES trace_scn_node (id)
LABEL supplied_to
NO PROPERTIES
, trace_b2s_edge
KEY (id)
SOURCE KEY(part_id) REFERENCES trace_bom_node (id)
DESTINATION KEY(place_id) REFERENCES trace_scn_node (id)
LABEL produced_at
NO PROPERTIES
)

How to try on your environment

Here is how to build a demo environment for those who would like to replicate the whole flow above, from graph creation to analysis using queries. If you have some experience with Oracle’s Autonomous Database, it is particularly easy to build the demo environment.

  1. Obtain a free Oracle Cloud account
  2. Launch an Autonomous Database
  3. Download the zip file for data, template, and notebook
  4. Load the data (.csv) into tables using Database Actions
  5. Import the template (.json) on Graph Studio
  6. Import the notebook (.dsnb) and run it

For using Graph Studio, I recommend launching a non-always-free Autonomous Database because always-free Autonomous Databases have resource limitations.

Summary

In this demo, we have stored a simulated BoM and supply chain network in a graph and shown that queries with path patterns (= traversing the graph from one point or searching for paths between two points) are fast and effective in achieving traceability.

The data used in this demo is intentionally simplified, and modeling real-world data would be much more complex. There may be unavailable data in the network, and some constraints and logic may have to be considered. Data collection and cleanup may also be a challenge.

On the other hand, because of the new table-to-graph mapping methods (= CREATE PROPERTY GRAPH) introduced in the past few years, it is now much easier to analyze and visualize transactional data as graphs. I hope this database technology is a breakthrough to address the need for traceability solutions in industry and society.

Please learn more about Oracle Graph from:

--

--