SQL Server 2017: Graph Database


Starting SQL Server 2017 graph database capabilities are available to model many to many relationships. Graph relationships are integrated into T-SQL and receive all benefits of using SQL Server as the foundation DB management system.

Graphs are connected nodes(vertex) by edges (relationships). So any network related, routing, finding a relation, path, etc related real life applications use graphs. For example:

  • Connecting with friends on social media, where each user is a vertex, and when users connect they create an edge
  • Using GPS/Google Maps/Yahoo Maps, to find a route based on the shortest route
  • Google, to search for web pages, where pages on the internet are linked to each other by hyperlinks; each page is a vertex and the link between two pages is an edge
  • On eCommerce websites, relationship graphs are used to show recommendations
  • Make My Trip and some other flight booking portals uses Graph Theory to find out shortest and cheapest path

In earlier versions of SQL Server to model many-to-many relationships, a common approach is to introduce a table that holds such relationships. 

For example, Student and Course in a school share a many-to-many relationship; a Student takes multiple Courses and a Course is taken by multiple Students. To represent this kind of relationship one can create an “Attends” table to hold information about all the Courses a Student is taking. The “Attends” table can then store some extra information like the dates when a given Student took this Course, etc.



Over time applications tend to evolve and get more complex. For example, a Student can start “Volunteering” in a Course or start mentoring “Mentoring” others. This will add new types of relationships to the database. With this type of approach, it is not always easy to modify existing tables to accommodate evolving relationships. To analyze data connected by means of foreign keys or multiple junction tables involves writing complex queries with joins across multiple tables, and this is no trivial task. The queries can quickly get complex, resulting in complex execution plans and degraded query performance over time.

In current times we live in an era of big data and connected information; people, machines, devices, businesses across the continents are connected to each other more than ever before. Analyzing connected information is becoming critical for businesses to achieve operational agility. Users are finding it easier to model data and complex relationships with the help of graph databases. So it is no surprise that SQL Server has added graph extension to give customers best of both worlds in a single product, along with the benefit of other highly advanced technologies such as column-store indexes, high availability and advanced analytics using SQL Server R services.

There is nothing new a graph database can achieve, which cannot be achieved using a relational database. However, a graph database can make it easier to express certain kind of queries. Also, with specific optimizations, certain queries may perform better. Several factors which may influence the use of one over other can be as below:
  • When an application is having hierarchical data. Then we can use HierarchyID datatype to implement hierarchies, but it has some limitations. For example, it does not allow you to store multiple parents for a node.
  • When an application has complex many-to-many relationships; as the application evolves, new relationships are added.
  • You need to analyze interconnected data and relationships.
In my future post I will discuss this in more details.

Comments