Learning from Real-World Graphs

What can real-world graphs tell us? This is an area that Assistant Professor Bryan Hooi from the National University of Singapore is delving into as he researches methods for graph-based learning.

Graphs represent objects and the relationships between them, and they appear in many areas, from social and information networks like Facebook and Twitter, where they capture relationships and interactions, to biomolecular applications, where they represent molecules.

Prof Hooi’s interest in graph-based learning was sparked when, as an undergraduate, he worked on a thesis project to infer the contact network of a disease (i.e. who passed the disease to whom) using virus genetic sequence information obtained from patients who had caught the disease.

However, he soon found that methods for handling graph data were quite limited. “Most were focused on modelling graphs as mathematical objects,” he said. “There was much less work done on learning from graphs involving real-world data, for example, the detection time for each patient and the genetic sequence data obtained from him/her.”

A major obstacle to applying graph-based learning in more diverse applications is the problem of “label scarcity”: learning algorithms require large amounts of data with suitable annotations (or “labels”), but such labelled data is often hard to come by. For example, to train a model that predicts the toxicity level of a molecule, we need a large dataset of molecules along with their toxicity levels. However, these toxicity levels are costly to obtain because they have to be measured through laboratory experiments.

This poses a problem for graph-based learning. Prof Hooi noted that while learning methods are often successful when the data is well labelled, much more can be done to develop effective methods for graph-based learning in label-scarce settings.

“My current focus is on making graph learning methods perform better when labels are scarce or absent,” he said. “This will greatly increase their practical applications in the real world, for example in recommendation engines, and anomaly or fraud detection.”

For example, Prof Hooi is working with a ride-hailing app company on methods for recommending to a user the next place that he or she may be interested in visiting. In this case, graphs are used to represent the relationships between users or locations, to allow for more accurate recommendations.

“Let’s say we know the last few places a user visited, and want to recommend where they should go next. Rather than thinking about the locations separately, we found that it is more effective to model the locations using a graph. For example, two shopping malls in close proximity may be related, as users tend to visit them together,” he explained.

“We found that treating the locations as a graph allows for more accurate recommendations. This can be beneficial for the user in terms of convenience and for the merchants by recommending them to users who have a high chance of visiting them.”
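To make the idea of treating locations as a graph concrete, here is a minimal sketch in Python. It is not Prof Hooi's method, which the article does not describe in detail. It assumes a simple heuristic: two locations are linked whenever a user visits one right after the other, and candidates are scored by how strongly they are linked to the user's recent visits. The function names, the recency weighting, and the sample location names are all hypothetical.

```python
from collections import defaultdict


def build_covisit_graph(visit_histories):
    """Build a weighted, directed graph from visit sequences.

    graph[a][b] counts how often location b was visited right after a.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for history in visit_histories:
        for current, following in zip(history, history[1:]):
            if current != following:  # ignore repeated check-ins at one place
                graph[current][following] += 1
    return graph


def recommend_next(graph, recent_visits, top_k=3):
    """Rank unvisited locations by their links to the user's recent visits."""
    scores = defaultdict(float)
    # Assumed weighting: the most recent visit counts most (1, 1/2, 1/3, ...).
    for age, place in enumerate(reversed(recent_visits)):
        weight = 1.0 / (age + 1)
        for neighbour, count in graph.get(place, {}).items():
            scores[neighbour] += weight * count
    already_visited = set(recent_visits)
    candidates = [p for p in scores if p not in already_visited]
    # Sort by descending score, breaking ties alphabetically.
    candidates.sort(key=lambda p: (-scores[p], p))
    return candidates[:top_k]


# Hypothetical visit histories for three users.
histories = [
    ["mall_a", "mall_b", "cafe"],
    ["mall_a", "mall_b"],
    ["park", "mall_a", "cinema"],
]
graph = build_covisit_graph(histories)
suggestions = recommend_next(graph, ["park", "mall_a"])
print(suggestions)
```

In this toy data, the two malls are linked most strongly because two users visited them back to back, which mirrors the shopping-mall example in the quote. A learned model would replace the hand-written scoring rule, but the graph structure it works on would look similar.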

Another application of graph-based learning is detecting fake ratings on online review platforms such as Amazon. On such sites, reliable ratings are of key importance. For this setting, Prof Hooi has developed methods to detect sellers who falsely inflate the scores for a product by adding fake ratings or reviews.

In these use cases, an additional challenge is that the graph is not fixed but changes over time, so the graph-based learning method has to adapt accordingly. Other examples of such temporal graphs include graphs of user interactions, such as messages and posts in social networks, and of student activities, such as the viewing of videos during a massive open online course.

More generally, Prof Hooi has also found that making small, random changes to a graph can effectively increase the amount of training data. Such graph-based “data augmentation” can help considerably when labels are scarce, he said. Following up on this, he is researching how to design better graph-based data augmentation methods that preserve the important structures in a graph.
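As an illustration of what a small, random change to a graph might look like, the sketch below uses random edge dropping, one common form of graph augmentation. This is an assumption for illustration only: the article does not say which augmentations Prof Hooi uses, and the function names and the drop probability are hypothetical.

```python
import random


def drop_edges(edges, drop_prob=0.1, seed=None):
    """Return a copy of the edge list with each edge removed with probability drop_prob."""
    rng = random.Random(seed)
    return [edge for edge in edges if rng.random() >= drop_prob]


def augment(edges, num_copies=3, drop_prob=0.1, seed=0):
    """Create several slightly different versions of the same graph."""
    return [
        drop_edges(edges, drop_prob, seed=seed + i)
        for i in range(num_copies)
    ]


# A small example graph given as a list of (source, target) edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
views = augment(edges, num_copies=3, drop_prob=0.2)
for view in views:
    print(view)
```

Each perturbed copy keeps the label of the original graph, so a single labelled example yields several training examples. Dropping edges uniformly at random can also remove edges that matter, which is the motivation for designing augmentations that preserve a graph's important structures.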

Through these research efforts, Prof Hooi hopes graph-based learning methods can be applied more widely to deliver a practical impact across more areas in the real world.