Fujitsu Laboratories announced the development of machine learning technology that enables highly accurate analysis of graph-structured data that expresses the relationships between people and things.
Fujitsu Laboratories has now developed technology that allows existing deep learning technology, which has already achieved extremely high accuracy in image and voice recognition, to be applied to graph-structured data. Graph-structured data is structurally complicated and mixes data of differing sizes and forms of expression. By transforming these differing data into a uniform representation called a “tensor”, a concept used in cutting-edge mathematics, the technology makes highly accurate machine learning on graph-structured data possible with deep learning.
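To make the idea of a uniform expression concrete, the following is a minimal sketch in Python, not Fujitsu Laboratories' actual method. It assumes a graph is given as a list of labeled edges and encodes it as a fixed-size three-way array (node × node × edge type), padding smaller graphs with zeros. The function names `graph_to_tensor` and `shape`, the edge-type labels, and the sample graphs are all hypothetical. The sketch numbers nodes in order of first appearance, so it does not address the fact that the same graph can be numbered in many ways, which a real method would have to handle.

```python
def graph_to_tensor(edges, edge_types, max_nodes):
    """Encode an undirected labeled graph as a max_nodes x max_nodes x
    len(edge_types) nested list (a simple 3-way tensor).

    tensor[i][j][k] == 1 means nodes i and j are joined by an edge of
    type edge_types[k]. Unused rows and columns stay zero (padding).
    """
    # Assumption: nodes are numbered in order of first appearance.
    index = {}
    for u, v, _ in edges:
        for node in (u, v):
            if node not in index:
                index[node] = len(index)
    if len(index) > max_nodes:
        raise ValueError("graph has more nodes than max_nodes")

    type_index = {t: k for k, t in enumerate(edge_types)}
    tensor = [[[0] * len(edge_types) for _ in range(max_nodes)]
              for _ in range(max_nodes)]
    for u, v, t in edges:
        i, j, k = index[u], index[v], type_index[t]
        tensor[i][j][k] = 1
        tensor[j][i][k] = 1  # undirected: mirror the entry
    return tensor


def shape(tensor):
    """Return the dimensions of the nested-list tensor."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


# Two hypothetical graphs of different sizes.
EDGE_TYPES = ["single", "double"]
small = [("a", "b", "double")]
large = [("a", "b", "single"), ("b", "c", "single"), ("c", "d", "double")]

small_t = graph_to_tensor(small, EDGE_TYPES, max_nodes=4)
large_t = graph_to_tensor(large, EDGE_TYPES, max_nodes=4)
print(shape(small_t), shape(large_t))
```

Both graphs end up with the same dimensions despite having different numbers of nodes and edges, which is the property a deep learning model needs from its input.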
This technology was used to learn the structures and activities of chemical compounds, based on data from the PubChem BioAssay open database of chemical compounds. It learned the relationships between the structures of several hundred thousand chemical compounds, about 100 times as many as previous technology could handle, and their individual activities. In addition, by extracting features that existing technology could not capture, it achieved about 80% accuracy in predicting activity, a 10% increase compared to existing technology.
This technology will be used as part of Human Centric AI Zinrai, Fujitsu Limited’s AI technology.
In recent years, drug discovery and a variety of other fields have made use of databases, such as those for finance and chemical substances. These databases handle data such as IoT logs of communication between things and account transactions, and they continue to generate enormous amounts of data that can be expressed as a graph structure showing the relationships between people and things (Fig.1). Fujitsu Laboratories had previously developed technology, known as “LOD”, to retrieve and analyze graph-structured data. Accurately categorizing and analyzing this graph-structured data is expected to create new value and open up new business areas.
Previously, graph-structured data was categorized according to whether it contained partial graphs that people had identified in advance as worth focusing on. When categorizing large volumes of graph-structured data, however, many features were not expressed by the partial graphs chosen beforehand, which limited how accurate the categorization could be.
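The conventional approach described above can be illustrated with a minimal sketch in Python. It is a simplified stand-in, not any specific prior system: partial graphs are assumed to be simple paths of node labels, and each graph is turned into a vector recording which of the predefined paths it contains. The function names `has_label_path` and `partial_graph_features`, the patterns, and the toy molecules are all hypothetical.

```python
def has_label_path(labels, edges, pattern):
    """Return True if the undirected graph contains a simple path whose
    node labels match `pattern` in order."""
    adjacent = {node: set() for node in labels}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)

    def extend(node, pos, visited):
        # The current node must carry the label expected at this position.
        if labels[node] != pattern[pos]:
            return False
        if pos == len(pattern) - 1:
            return True
        seen = visited | {node}
        return any(extend(nxt, pos + 1, seen)
                   for nxt in adjacent[node] if nxt not in seen)

    return any(extend(node, 0, frozenset()) for node in labels)


def partial_graph_features(labels, edges, patterns):
    """One 0/1 entry per predefined pattern: 1 if the graph contains it."""
    return [int(has_label_path(labels, edges, p)) for p in patterns]


# Hypothetical patterns chosen in advance by a person.
PATTERNS = [("C", "O", "H"), ("C", "C"), ("O", "O")]

# Toy molecules (most hydrogens omitted for brevity).
methanol = ({"c": "C", "o": "O", "h": "H"}, [("c", "o"), ("o", "h")])
ammonia = ({"n": "N", "h1": "H", "h2": "H", "h3": "H"},
           [("n", "h1"), ("n", "h2"), ("n", "h3")])

methanol_features = partial_graph_features(*methanol, PATTERNS)
ammonia_features = partial_graph_features(*ammonia, PATTERNS)
print(methanol_features)  # [1, 0, 0]
print(ammonia_features)   # [0, 0, 0]
```

The second result shows the limitation: a graph whose distinguishing structure is not among the predefined patterns is reduced to an all-zero vector, so nothing about it is available to the classifier.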
Deep learning technology can automatically extract characteristic features from data and has attracted attention in areas such as image and voice recognition. However, because graph-structured data has a complicated structure and mixes a variety of data sizes and expressions, it was difficult to apply deep learning technology to it.