Data Integration and Knowledge Management in the Process Industry
Digital systems were first widely implemented in the process industries in the 1970s. Since then, the volume of process data has grown enormously, databases have proliferated, and there have been many attempts to integrate them using data warehouses, middleware and web services. Most of these attempts have had limited success, and the resulting solutions are difficult to maintain and scale.
Recently, data integration and knowledge management have become more complex, while at the same time technology has progressed tremendously. The added complexity comes from the increase in the quantity and variety of data. In addition to time series process data, transactional data (e.g. maintenance records, production plans, financial data), engineering data and unstructured data in Excel, Word and PowerPoint, we now also have video, audio, email, social media and more. The data also comes from more places: the plant, the rest of the enterprise, supply chain partners and sources outside the enterprise. To optimise the enterprise (the plant, the supply chain and all aspects of the business), it is necessary to organise and use all of this data. This is a huge challenge.
On the other hand, advances in four main areas have transformed our capability to collect and manage this data. Firstly, elastic cloud computing provides almost unlimited computing resources on demand. Secondly, we can now manage and organise big data sets using semantic web technologies. Thirdly, we can easily collect more data using the Industrial Internet of Things (IIoT). Finally, we can use artificial intelligence, including machine learning, to process all of this data. The combination of these technologies is a game changer for every organisation. Digital transformation is driven by applying these technologies to automate business processes and to re-imagine the organisation and its business models.
Many process companies initially focus their digital initiatives on improving plant data integration. This is a good first step, since the most value can often be generated by improving the operation of the plants, including their supply chains. Given the technology developments above, it is worth thinking carefully before investing in traditional client-server (on-premise) solutions for this purpose. Such an investment is likely to lead to more “technical debt”, as these solutions cannot take full advantage of the four technology areas mentioned above. Whatever approach is taken, it is important to have a clear vision of the ultimate goal. That goal includes integrating all data, not just plant data, and using semantic technologies and AI to put the data to work in automating business processes and supporting decision making.
Many process companies are now using the cloud to collect, organise and analyse plant data. Some are building solutions directly on the major cloud platforms such as AWS and Azure. Others are working with platforms such as Palantir and C3.ai. Still others are organising their knowledge using semantic technologies and enterprise knowledge graphs, with solutions such as PoolParty. A neat solution focused on organising engineering data is Viewport Operations. There are many options, and it is not yet clear which approach and which companies will win this technology race. However, these semantic and cloud-based approaches seem to offer more hope than the traditional on-premise data warehouse approaches we have tried in the past. What do you think?