Further Reading
November 1, 2017

11.12 External Data: Reaching Beyond

A somewhat astonishing, almost genetic element of digital enterprises is their independence from physically limiting markets. 

The digital market makes it possible to reach many new clients over the Internet, virtually effortlessly and anywhere on the planet. But markets and client behaviour may differ significantly between regions or countries. Like any enterprise, digital ones need to know their clients and client segments intimately for effective communication and successful sales. Acquiring information from outside existing business processes becomes increasingly important: it allows analysis of the areas surrounding the core segments already served, with the aim, depending on business strategy, of pushing beyond the boundaries and limitations of existing markets.

Traditionally, and in the absence of alternatives, this was done by extracting relevant information from books, studies, chambers of commerce and the like, which offered only high-level perspectives and often poor business value.

The digital enterprise, though, is dependent on detailed information and hence much more external information and data need to be considered.

‘External’ means information or data that have not been produced by the business processes of the enterprise itself, but which are available from elsewhere, e.g., publicly, through partnering or commercially.

Typical external sources are:

  • Databases maintained by commercial registers, commercial address brokers and organisations such as the World Bank, Bloomberg and the like
  • Publicly available text sources such as magazines, journals or information services such as EBSCO
  • Contracted cloud services such as Adobe Marketing Cloud, Google Analytics and the like
  • Data stemming from social media such as Twitter, Facebook, LinkedIn and similar.

But which sources are useful, and how can the acquired information be turned into machine-readable and, in particular, business-relevant data? To get the most value out of data from external, partly unknown sources, follow a proven three-step approach to information intelligence:

  1. Source Intelligence: Successful data acquisition starts with data scouting, a new discipline that identifies information sources relevant to the markets and products of a digital enterprise. It tests the reliability of selected sources with regard to aspects such as legality, trustworthiness, correctness, completeness, stability and accessibility, both prior to first use and on an ongoing basis once they are included in sourcing activities. Data scouting also identifies data sources that are no longer needed and initiates their offboarding. This last activity underlines an essential aspect of using external data: only include sources that are really needed for your business processes – never source purpose-free data. Acquisition of non-operating data introduces new technical concepts, which should not be confused with those commonly used for processing internal data sources. The majority of external sources have not been created by the data owner for the purpose of importing them into other systems. Instead, information changes are driven by events and do not necessarily leave any trace of the change. Any data acquisition process or application making use of external information needs to deal with temporary or permanent unavailability of a source, and must be aware of the information's origin, trustworthiness, up-to-dateness and many other important characteristics – which often have to be stored at attribute level, or at least at record or object level. As many relevant sources are unstructured, or at best semi-structured, natural language processing (NLP) is used as a new data-processing discipline to extract structured data from them.

  2. Entity Intelligence: The most important step, however, is the transformation of gathered external data into meaning. In trying to do so, you will face many low-level challenges, for example language and semantic differences or all kinds of typos and syntactical traps. Beyond this, and in by far the majority of instances, connecting the dots is an intelligence business, which needs long-term observation of the desired entities at their individual level. Identifying keys that would allow the simple linkage of information from independent sources are rarely available. That means relating attributes from different sources is only possible with probabilistic approaches, i.e., an entity is an object with a computed probability, which needs to be stored as meta-information together with that particular object or parts of it. Over time, the sourcing mechanisms will deliver more information, adding to, verifying or falsifying existing information or aspects of an entity. So, only within an ongoing process can the trustworthiness and reliability of entities be increased to an acceptable level. This concept – that a derived entity might be correct, or even real, only with a certain likelihood – must be considered in any subsequent usage.

  3. Context Intelligence: The context of internal information can usually be derived from the systems that provide it. For external data, however, a context has to be specifically defined. Additionally, relevant entities demand contextualisation for each instance of use. Good practice suggests the definition of an ontology offering a semantic abstraction layer over the underlying data. That includes the opportunity to reduce entities and their describing attributes to those that are relevant for a selected business ecosystem. Entities can be used in multiple ecosystems with different perspectives, and it is necessary to maintain oversight of the provided ontologies, for example in a specialised data dictionary or ontology management system. Although it is often desired or striven for, there is no guarantee that external entities can be linked to internal ones, and even probabilistic approaches might not lead to success. Usually, this linkage is not important anyhow for prospecting client business. Data protection laws play an important role and the need-to-know principle strictly governs any authorisation to use data. This does not cover authorisation to access data on clients of the digital enterprise. Using external data – and in particular combining it with internal data – always raises legal and governance-related questions, which are sensitive and, unfortunately, specific to individual markets. It is good practice to involve legal functions in any data-driven initiative before investing large amounts.
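The three steps can be illustrated with a minimal Python sketch. It is not a reference implementation: the class and function names, the string-similarity matching, the 0.8 threshold and the noisy-OR style probability update are all assumptions chosen for illustration. It shows provenance kept at attribute level (step 1), an entity carrying a computed probability that grows as independent sources confirm it (step 2), and an ontology reducing an entity to the attributes relevant for one business ecosystem (step 3). For brevity the sketch only handles confirming evidence, not falsification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from difflib import SequenceMatcher


@dataclass
class SourcedValue:
    """One attribute value together with its provenance (step 1)."""
    value: str
    source: str             # hypothetical source label, e.g. "commercial_register"
    trust: float            # assumed reliability score of the source, 0.0 to 1.0
    retrieved_at: datetime  # needed to judge up-to-dateness later


@dataclass
class Entity:
    """A derived entity that is only real with a certain likelihood (step 2)."""
    attributes: dict[str, list[SourcedValue]] = field(default_factory=dict)
    probability: float = 0.0

    def add(self, name: str, observation: SourcedValue) -> None:
        self.attributes.setdefault(name, []).append(observation)


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; tolerates small typos."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_probability(entity: Entity, record: dict[str, str]) -> float:
    """Mean, over shared attributes, of the best similarity to any stored value."""
    scores = [
        max(similarity(value, obs.value) for obs in entity.attributes[name])
        for name, value in record.items()
        if name in entity.attributes
    ]
    return sum(scores) / len(scores) if scores else 0.0


def integrate(entity: Entity, record: dict[str, str], source: str,
              trust: float, threshold: float = 0.8) -> bool:
    """Attach a record to the entity if it matches well enough.

    Assumption: confirming evidence is combined noisy-OR style, so each
    additional matching source raises the entity's probability.
    """
    match = match_probability(entity, record)
    if match < threshold:
        return False  # no identifying key and too dissimilar: do not link
    now = datetime.now(timezone.utc)
    for name, value in record.items():
        entity.add(name, SourcedValue(value, source, trust, now))
    entity.probability = 1 - (1 - entity.probability) * (1 - trust * match)
    return True


def project(entity: Entity, ontology: set[str]) -> dict[str, str]:
    """Reduce an entity to the attributes one ecosystem's ontology names (step 3),
    keeping the value from the most trusted source for each attribute."""
    return {
        name: max(observations, key=lambda obs: obs.trust).value
        for name, observations in entity.attributes.items()
        if name in ontology
    }
```

In this sketch a record from an address broker spelling a company name slightly differently would still be linked and would raise the entity's probability, while every stored value remains traceable to its source and retrieval time. A production system would replace the naive similarity measure with a proper record-linkage method and would also lower probabilities when sources contradict each other.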

In conclusion, the effort to access external data is enormous, hence data-as-a-service (for example in the cloud) should be considered so as to benefit from sharing effort, cost and scarce expertise. However, the value proposition for growing business is often convincing and can easily justify the necessary investments. As pointed out, working with external data primarily demands new data management disciplines and increased data maturity on the part of the digital enterprise.

Actions and investment into often speciously demanded technical platforms are typically vendor driven and will not per se make the difference.
