You spent weeks refining a marketing campaign… Taking the right customer decisions is a key part of your job and you want to get better at it…
You spent months worrying about how to up-skill and remain relevant… You are fascinated by Artificial Intelligence and Machine Learning… but not sure where to start…
You started learning Python… but got stuck at model precision and lift…
You are a marketer and want to get better mileage out of your BI and analytics teams…
You are already a data scientist and want to pick up all the business-speak again…
Welcome to the Musings on Data Science blog.
As the King asked of Alice, we begin at the beginning, and tour the wonderful world of data science.
Enjoy!
Season 1: The Business World, Knowledge, and Models – A whistle-stop tour of concepts
E1 – The beginning:
Humans have a passion for control; they understand and control the world far beyond other animals. Humans also create representations of the world, and they possess an ability to detect patterns. Patterns can be simple or complex. They can be repetitive, like sunrise and sunset. Sometimes the patterns change, like days getting longer in the summer and shorter in the winter. There are patterns to human behaviour too, and these can help predict how humans will behave in certain situations. There are patterns in every aspect of the world that humans can perceive. The structures that represent these worldly patterns are called models. Models symbolise the world, which in our context is the business world. Understanding these patterns is intertwined with our ability to make decisions. Analytics, or data science, is the latest in a long line of tools in human history for detecting usable patterns and improving human control of the world.
Business is an enormous knowledge structure, and an extraordinarily complex one. It is a world of customers, development, marketing, profits, complaints, and many more events and relationships. Most of this we normally take for granted—though for data science, we need to define it explicitly and carefully. It thus calls for a unique perspective to look at the everyday minute events, objects, perception, and structures—the foundations of data science.
Events – Something happens, and a lot follows. It could be a rock falling on the moon, or a billiard ball dropping into a pocket. An event changes things, and the change is always relative to other things. This is especially important in data science, because events carry a whole lot of assumptions. Most events are recorded as transactions: the sale of a policy, goods exchanged for money, the report of a claim, the retirement of a participant, and so on. This seems straightforward, but these transactions sit within an enormous cultural framework. Awareness of this cultural framework is important and powerful for the data scientist.
Objects – An object is something to which an event happens. An insurance policy gets sold. A claim gets settled. The rock falls. The penny drops… As with events, there are complications with objects too. For example, we know that rocks are hard, so hardness is a property that should clearly help us define a rock. However, the true nature of the world is not so straightforward. Some rocks are so soft that they flow like fluids: lava, for example. We can, of course, change our definition to ‘something that sinks in water’. But then pumice is a rock, and it does not sink. Philosophical discussions aside, for data science the features of objects can form a sizeable matrix. In one business modelling exercise at a bank, it took over two weeks of demanding work just to agree on the definitions of ‘term deposit’ and ‘customer’, concepts that otherwise seem so intuitive.
Perception – Like it or not, each one of us has already created a mental model of reality. Our perception is regulated and enforced by our mental models. To recall a now well-known incident: in April 1986, at the Chernobyl power plant, not far from the busy city of Kiev, scientists were running some tests. As we now know, things began to go disastrously wrong. The scientists ‘knew’ that the plant was completely safe; they ‘knew’ that the glowing piece on the floor could not be graphite; they ‘knew’ that the core of the plant could not have blown up. The reactor did melt, and the core did blow up, yet the engineers ‘knew’ that this was impossible… It was their knowledge of what was true that prevented them from seeing what was happening. Such is the power of the regulated and enforced knowledge that we already hold. This is particularly true in business. The advisor ‘knows’ that the plan will be sold, the claim handler ‘knows’ that the claim is genuine and bona fide, the lender ‘knows’ that the loan will be paid back, and the dentist ‘knows’ that it will be a complex tooth extraction. Perception is how we view the world. Perceptions are built on events that happen to features of objects, and they gain meaning when seen in a framework.
Data – Data starts simply with the recording of events that happen to objects. However, not all the events of the universe can be recorded, so we become selective. This is a necessary process, and more is missed than is kept. What is recorded is generally what is seen as useful, in the sense that it improves our business framework. None of this is to say that the process is bad or wrong. But as a data scientist, one must keep in mind that the data collected will always be intertwined with the framework used to collect it, such as a customer relationship management (CRM) system, a claim settlement system, or a transaction monitoring system.
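For readers who are learning Python, here is one way the ideas so far might look in code. It is only a minimal sketch: the `Event` class, the `feature_matrix` function, the policy identifiers, and the amounts are all invented for illustration and do not come from any real system. Events that happen to objects are recorded as transactions, and a chosen framework then summarises them into one row of features per object.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """One recorded event: something happened to an object."""
    object_id: str  # the object the event happened to, e.g. a policy
    kind: str       # what happened, e.g. "sold" or "claim_reported"
    on: date        # when it happened
    amount: float   # monetary value attached to the event


# Hypothetical transactions, made up purely for illustration.
events = [
    Event("policy-1", "sold", date(2023, 1, 5), 1200.0),
    Event("policy-1", "claim_reported", date(2023, 6, 2), 300.0),
    Event("policy-2", "sold", date(2023, 2, 9), 800.0),
    Event("policy-2", "address_changed", date(2023, 3, 1), 0.0),
]


def feature_matrix(events):
    """Summarise events into one row of features per object."""
    rows = {}
    for e in events:
        row = rows.setdefault(
            e.object_id, {"premium": 0.0, "claims": 0, "claimed": 0.0}
        )
        if e.kind == "sold":
            row["premium"] += e.amount
        elif e.kind == "claim_reported":
            row["claims"] += 1
            row["claimed"] += e.amount
        # Any other kind of event is dropped here: the framework,
        # not the world, decides what is kept.
    return rows


matrix = feature_matrix(events)
for object_id, features in matrix.items():
    print(object_id, features)
```

Notice that the `address_changed` event never reaches the matrix. Nothing is wrong with that choice, but anyone modelling from the matrix alone would not know the event ever happened.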
Structures – Remember, data always starts as a simple recording of events, objects, and features. On this foundation, vast structures are built. Culture is one such structure. All our art, ethics, scientific pursuits, politics, medicine, and trade are based on such frameworks. These larger cultural issues are important for data scientists, even though they are unlikely to create or re-engineer any culture. Working within a corporate culture such as ours means understanding what can and cannot be achieved, what is defined as success, and where the best opportunity for changing things lies. These issues matter for data science because, to some extent, small or great, data science success depends absolutely on changing an existing structure.
Systems – Systems only represent our abstraction of how the world works. The American automobile industry in the 1960s used a manufacturing efficiency framework, focused on making cars as efficiently as possible. Corporate accounting reflected this framework and measured efficiency in terms of production. Corporate bonuses all reflected the activity in production plants. It worked. For that framework. Until the Japanese came with a totally different way of seeing things. The Americans did not take notice and could not see any problem; their worldview prevented them from seeing what the Japanese saw. After blindness came denial, then rejection, and finally adjustment to market reality.
Inspiration and Acknowledgement: Some early practitioners of analytics and data science who influenced the author:
1. Mr. Dorian Pyle, of Knowledge Stream Partners, Xchange, Naviant, Thinking Machines, Data Miners, and various other companies. Dorian is the author of Data Preparation for Data Mining, Business Modeling and Data Mining, and the Handbook of Data Mining.
2. Dr Thomas Davenport of Competing on Analytics fame.
3. Anand Rajaraman of Cambrian Ventures and Kosmix
4. Jeffrey Ullman of Stanford