A Little Bit of Machine Intelligence
One interesting trait of technologists is that, as a group, we tend to rely on the approaches we have learned, trust, and are "comfortable with". We can become protective of our chosen techniques, approaches, and tools, honing them as we progress through our careers. When a new product or approach comes on the scene, some of us welcome the challenge as an opportunity, while others keep it at arm's length because it diverges from our view of "how things should be done".
The technical landscape is changing at an ever-increasing rate. What is comfortable today will be old news and obsolete tomorrow. Only 20% of the Fortune 500 of 25 years ago have survived, and technology has become obsolete at an even faster pace. Yet it is not only technology that has evolved, but also our need to process new types of data and to solve much more challenging problems.
Sometimes we are faced with problems that are not easily solved with conventional logic. Sometimes we don't even know what the desired outcome should be. Take, for example, a telco wanting to detect virus propagation across thousands of network switches. Data is being harvested from the switches, but there are many variables. Which variables, or combinations of them, indicate that a virus is spreading? There is also a time element to consider. Hand-coding logic to solve this problem is a pointless task, and an expert system would do no better: we don't know what to look for, and tomorrow there may be a different virus with a different propagation profile.
While daunting, there are approaches that may help. Machine learning offers a glimmer of hope. This is where a simple approach like decision trees can provide welcome relief from problem complexity. A decision tree can be fed a training set of input data (attributes). For this to work, we need to have collected data from an attack that took place in the past. The training part is the labeling: as the attack unfolds, we tell the tree whether each sequence of attribute values is related to the attack. The tree then builds the underlying logic to detect a future event.
A question we naturally might ask is whether a future event will unfold in exactly the same way. The nice thing about a decision tree is that it accommodates variance in its decision making. A future event does not have to align 100% with the signature of a past event, because the tree can attach a level of certainty to the current input. Additionally, as new signatures are found, they can be added to our training set to improve our decisions.
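The ideas above can be sketched with a toy, entropy-based decision tree in plain Python. All attribute names, values, and readings below are invented for illustration; a real deployment would train on actual switch telemetry and a far larger labeled history.

```python
from collections import Counter
import math

# Hypothetical switch telemetry: (attributes, label), where the label is
# True if the reading was recorded while a known attack was unfolding.
DATA = [
    ({"cpu": "high", "broadcast": "spike", "ports": "scanning"}, True),
    ({"cpu": "high", "broadcast": "spike", "ports": "normal"},   True),
    ({"cpu": "high", "broadcast": "flat",  "ports": "scanning"}, True),
    ({"cpu": "low",  "broadcast": "flat",  "ports": "normal"},   False),
    ({"cpu": "low",  "broadcast": "spike", "ports": "normal"},   False),
    ({"cpu": "high", "broadcast": "flat",  "ports": "normal"},   False),
]

def entropy(labels):
    """Shannon entropy of a list of boolean labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs):
    """Pick the attribute whose split gives the lowest weighted entropy."""
    def split_entropy(attr):
        groups = {}
        for features, label in rows:
            groups.setdefault(features[attr], []).append(label)
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return min(attrs, key=split_entropy)

def build_tree(rows, attrs):
    labels = [label for _, label in rows]
    # Leaf: store the fraction of attack examples as a certainty estimate.
    if len(set(labels)) == 1 or not attrs:
        return sum(labels) / len(labels)
    attr = best_attribute(rows, attrs)
    branches = {}
    for features, label in rows:
        branches.setdefault(features[attr], []).append((features, label))
    rest = [a for a in attrs if a != attr]
    return (attr,
            {v: build_tree(sub, rest) for v, sub in branches.items()},
            sum(labels) / len(labels))

def predict(tree, features):
    """Return P(attack) for a new reading. An unseen attribute value falls
    back to the parent node's class proportion, so a new event does not
    have to match a past signature exactly."""
    while isinstance(tree, tuple):
        attr, branches, fallback = tree
        tree = branches.get(features[attr], fallback)
    return tree

tree = build_tree(DATA, ["cpu", "broadcast", "ports"])
print(predict(tree, {"cpu": "high", "broadcast": "spike", "ports": "scanning"}))
print(predict(tree, {"cpu": "low", "broadcast": "flat", "ports": "normal"}))
```

Because each leaf stores the proportion of attack examples that reached it, the tree reports a degree of certainty rather than a hard yes/no, and retraining with newly discovered signatures is just a matter of appending to `DATA` and rebuilding.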
Decision trees can be implemented using Ross Quinlan's C4.5 or Spark's machine learning library (MLlib), to name but two of many options; both are open source projects. With this simple approach, we can begin to tackle problems that defy our ability to solve them with conventional logic. And this just scratches the surface of what can be accomplished if we step back and take a fresh look at the capabilities now available to us. Equally amazing is that, thanks to the efforts of many in the research and open source communities, these capabilities come at no cost.
About the Author: Stan Mlynarczyk, Ph.D.
Stan is a guest contributor to the Pandata Group blog and a part of our associate network. He delivers an impressive balance of Big Data education and technical acumen, including a Ph.D. with a focus in Artificial Intelligence (semantic analysis of text), along with creating and leading the Big Data architecture and Hadoop implementation practice at Teradata Aster. Pandata Group is also partnered with his software company, Chicago Technology Incorporated (www.chitechcorp.com), which delivers software for Text Analytics. Looking into the business value of Hadoop and Big Data? Let's schedule a discussion by contacting us at firstname.lastname@example.org.