Data mining is a topic that is mentioned often in the media and in boardrooms around the country, but its exact nature is not well understood by the general public. Data mining is related to statistics, another topic that can seem challenging to the average person. In college, students are commonly required to take at least one statistics course as a condition of graduation, no matter what discipline they are pursuing. Students often express nervousness about taking a statistics class. Many believe that statistics is a form of heavy mathematics, but they soon find that, other than simple arithmetic, it is the most useful and accessible form of mathematics for the average person.
Everyone can use statistics in their daily lives. A high school freshman and a doctoral candidate can both understand the batting average of their favorite baseball player, for instance. Every business uses statistics every day, and computers equipped with simple spreadsheet technology can produce reliable statistics at the touch of a button.
Statistics are just organized data. The true purpose of statistics is to keep track of more data than any person can keep in their head. Statistics require analysis, however. Statistics can present the same information in different ways to suggest very different conclusions about the data. Statistics are a useful tool, but only if you know how to use them.
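To make the point about presentation concrete, here is a minimal sketch in Python. The salary figures are invented for illustration and do not come from any real company. The same six numbers produce a very different "typical salary" depending on whether you report the mean or the median.

```python
from statistics import mean, median

# Hypothetical annual salaries at a small firm (illustrative numbers only).
# Five staff members and one highly paid owner.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 425_000]

# The mean is pulled upward by the single large salary.
average_salary = mean(salaries)

# The median is the middle of the sorted list and ignores the outlier's size.
median_salary = median(salaries)

print(f"Mean salary:   {average_salary:,.0f}")
print(f"Median salary: {median_salary:,.0f}")
```

Both figures are accurate summaries of the same data. A recruiter might quote the mean and a union representative the median, which is why the analysis matters as much as the numbers.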
If you want to use collected data to produce actionable conclusions, you need to look for relationships in the data. Smart managers look for patterns in a database that suggest underlying relationships. Statistical analysis can point towards the chances that an event will occur. Smart analysis can point to patterns that are significant, while ignoring patterns that are simply random. Smart analysis of data can lead to moneymaking opportunities.
Data mining is a form of statistical analysis, but it applies the same computing power that is used to collect massive amounts of data to the task of drawing conclusions about what might happen in the future. Companies call this process many things. A business data management firm like Corporate Technologies might offer services like marketing analytics, supply chain analysis, or strategic performance management, but all of them are forms of pattern discovery in statistical data.
Making Money With Nearest Neighbor Algorithms
The same technology that makes gathering massive amounts of statistical data possible also makes mining that data a profitable possibility. Some of the oldest techniques in data mining are nearest neighbor and clustering techniques. Clustering is intuitive: people are accustomed to looking for groupings in databases of all kinds, and in real life. When you do your laundry, you separate the white garments from the dark clothes before washing. For a business, looking for such clusters is a fluid, ongoing effort. Managers look for clusters in real time, because customers can change their relationship to a business over time. When marketers talk about a sales funnel, for instance, they refer to the same customers in different ways depending on how close they are to making a purchase. Existing customers are also ranked differently during the life cycle of their relationship with the company. A company that sells diapers will look at an existing customer who is 25 years old very differently from one who is 45 years old, for instance. Analysis of clustering has to be dynamic to be effective.
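As a rough illustration of clustering, here is a minimal sketch that groups customers by age using k-means. This article does not name a specific clustering algorithm, so k-means is an assumed choice here, and the ages, the starting centers, and the function name `kmeans_1d` are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny one-dimensional k-means.

    Repeatedly assigns each value to its nearest center, then moves each
    center to the mean of the values assigned to it.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the previous center if a cluster ends up empty.
        centers = [
            sum(group) / len(group) if group else centers[i]
            for i, group in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customer ages for a diaper retailer.
ages = [23, 25, 27, 29, 44, 45, 47, 48]

# Assumed starting guesses for two customer segments.
centers, clusters = kmeans_1d(ages, centers=[20, 50])

for center, group in zip(centers, clusters):
    print(f"segment centered near age {center:.0f}: {group}")
```

Because customers move between segments over time, a sketch like this would need to be re-run on fresh data rather than computed once.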
Another way to use the concept of nearness is to look at more than one data point at the same time to determine the importance of nearest neighbor relationships. If you were in the business of selling cars, you wouldn’t try to sell a Cadillac to someone searching for a tiny economy car. However, if you add the fact that the shopper is searching for cars at a certain price point, you could try to sell them either a brand new economy car or a used luxury car at the same price. More information improves data analysis if you know how to set up nearest neighbor algorithms effectively.
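The car example can be sketched as a small nearest neighbor search. This is an illustration under stated assumptions, not a production recommender: the inventory, the "luxury score" feature, the weights, and the function names are all invented for the example.

```python
import math

# Hypothetical inventory: (label, price in thousands of dollars, luxury score 0-10).
INVENTORY = [
    ("new economy car", 18.0, 1.0),
    ("used luxury car", 18.5, 8.0),
    ("new luxury car", 60.0, 9.0),
    ("used economy car", 8.0, 1.0),
]


def weighted_distance(a, b, weights):
    """Euclidean distance where each feature is scaled by an importance weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest_cars(shopper, weights, k=2):
    """Return the labels of the k cars closest to the shopper's profile."""
    ranked = sorted(
        INVENTORY,
        key=lambda car: weighted_distance(shopper, car[1:], weights),
    )
    return [label for label, *_ in ranked[:k]]


# The shopper searched for an economy car (luxury score 1) at about $18,000.
shopper = (18.0, 1.0)

# Looking only at the type of car suggests nothing but economy cars.
by_class_only = nearest_cars(shopper, weights=(0.0, 1.0))

# Adding price, and letting it dominate, brings the used luxury car into range.
by_price_mostly = nearest_cars(shopper, weights=(1.0, 0.1))

print(by_class_only)
print(by_price_mostly)
```

The weights are the part that requires judgment. They encode which kinds of nearness matter to the business, and a poor choice will surface poor matches.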
Nearest neighbor techniques are easily understood by people, but the relationships can be invisible to statistical programs unless the data is mined in the correct way. Statisticians often refer to information that is kept separately as being in separate silos, which is an apt description. A computer program that can find relationships between data points in separate silos is a simple form of database mining. If you wanted to predict the income of an unknown person using the statistics available to you, you might find out the person’s address and do a statistical analysis of the incomes in that neighborhood. Even though the data needed to make these connections and turn them into a prediction is held in different information silos, data management software can find the relationships between them.
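The income example amounts to joining two silos on a shared key and summarizing the result. The following is a minimal sketch. The two dictionaries stand in for separate databases, and every name, neighborhood, and income figure in it is hypothetical. Using the median as the estimate is also an assumption, chosen because it is less sensitive to a few very high incomes.

```python
from statistics import median

# Silo 1 (hypothetical): address records keyed by person.
ADDRESSES = {
    "person_a": "Elm Heights",
    "person_b": "Riverside",
}

# Silo 2 (hypothetical): known household incomes keyed by neighborhood.
NEIGHBORHOOD_INCOMES = {
    "Elm Heights": [52_000, 58_000, 61_000, 250_000],
    "Riverside": [31_000, 35_000, 40_000],
}


def estimate_income(person_id):
    """Estimate a person's income from the incomes of their neighbors.

    Returns None when either silo has no matching record.
    """
    neighborhood = ADDRESSES.get(person_id)
    incomes = NEIGHBORHOOD_INCOMES.get(neighborhood)
    if not incomes:
        return None
    return median(incomes)


print(estimate_income("person_a"))
print(estimate_income("person_b"))
print(estimate_income("someone_unknown"))
```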
Examples of Nearest Neighbor Analysis in Business
If you were managing an informational website that sold advertising, it would be in your best interest to keep the reader on your page for as long as possible. The longer a customer reads, the more likely it is that they will see the display advertising, become interested, and click through. Your advertisers will support your site over others because you deliver pre-screened customers to them.
To keep readers on your site, you will have to show them more text when they finish the article that brought them there. Most websites use some form of search engine to let users look for other information, but it’s better to show readers related documents without making them search. The problem is determining what exactly should be considered a related document in the absence of hard information about the preferences of any given reader. Data mining can improve predictions about which additional material to offer by identifying important characteristics of the document being viewed and finding other documents that share those characteristics. If your data mining process is inexact, you’ll show your readers articles that share unimportant characteristics, and they will leave and read elsewhere. That’s why companies need robust analytic procedures in place if they hope to compete in a crowded digital marketplace.
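One way to separate important shared characteristics from unimportant ones is to weight each shared word by how rare it is across the site. The sketch below uses an inverse document frequency weight for that purpose. That technique is an assumed choice rather than something this article prescribes, and the article IDs, texts, and function names are hypothetical.

```python
import math

# Hypothetical article texts keyed by article ID.
ARTICLES = {
    "knn-intro": "nearest neighbor algorithms for the retail analyst",
    "clustering": "clustering algorithms for the marketing analyst",
    "recipes": "easy recipes for the weeknight cook",
}


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def related(current_id, articles=ARTICLES):
    """Rank the other articles by rarity-weighted word overlap with current_id."""
    docs = {doc_id: tokenize(text) for doc_id, text in articles.items()}
    total = len(docs)

    def idf(word):
        # Words found in every article score log(1) = 0, so they are ignored.
        containing = sum(word in words for words in docs.values())
        return math.log(total / containing)

    scores = {
        other: sum(idf(word) for word in docs[current_id] & words)
        for other, words in docs.items()
        if other != current_id
    }
    return sorted(scores, key=scores.get, reverse=True)


print(related("knn-intro"))
```

Here the recipe article shares only "for" and "the" with the article being read. Both words appear everywhere, so they contribute nothing, and the clustering article ranks first.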