Web Mining - All About It
Posted On Sunday, April 19, 2009 at 10:04 PM by web research
The World Wide Web is a huge expanse of interlinked information designed for interactive access. Users find the information they want by moving from one object to another through hyperlinks and URLs. The World Wide Web has over 800 million web pages, and the number continues to grow daily.
Because information on the internet grows daily, extracting crucial data becomes a daunting task. This is primarily because semi-structured and unstructured web content is difficult to regulate. Unlike print documents, information on the internet is constantly evolving and changing, which makes database management very complex. This necessitates the use of web mining tools.
Web mining involves the use of data mining tools to discover and extract information from the internet. Web mining can be divided into four subtasks:
• Resource Finding – The first step is to retrieve data from online or offline sources, such as newsletters, website content or HTML documents.
• Information Selection/Pre-processing – After relevant information is retrieved from the internet, the original data is transformed. Pre-processing may mean removing stop words or stemming, or it may aim at obtaining a desired representation, for example finding phrases in the training corpus or representing the text in first-order logic form.
• Generalization – This process involves identifying general patterns and trends within individual websites as well as across multiple websites. It usually draws on a range of data mining techniques and web-oriented methodologies.
• Analysis – The mined data is laid out and validated, and the identified patterns are interpreted.
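To make the pre-processing step more concrete, here is a minimal Python sketch of stop-word removal followed by stemming. The stop-word list and the naive_stem function are illustrative assumptions, not part of any particular web mining tool; a real project would likely use a much longer list and a proper stemming algorithm.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "are", "a", "an", "of", "and", "to", "in", "on", "that"}


def naive_stem(word):
    """Strip a few common English suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The users are following links"))
```

The suffix stripping here is deliberately simple and will mangle some words, which is one reason established stemmers exist.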
When a user views information, there are three basic factors that can influence the perception and evaluation process. They are:
• Web page Content
• Web page Design
• Website Design and Structure
While web page content covers all the information and data available on the website, the other two factors have more to do with website accessibility and usability.
In the pursuit of relevant and useful information on the internet, there are three broad areas where web mining can be executed.
Web Content Mining
The process involves discovering useful information from web documents. The web holds extensive data resources such as text, images, audio clips, video streams and so on. Internet research that involves mining these various types of data is known as multimedia data mining. Web content is generally unstructured, in the form of free text, or semi-structured, like HTML documents and tables. Web content mining aims at improving information finding on the internet and providing efficient results.
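As a rough illustration of working with semi-structured content, the sketch below pulls the visible text out of an HTML document using Python's standard html.parser module. The TextExtractor class, the extract_text helper and the sample page are made up for this example; real content mining would go on to analyze the extracted text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page.
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Web Mining</h1><p>Useful <b>content</b> here.</p></body></html>"
)
print(extract_text(sample))
```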
Web Structure Mining
This process involves discovering data on the basis of the underlying link structure of the web. This form of mining is based on the topology of the hyperlinks, with or without a description of the links. It is very useful in categorizing web pages and identifying relationships, patterns and trends within a website.
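One very simple way to look at link topology is to count how many pages link to each page. The Python sketch below does this for a small, hypothetical link graph; the page names and the inbound_counts function are assumptions for illustration, and counting inbound links is only one of many possible structure-based measures.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "blog": ["home", "products"],
}


def inbound_counts(graph):
    """Count how many links point at each page in the graph."""
    counts = Counter({page: 0 for page in graph})
    for source, targets in graph.items():
        for target in targets:
            counts[target] += 1
    return counts


counts = inbound_counts(links)
# List pages from most to least linked-to, breaking ties alphabetically.
for page, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(page, n)
```

Pages with many inbound links tend to be hubs of the site, while a page with none can only be reached directly by its URL.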
Web Usage Mining
This process involves analysis of the data generated by user behavior and browsing history. Web content mining and web structure mining rely on primary data, but web usage mining relies on secondary data such as server logs, browser logs and user profiles.
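As a small example of mining server logs, the Python sketch below counts successful page requests in a handful of log lines. The log lines are invented sample data written in the widely used Common Log Format, and the LINE_RE pattern and page_hits function are assumptions for this illustration; actual server logs may use a different layout.

```python
import re
from collections import Counter

# Hypothetical access-log lines in Common Log Format.
LOG_LINES = [
    '10.0.0.1 - - [19/Apr/2009:22:04:01 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [19/Apr/2009:22:04:05 +0000] "GET /products.html HTTP/1.1" 200 2210',
    '10.0.0.1 - - [19/Apr/2009:22:04:09 +0000] "GET /products.html HTTP/1.1" 200 2210',
    '10.0.0.1 - - [19/Apr/2009:22:04:15 +0000] "GET /missing.html HTTP/1.1" 404 512',
]

# Groups: client, timestamp, method, path, status, size.
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$'
)


def page_hits(lines):
    """Count requests per path, keeping only responses with status 200."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(5) == "200":
            hits[match.group(4)] += 1
    return hits


hits = page_hits(LOG_LINES)
print(hits.most_common())
```

The same parsed fields could be grouped by client address instead of by path to study how individual visitors move through a site.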
Maneet Puri is a renowned entrepreneur and owner of LeXolution IT Services, a leading offshore outsourcing firm based in India. His firm offers a range of KPO services like web mining, data conversion, data processing and management services.
Article Source: Goarticles