Project Domain / Category
Information mining and retrieval
A search engine is an information retrieval system that helps users find information stored on one or more computer systems. The search results, commonly known as “hits”, are presented to the user as a list. Current search engines such as Google, Yahoo and MSN return millions of hits for a single query, and finding the relevant information among these millions of records is difficult and time-consuming. These search engines search for information based on the keywords given in the query.
A content type aware search engine should be able to search by content type, such as video, pdf, image, html etc. The search interface should provide a dropdown list of content types (i.e. video, pdf, image and html) and require the user to select one of them along with the query terms. If the user does not select a type from the dropdown list, the search engine should prompt the user to select the type of content they want to search. The search engine will start the search only when a particular type/format of content (video, doc, pdf, jpg, html and xml etc.) is selected. Students are required to maintain a local database to store the search results for future use. Students are also required to select/specify a particular dataset to test and evaluate their project.
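The rule described above (no search without a selected content type, and a suggestion when none is chosen) can be sketched as follows. This is only an illustrative sketch in Python, not a prescribed design: the names `Document`, `ContentTypeRequired` and `search` are hypothetical, the list of types is taken from the dropdown description, and matching is a simple case-insensitive substring test.

```python
from dataclasses import dataclass

# Content types offered in the dropdown list, as named in the description.
CONTENT_TYPES = ("video", "pdf", "image", "html")


@dataclass(frozen=True)
class Document:
    """Hypothetical record for one stored item."""
    url: str
    content_type: str
    text: str


class ContentTypeRequired(Exception):
    """Raised when a query arrives without a valid content type."""

    def __init__(self, suggestions):
        super().__init__("Please select a content type: " + ", ".join(suggestions))
        self.suggestions = suggestions


def search(documents, query, content_type=None):
    """Return documents of the chosen type whose text contains every query term."""
    if content_type not in CONTENT_TYPES:
        # No (or unknown) type selected: do not search, suggest the options.
        raise ContentTypeRequired(CONTENT_TYPES)
    terms = query.lower().split()
    if not terms:
        return []  # an empty query matches nothing
    return [
        doc for doc in documents
        if doc.content_type == content_type
        # Substring match: each term must occur somewhere in the text.
        and all(term in doc.text.lower() for term in terms)
    ]
```

A front end could catch `ContentTypeRequired` and show its `suggestions` next to the dropdown list instead of running the query.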
Main modules and their functions:
This project has the following basic modules:
1. Web Crawler: Web search engines work by storing information about many web pages, which they retrieve from the HTML of the pages themselves. The pages are retrieved by a web crawler, an automated web browser that follows every link on a site. The contents of each page are then analyzed to determine how it should be indexed.
2. Front end for query processing and results: The front end presents a search bar along with a dropdown list of content types. The query processor parses the request and executes the search, and the front end displays the results.
3. Database: i. Maintains a list or database for storing current search results. ii. Stores data about web pages in an index database for use in later queries. The purpose of the index is to allow information to be found as quickly as possible.
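The crawler and index modules listed above can be sketched together as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it is written in Python for brevity, the names `PageParser`, `crawl`, `build_index`, `lookup` and the `postings` table are hypothetical, and page fetching is passed in as a `fetch(url)` function so the sketch makes no network calls. Words are split on whitespace only, so punctuation and script text are not filtered out.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the href of every <a> tag and the words of every text node."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None on failure."""
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        pages[url] = parser.words
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages, db_path=":memory:"):
    """Store one (term, url) row per distinct word on each page."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings "
        "(term TEXT, url TEXT, PRIMARY KEY (term, url))"
    )
    rows = [(word, url) for url, words in pages.items() for word in set(words)]
    conn.executemany("INSERT OR IGNORE INTO postings VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, term):
    """Return the URLs of all pages containing `term`, sorted by URL."""
    cur = conn.execute(
        "SELECT url FROM postings WHERE term = ? ORDER BY url", (term.lower(),)
    )
    return [row[0] for row in cur]
```

Keying the table on `(term, url)` lets a query term be resolved with a single primary-key lookup rather than a scan of every stored page, which is the purpose of the index described in module 3. A real crawler would also need to record each page's content type so that results can be restricted by type.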
Tools: The following tools can be used for developing the above project.
1. Microsoft .NET, SQL Server
2. Java, SQL Server/MySQL