Friday, December 18, 2009

Play!, a promising Java web framework

In recent years we have seen many Java web frameworks arrive on the scene and gain popularity. Each of them brings features to ease web development or claims to address the problems with traditional Java web frameworks such as Struts or Spring. The common aim is to reduce complexity and increase productivity. Play is a framework that claims to be built for developers, providing the best tools to ease development and increase productivity. Here are the features that distinguish it from other frameworks.

1) Simple stateless MVC architecture
2) Based on REST principles
3) The framework compiles your sources and hot-reloads them into the JVM without the need to restart the server. So you can edit the code and see the changes immediately.
4) Aimed at developer productivity
5) TDD support with UI-driven testing using integrated Selenium.
6) Improved exception messages
7) HTTP to code mapping
8) Full stack application framework with support for common web application needs
9) Efficient templating engine
10) JPA with Hibernate support.
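
To give a feel for what "HTTP to code mapping" in a stateless design means, here is a minimal sketch in plain Java. It is not Play's router or API; the class name RouteTableSketch and its add and dispatch methods are made up for illustration. The idea is only that each HTTP method and path pair points directly at a piece of code, and nothing is remembered on the server between requests.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only: this is not Play's router or API.
public class RouteTableSketch {
    // Maps "METHOD path" keys to the action that handles them.
    private final Map<String, Supplier<String>> routes = new LinkedHashMap<>();

    // Register an action for an HTTP method and an exact path.
    public void add(String method, String path, Supplier<String> action) {
        routes.put(key(method, path), action);
    }

    // Look up the action for a request; unknown routes yield a 404-style message.
    public String dispatch(String method, String path) {
        Supplier<String> action = routes.get(key(method, path));
        return action == null ? "404 Not Found" : action.get();
    }

    // Normalizes the method name so "get" and "GET" hit the same route.
    private static String key(String method, String path) {
        return method.toUpperCase(Locale.ROOT) + " " + path;
    }

    public static void main(String[] args) {
        RouteTableSketch router = new RouteTableSketch();
        router.add("GET", "/", () -> "home page");
        router.add("GET", "/posts", () -> "list of posts");
        System.out.println(router.dispatch("GET", "/posts"));  // a registered route
        System.out.println(router.dispatch("POST", "/posts")); // no such route
    }
}
```

A real framework would also handle path parameters and pass request data to the action; the sketch leaves that out to keep the mapping itself visible.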

Another notable point is the good documentation and the developer support through the forum. I tried Play with the sample application given in the documentation and found it easy and developer friendly. Play comes with a built-in server to which you can deploy the application, or you can package the application as a WAR file and deploy it on another server. An interesting point is that it breaks some Java traditions in terms of packaging and MVC modeling. So it is time to play with your app.




Sunday, December 6, 2009

Deep and invisible web

The goal of a search engine is to index as much of the information on the web as possible. But could a search engine index the whole web, even given unlimited processing power? It could not. The part of the web that search engines can index is called the surface web. The part that is not in the surface web is called the deep web or invisible web. The deep web is estimated to be much larger than the surface web (more than 10 times larger, though the estimates vary). There are several reasons why search engines cannot index the whole web.
According to Wikipedia, deep web resources may be classified into one or more of the following categories:
  • Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
  • Unlinked content: pages which are not linked to by other pages, which may prevent web crawling programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).
  • Private Web: sites that require registration and login (password-protected resources).
  • Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP addresses or previous navigation sequence).
  • Limited access content: sites that limit access to their pages in a technical way (e.g., using CAPTCHAs or no-cache Pragma HTTP headers, which prohibit search engines from browsing them and creating cached copies).
  • Scripted content: pages that are only accessible through links produced by JavaScript, as well as content dynamically downloaded from web servers via Flash or Ajax solutions.
  • Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.
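
The "unlinked content" category above is easy to see in a toy example. The sketch below is only an illustration with a made-up link graph and a hypothetical class name (UnlinkedContentSketch): a crawler that follows links from a seed page never reaches a page that nothing links to, even if that page itself links to the rest of the site.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: a tiny in-memory "web" instead of real HTTP requests.
public class UnlinkedContentSketch {
    // Breadth-first crawl: returns every page reachable from the seed by following links.
    public static Set<String> crawl(Map<String, List<String>> links, String seed) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(seed);
        queue.add(seed);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            for (String target : links.getOrDefault(page, List.of())) {
                if (visited.add(target)) { // add() is false if the page was already seen
                    queue.add(target);
                }
            }
        }
        return visited;
    }

    // Pages that exist in the link graph but were never reached by the crawl.
    public static Set<String> invisible(Map<String, List<String>> links, String seed) {
        Set<String> all = new TreeSet<>(links.keySet());
        links.values().forEach(all::addAll);
        all.removeAll(crawl(links, seed));
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "/index", List.of("/about", "/news"),
            "/news", List.of("/index"),
            "/orphan", List.of("/index")); // links out, but has no inlinks
        System.out.println(invisible(links, "/index")); // prints [/orphan]
    }
}
```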
Search engines take various approaches to indexing the deep web. For example, Google's approach is to find HTML forms, send input to these forms, and index the resulting HTML pages. Yahoo made a small part of the deep web searchable by releasing Yahoo! Subscriptions, a search engine that searches a few subscription-only web sites; the user is asked to log in to access the content. Kosmix, by contrast, taps into HTML forms in real time through API calls for any given search query, evaluates the results, and organizes them into a topic page. Research continues on different approaches to tapping the deep web, but a large part of the web will certainly remain invisible to search engines.
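
The form-based approach (find a form, send input, index the result) can be sketched roughly as follows. This is not how Google or any other search engine actually implements it; the class FormSurfacingSketch, the method candidateUrls, the example.com address and the keyword list are all assumptions for illustration. The sketch only covers the "find a form and build the queries" step and does not fetch anything over the network.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of surfacing deep-web pages through an HTML form.
public class FormSurfacingSketch {
    // Simplistic patterns; a real crawler would use an HTML parser instead of regexes.
    private static final Pattern FORM_ACTION =
        Pattern.compile("<form[^>]*\\baction=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern NAMED_INPUT =
        Pattern.compile("<input[^>]*\\bname=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Builds one candidate URL per keyword, using the first form action
    // and the first named input found in the page.
    public static List<String> candidateUrls(String baseUrl, String html, List<String> keywords) {
        List<String> urls = new ArrayList<>();
        Matcher form = FORM_ACTION.matcher(html);
        Matcher input = NAMED_INPUT.matcher(html);
        if (!form.find() || !input.find()) {
            return urls; // no usable form on this page
        }
        for (String keyword : keywords) {
            urls.add(baseUrl + form.group(1) + "?" + input.group(1) + "="
                + URLEncoder.encode(keyword, StandardCharsets.UTF_8));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<form action=\"/search\" method=\"get\">"
            + "<input type=\"text\" name=\"q\"></form>";
        // Each printed URL is a page a crawler could then fetch and index.
        for (String url : candidateUrls("http://example.com", html, List.of("java", "deep web"))) {
            System.out.println(url);
        }
    }
}
```

The hard part in practice is choosing which inputs to send, since, as noted above, open text fields are difficult to fill in sensibly without domain knowledge.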