The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example, news article extraction) and can also be easily extended for individual problem settings. Extraction is very fast (milliseconds), needs only the input document (no global or site-level information is required), and is usually quite accurate. Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
The PDF Renderer is just what the name implies: an open-source, all-Java library that renders PDF documents to the screen using Java2D. Typically this means drawing into a Swing panel, but it can also draw to other Graphics2D implementations. We hope you will come up with cool things to do with it that we never thought of.
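As an illustration of what "other Graphics2D implementations" can mean, the sketch below obtains a Graphics2D from an off-screen BufferedImage using only standard JDK classes. It does not use the PDF Renderer API: the paintPage method is a hypothetical stand-in for any routine that draws through Graphics2D, and the class name and sizes are likewise assumptions made for the example.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class OffscreenRenderSketch {

    // Hypothetical stand-in for a renderer's paint routine (NOT the PDF
    // Renderer API): it only needs a Graphics2D, so it does not care whether
    // the target is a Swing panel, an image, or a printer.
    static void paintPage(Graphics2D g, int width, int height) {
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);                           // blank "page"
        g.setColor(Color.BLACK);
        g.fillRect(width / 4, height / 4, width / 2, height / 2);  // placeholder content
    }

    // Draws into an off-screen image instead of a Swing component.
    static BufferedImage render(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            paintPage(g, width, height);
        } finally {
            g.dispose(); // release the graphics context when done
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = render(200, 100);
        System.out.println(image.getWidth() + "x" + image.getHeight());
    }
}
```

The same paintPage routine could be called from a Swing component's paintComponent method after casting its Graphics argument to Graphics2D, which is why a Java2D-based renderer is not tied to the screen.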
JUNG, the Java Universal Network/Graph Framework, is a software library that provides a common and extensible language for the modeling, analysis, and visualization of data that can be represented as a graph or network. It is written in Java, which allows JUNG-based applications to make use of the extensive built-in capabilities of the Java API, as well as those of other existing third-party Java libraries.