The Java Data Mining (JDM) specification defines a standard Java API and Web Services interface for data mining. The standard is at the proposed final draft stage under the Java Community Process (JCP). See http://www.jcp.org/jsr/detail/73.jsp to access the specification and related documents. Review comments should be submitted to jsr-73-comments@jcp.org. The expert group is targeting summer 2004 for completion of the specification.

Historically, application developers coded homegrown data mining algorithms into applications, or used sophisticated end-user GUIs, which packaged a suite of algorithms, complete with support for data transformation, model building, testing, and scoring.

However, embedding data mining end-to-end in applications using commercial data mining products was difficult. Vendors that provided an API did so with a proprietary solution, making the selection of a particular vendor's product risky and potentially costly should a different vendor's solution be required later. Access to data mining functionality through a standard API greatly reduces that risk and potential cost. A standard API allows companies to draw on the strengths of multiple data mining vendors, applying the most appropriate algorithm implementation to a given business problem without investing resources in learning each vendor's proprietary API. It also makes data mining more accessible to application developers and makes developer skills more transferable across vendor products.

Java Data Mining (JDM) addresses this need for the Java community. Because JDM is designed as an extensible framework that allows new algorithms and functionality to be added, vendors can differentiate themselves while providing developers with a familiar programming model. JDM specifies mining functions supporting classification, regression, clustering, association, and attribute importance, and mining algorithms supporting decision trees, neural networks, naïve Bayes, support vector machines, and k-means. Data mining operations, including build, test, apply, import, and export, are performed synchronously or asynchronously using mining tasks. JDM also provides interfaces for metadata query, management, and persistence.
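The idea of running the same mining task either synchronously or asynchronously can be illustrated with a minimal Java sketch. Every name here (MiningTaskSketch, MiningTask, Operation, executeSync, executeAsync, awaitResult, the model name "churnModel") is hypothetical and chosen only for illustration; this is not the JDM API, and a real task would reference data, settings, and models rather than return a string.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MiningTaskSketch {
    /** Hypothetical operation names, taken from the operations listed in the text. */
    enum Operation { BUILD, TEST, APPLY, IMPORT, EXPORT }

    /** Hypothetical task object; it only reports what it would have done. */
    static final class MiningTask implements Callable<String> {
        private final Operation operation;
        private final String modelName;

        MiningTask(Operation operation, String modelName) {
            this.operation = operation;
            this.modelName = modelName;
        }

        @Override
        public String call() {
            return operation + " finished for " + modelName;
        }
    }

    /** Synchronous execution: the caller blocks until the task returns. */
    static String executeSync(MiningTask task) {
        return task.call();
    }

    /** Asynchronous execution: the caller receives a handle immediately. */
    static Future<String> executeAsync(ExecutorService pool, MiningTask task) {
        return pool.submit(task);
    }

    /** Waits on the handle and converts checked exceptions to unchecked ones. */
    static String awaitResult(Future<String> handle) {
        try {
            return handle.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (java.util.concurrent.ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(executeSync(new MiningTask(Operation.BUILD, "churnModel")));
            Future<String> handle = executeAsync(pool, new MiningTask(Operation.APPLY, "churnModel"));
            System.out.println(awaitResult(handle));
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the sketch is that one task object serves both modes, so an application can choose blocking or non-blocking execution without describing the work twice.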

As a Java Specification Request under Sun's Java Community Process (JCP), JDM must go through several reviews and a final vote by the JCP Executive Committee before being accepted as a Java standard. In addition, the JCP requires that the API have a Reference Implementation (RI) and a Technology Compatibility Kit (TCK). The RI ensures that the specification is implementable while giving potential users and implementers a working system for understanding intended behaviors. Vendors implementing the standard must certify their implementation by executing and passing the TCK. Feedback from the JDM Community Review and Public Review has been positive and supportive.

As with any standard, defining compliance for vendor implementations raises myriad issues. Should all implementations be required to support all algorithms and features? Should the results of data mining operations, e.g., rules in a decision tree model, be the same for the same datasets across vendor implementations? For JDM, compliance is based on a core feature set with optional packages for each mining function and algorithm. This enables a vendor that specializes in neural networks to conform to the JDM standard while implementing only those features relevant to neural networks. JDM also provides supportsCapability methods that allow applications to determine at runtime whether a vendor implementation supports a finer-grained feature, e.g., whether a classification model build accepts a cost matrix specification, or whether the clustering algorithm produces hierarchically arranged clusters. JDM does not specify the correctness of specific data mining results, e.g., model accuracy or scoring precision, since there is too much variability between vendor implementations to make this practical.
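A runtime capability check of this kind might be used roughly as follows. This is a sketch under stated assumptions: only the method name supportsCapability comes from the text, while the Capability enum, the VendorEngine interface, and the helper methods are hypothetical stand-ins and do not reproduce the actual JDM signatures.

```java
import java.util.EnumSet;
import java.util.Set;

public class CapabilitySketch {
    /** Hypothetical fine-grained features, based on the two examples in the text. */
    enum Capability { COST_MATRIX, HIERARCHICAL_CLUSTERS }

    /** Hypothetical stand-in for a vendor implementation. */
    interface VendorEngine {
        boolean supportsCapability(Capability capability);
    }

    /** Builds a fake engine that supports exactly the given capabilities. */
    static VendorEngine engineWith(Set<Capability> supported) {
        return supported::contains;
    }

    /** Chooses a build configuration at runtime instead of assuming a feature exists. */
    static String planClassificationBuild(VendorEngine engine) {
        return engine.supportsCapability(Capability.COST_MATRIX)
                ? "build with cost matrix"
                : "build without cost matrix";
    }

    public static void main(String[] args) {
        VendorEngine full = engineWith(EnumSet.of(Capability.COST_MATRIX));
        VendorEngine minimal = engineWith(EnumSet.noneOf(Capability.class));
        System.out.println(planClassificationBuild(full));
        System.out.println(planClassificationBuild(minimal));
    }
}
```

Because the application asks before it configures the build, the same code can run against a vendor that offers only the core feature set and against one that implements the optional feature.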

The expert group is already thinking about JDM 2.0. Features under consideration include sequential patterns, time series, transformations, ensemble models, apply for association, mining of unstructured data such as text and images, model comparison, feature extraction, and multi-target models.

To facilitate public exchange of ideas on JDM, a new project on java.net has been created: "datamining" at https://datamining.dev.java.net/, providing a discussion forum, announcements, and document sharing among Java Data Mining users.
