Mloss is a community effort at producing reproducible research via open source software, open access to data and results, and open standards for interchange.
Mahout currently has: Collaborative Filtering; user- and item-based recommenders; K-Means and Fuzzy K-Means clustering; Mean Shift clustering; Dirichlet process clustering; Latent Dirichlet Allocation; singular value decomposition; Parallel Frequent Pattern mining; a Complementary Naive Bayes classifier; a random forest decision tree based classifier; high-performance Java collections (previously Colt collections); a vibrant community; and much more to come by this summer thanks to Google Summer of Code.
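To make one item in that list concrete, here is a minimal single-machine sketch of K-Means clustering (Lloyd's algorithm) in Python. It is not Mahout's implementation or API: the function name `kmeans` and its parameters `iterations` and `seed` are hypothetical, chosen only for illustration. It only shows the assign-then-update loop that the algorithm is built on.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Illustrative Lloyd's algorithm on a list of equal-length numeric tuples.

    Hypothetical helper for this sketch; not part of Mahout.
    Returns (centroids, assignment), where assignment[i] is the index of the
    centroid closest to points[i].
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, assignment


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

The sketch keeps everything in memory and uses plain lists, so it is only suitable for small inputs. A distributed system would split the assignment and update steps across workers.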
The Prediction API enables access to Google's machine learning algorithms to analyze your historic data and predict likely future outcomes. Upload your data to Google Storage for Developers, then use the Prediction API to make real-time decisions in your applications. The Prediction API implements supervised learning algorithms as a RESTful web service to let you leverage patterns in your data, providing more relevant information to your users. Run your predictions on Google's infrastructure and scale effortlessly as your data grows in size and complexity.
The Knowledge Discovery and Machine Learning (KDML) group focuses on the neighboring subfields of computer science known as knowledge discovery in databases (KDD, sometimes referred to simply as data mining) and machine learning (ML). For us, these fields include, on the one hand, the automated analysis of large data sets using intelligent algorithms that can extract hidden knowledge from the collected data and produce models for prediction and decision making. On the other hand, they also include algorithms and systems that can learn from experience and adapt to their environment or their users.
This year's discovery challenge presents two tasks in the new area of social bookmarking. One task covers spam detection and the other covers tag recommendations. As we host the social bookmark and publication sharing system BibSonomy, we are able to provide a BibSonomy dataset for the challenge. A training dataset for both tasks is provided at the beginning of the competition. The test dataset will be released 48 hours before the final deadline. Due to a very tight schedule, we cannot grant any deadline extensions. The results will be presented at the ECML/PKDD workshop, where the top teams are invited to present their approaches and results.
A. Coates, H. Lee, and A. Ng. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Volume 15 of JMLR Workshop and Conference Proceedings, pages 215-223. JMLR W&CP, (2011)
J. Tang, M. Hong, J. Li, and B. Liang. International Semantic Web Conference, Volume 4273 of Lecture Notes in Computer Science, pages 640-653. Springer, (2006)