
Author | Title | Year | Journal/Proceedings | Reftype | DOI/URL
Aggarwal, C. C. & Yu, P. S. | Online Analysis of Community Evolution in Data Streams | 2005 | SDM | inproceedings | URL
BibTeX:
@inproceedings{conf/sdm/AggarwalY05,
  author = {Aggarwal, Charu C. and Yu, Philip S.},
  title = {Online Analysis of Community Evolution in Data Streams},
  booktitle = {SDM},
  year = {2005},
  url = {http://web.mit.edu/charu/www/aggar142.pdf}
}
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G., Ng, A., Liu, B., Yu, P., Zhou, Z.-H., Steinbach, M., Hand, D. & Steinberg, D. | Top 10 algorithms in data mining | 2008 | Knowledge and Information Systems | article | URL
Abstract: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.
BibTeX:
@article{wu2008wu,
  author = {Wu, Xindong and Kumar, Vipin and Quinlan, J. Ross and Ghosh, Joydeep and Yang, Qiang and Motoda, Hiroshi and McLachlan, Geoffrey and Ng, Angus and Liu, Bing and Yu, Philip and Zhou, Zhi-Hua and Steinbach, Michael and Hand, David and Steinberg, Dan},
  title = {Top 10 algorithms in data mining},
  journal = {Knowledge and Information Systems},
  publisher = {Springer},
  year = {2008},
  volume = {14},
  number = {1},
  pages = {1--37},
  url = {http://dx.doi.org/10.1007/s10115-007-0114-2}
}
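The survey describes its ten algorithms at a conceptual level. As a concrete illustration of the simplest of them, here is a minimal k-Means (Lloyd's algorithm) sketch; it is an illustrative toy, not the paper's reference implementation.

Example (Python):
import random

def kmeans(points, k, iters=100):
    """Minimal Lloyd's-algorithm k-Means over tuples of floats.
    Toy illustration only; real implementations add smarter seeding
    (e.g. k-means++) and empty-cluster handling."""
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers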
Romero, C. & Ventura, S. | Educational data mining: A survey from 1995 to 2005 | 2007 | Expert Syst. Appl. | article | DOI, URL
Abstract: Currently there is an increasing interest in data mining and educational systems, making educational data mining a new growing research community. This paper surveys the application of data mining to traditional educational systems, particularly web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems. Each of these systems has different data sources and objectives for knowledge discovery. After preprocessing the available data in each case, data mining techniques can be applied: statistics and visualization; clustering, classification and outlier detection; association rule mining and pattern mining; and text mining. The success of the plentiful work needs much more specialized work in order for educational data mining to become a mature area.
BibTeX:
@article{romero07,
  author = {Romero, C. and Ventura, S.},
  title = {Educational data mining: A survey from 1995 to 2005},
  journal = {Expert Syst. Appl.},
  publisher = {Pergamon Press, Inc.},
  year = {2007},
  volume = {33},
  number = {1},
  pages = {135--146},
  url = {http://portal.acm.org/citation.cfm?id=1223659},
  doi = {http://dx.doi.org/10.1016/j.eswa.2006.04.005}
}
Jain, A. K., Murty, M. N. & Flynn, P. J. | Data clustering: a review | 1999 | ACM Comput. Surv. | article | DOI, URL
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
BibTeX:
@article{331504,
  author = {Jain, A. K. and Murty, M. N. and Flynn, P. J.},
  title = {Data clustering: a review},
  journal = {ACM Comput. Surv.},
  publisher = {ACM},
  year = {1999},
  volume = {31},
  number = {3},
  pages = {264--323},
  url = {http://portal.acm.org/citation.cfm?id=331499.331504},
  doi = {http://doi.acm.org/10.1145/331499.331504}
}
Fayyad, U. M., Piatetsky-Shapiro, G. & Smyth, P. | From data mining to knowledge discovery: an overview | 1996 | Advances in knowledge discovery and data mining | incollection | URL
Abstract: Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.
BibTeX:
@incollection{fayyad1996data,
  author = {Fayyad, Usama M. and Piatetsky-Shapiro, Gregory and Smyth, Padhraic},
  title = {From data mining to knowledge discovery: an overview},
  booktitle = {Advances in knowledge discovery and data mining},
  publisher = {American Association for Artificial Intelligence},
  year = {1996},
  pages = {1--34},
  url = {http://portal.acm.org/citation.cfm?id=257942}
}
Tufte, E. R. | The Visual Display of Quantitative Information | 2001 | | book | URL
BibTeX:
@book{tufte2001visual,
  author = {Tufte, Edward R.},
  title = {The Visual Display of Quantitative Information},
  publisher = {Graphics Press},
  year = {2001},
  edition = {Second},
  url = {http://www.amazon.com/Visual-Display-Quantitative-Information-2nd/dp/0961392142}
}
Tramp, S., Frischmuth, P., Ermilov, T. & Auer, S. | Weaving a Social Data Web with Semantic Pingback | 2010 | Proceedings of the EKAW 2010 - Knowledge Engineering and Knowledge Management by the Masses; 11th October-15th October 2010 - Lisbon, Portugal | inproceedings | URL
Abstract: In this paper we tackle some of the most pressing obstacles of the emerging Linked Data Web, namely the quality, timeliness and coherence as well as direct end user benefits. We present an approach for complementing the Linked Data Web with a social dimension by extending the well-known Pingback mechanism, which is a technological cornerstone of the blogosphere, towards a Semantic Pingback. It is based on the advertising of an RPC service for propagating typed RDF links between Data Web resources. Semantic Pingback is downwards compatible with conventional Pingback implementations, thus allowing resources on the Social Web to be connected and interlinked with resources on the Data Web. We demonstrate its usefulness by showcasing use cases of the Semantic Pingback implementations in the semantic wiki OntoWiki and in Triplify, the Linked Data interface for database-backed Web applications.
BibTeX:
@inproceedings{tramp2010weaving,
  author = {Tramp, Sebastian and Frischmuth, Philipp and Ermilov, Timofey and Auer, Sören},
  title = {Weaving a Social Data Web with Semantic Pingback},
  booktitle = {Proceedings of the EKAW 2010 - Knowledge Engineering and Knowledge Management by the Masses; 11th October-15th October 2010 - Lisbon, Portugal},
  publisher = {Springer},
  year = {2010},
  volume = {6317},
  pages = {135--149},
  url = {http://svn.aksw.org/papers/2010/EKAW_SemanticPingback/public.pdf}
}
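Conventional Pingback, which this paper extends, is a single XML-RPC call advertising a link from a source resource to a target resource. The sketch below shows such a client-side ping; the endpoint and resource URIs are placeholders, and the paper's RDF-typed-link extension is not modeled here.

Example (Python):
import xmlrpc.client

def send_pingback(server_url, source_uri, target_uri):
    """Fire a conventional Pingback ping; Semantic Pingback servers are
    downwards compatible with this call. All URLs are placeholders."""
    server = xmlrpc.client.ServerProxy(server_url)
    # The Pingback spec defines one method: pingback.ping(source, target).
    return server.pingback.ping(source_uri, target_uri)

# Hypothetical usage:
# send_pingback("http://example.org/xmlrpc",
#               "http://example.org/my-post",
#               "http://example.net/referenced-resource")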
Tatti, N., Mielikainen, T., Gionis, A. & Mannila, H. | What is the Dimension of Your Binary Data? | 2006 | Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM 2006) | inproceedings | DOI, URL
Abstract: Many 0/1 datasets have a very large number of variables; however, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal dimension can be adapted for binary data. However, as such the fractal dimension is difficult to interpret. Hence we introduce the concept of normalized fractal dimension. For a dataset D, its normalized fractal dimension counts the number of independent columns needed to achieve the unnormalized fractal dimension of D. The normalized fractal dimension measures the degree of dependency structure of the data. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against PCA.
BibTeX:
@inproceedings{tatti2006dimension,
  author = {Tatti, N. and Mielikainen, T. and Gionis, A. and Mannila, H.},
  title = {What is the Dimension of Your Binary Data?},
  booktitle = {Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM 2006)},
  year = {2006},
  pages = {603--612},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4053086},
  doi = {http://dx.doi.org/10.1109/ICDM.2006.167}
}
Clauset, A., Shalizi, C. R. & Newman, M. E. J. | Power-Law Distributions in Empirical Data | 2009 | SIAM Review | article | DOI, URL
Abstract: Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.
BibTeX:
@article{clauset2009powerlaw,
  author = {Clauset, Aaron and Shalizi, Cosma Rohilla and Newman, M. E. J.},
  title = {Power-Law Distributions in Empirical Data},
  journal = {SIAM Review},
  publisher = {SIAM},
  year = {2009},
  volume = {51},
  number = {4},
  pages = {661--703},
  url = {http://link.aip.org/link/?SIR/51/661/1},
  doi = {http://dx.doi.org/10.1137/070710111}
}
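The paper's continuous maximum-likelihood estimator for the scaling exponent has a closed form, implemented in the sketch below; the companion step of choosing x_min by minimizing the Kolmogorov-Smirnov statistic, which the paper also covers, is omitted here.

Example (Python):
import math

def fit_alpha(data, xmin):
    """Continuous MLE for the power-law exponent from Clauset et al.:
    alpha = 1 + n / sum(ln(x_i / xmin)) over the tail x_i >= xmin,
    with standard error (alpha - 1) / sqrt(n)."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    sigma = (alpha - 1.0) / math.sqrt(n)
    return alpha, sigma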
Dean, J. & Ghemawat, S. | MapReduce: simplified data processing on large clusters | 2008 | Communications of the ACM | article | DOI, URL
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
BibTeX:
@article{dean2008mapreduce,
  author = {Dean, Jeffrey and Ghemawat, Sanjay},
  title = {MapReduce: simplified data processing on large clusters},
  journal = {Communications of the ACM},
  publisher = {ACM},
  year = {2008},
  volume = {51},
  number = {1},
  pages = {107--113},
  url = {http://doi.acm.org/10.1145/1327452.1327492},
  doi = {http://dx.doi.org/10.1145/1327452.1327492}
}
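The programming model in the abstract reduces to two user-supplied functions. The sketch below runs the canonical word-count example in a single process; the real system distributes the shuffle and reduce phases across a cluster and handles machine failures.

Example (Python):
from collections import defaultdict

def map_fn(_doc_id, text):
    for word in text.split():
        yield word, 1          # emit an intermediate (key, value) pair

def reduce_fn(word, counts):
    yield word, sum(counts)    # fold all values for one key

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process stand-in for the distributed runtime."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in map_fn(key, value):
            groups[k].append(v)            # "shuffle": group by key
    results = []
    for k in sorted(groups):
        results.extend(reduce_fn(k, groups[k]))
    return results

# run_mapreduce([(1, "to be or not to be")], map_fn, reduce_fn)
# -> [('be', 2), ('not', 1), ('or', 1), ('to', 2)]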
Abiteboul, S., McHugh, J., Rys, M., Vassalos, V. & Wiener, J. | Incremental Maintenance for Materialized Views over Semistructured Data | 1998 | 24th International Conference on Very Large Data Bases | inproceedings | URL
Abstract: Semistructured data is not strictly typed like relational or object-oriented data and may be irregular or incomplete. It often arises in practice, e.g., when heterogeneous data sources are integrated or data is taken from the World Wide Web. Views over semistructured data can be used to filter the data and to restructure (or provide structure to) it. To achieve fast query response time, these views are often materialized. This paper studies incremental maintenance techniques for materialized views over semistructured data. We use the graph-based data model OEM and the query language Lorel, developed at Stanford, as the framework for our work. We propose a new algorithm that produces a set of queries that compute the changes to the view based upon a change to the source. We develop an analytic cost model and compare the cost of executing our incremental maintenance algorithm to that of recomputing the view. We show that for nearly all types of database updates, it is more efficient to apply our incremental maintenance algorithm to the view than to recompute the view from the database, even when there are thousands of such updates.
BibTeX:
@inproceedings{abiteboul1998incremental,
  author = {Abiteboul, S. and McHugh, J. and Rys, M. and Vassalos, V. and Wiener, J.},
  title = {Incremental Maintenance for Materialized Views over Semistructured Data},
  booktitle = {24th International Conference on Very Large Data Bases},
  publisher = {Morgan Kaufmann},
  year = {1998},
  pages = {38--49},
  url = {http://ilpubs.stanford.edu:8090/340/}
}
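The paper's algorithm targets OEM graphs and Lorel views; as a much simpler illustration of the underlying idea (apply the delta of an update rather than recompute the view), the sketch below maintains a materialized count over a predicate. This toy is an assumption-laden miniature, not the paper's method.

Example (Python):
class MaterializedCount:
    """Toy incrementally maintained aggregate view: each update applies
    its delta instead of triggering recomputation from the base data.
    The paper's actual algorithm handles graph-structured data and
    Lorel queries; this shows only the core idea in miniature."""

    def __init__(self, rows, predicate):
        self.predicate = predicate
        self.count = sum(1 for r in rows if predicate(r))  # initial materialization

    def on_insert(self, row):
        if self.predicate(row):
            self.count += 1    # delta of an insert

    def on_delete(self, row):
        if self.predicate(row):
            self.count -= 1    # delta of a delete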
Behm, A., Borkar, V., Carey, M., Grover, R., Li, C., Onose, N., Vernica, R., Deutsch, A., Papakonstantinou, Y. & Tsotras, V. | ASTERIX: towards a scalable, semistructured data platform for evolving-world models | 2011 | Distributed and Parallel Databases | article | DOI, URL
Abstract: ASTERIX is a new data-intensive storage and computing platform project spanning UC Irvine, UC Riverside, and UC San Diego. In this paper we provide an overview of the ASTERIX project, starting with its main goal—the storage and analysis of data pertaining to evolving-world models. We describe the requirements and associated challenges, and explain how the project is addressing them. We provide a technical overview of ASTERIX, covering its architecture, its user model for data and queries, and its approach to scalable query processing and data management. ASTERIX utilizes a new scalable runtime computational platform called Hyracks that is also discussed at an overview level; we have recently made Hyracks available in open source for use by other interested parties. We also relate our work on ASTERIX to the current state of the art and describe the research challenges that we are currently tackling as well as those that lie ahead.
BibTeX:
@article{behm2011asterix,
  author = {Behm, Alexander and Borkar, Vinayak and Carey, Michael and Grover, Raman and Li, Chen and Onose, Nicola and Vernica, Rares and Deutsch, Alin and Papakonstantinou, Yannis and Tsotras, Vassilis},
  title = {ASTERIX: towards a scalable, semistructured data platform for evolving-world models},
  journal = {Distributed and Parallel Databases},
  publisher = {Springer},
  year = {2011},
  volume = {29},
  number = {3},
  pages = {185--216},
  url = {http://dx.doi.org/10.1007/s10619-011-7082-y},
  doi = {http://dx.doi.org/10.1007/s10619-011-7082-y}
}
Alsubaiee, S., Altowim, Y., Altwaijry, H., Behm, A., Borkar, V., Bu, Y., Carey, M., Grover, R., Heilbron, Z., Kim, Y.-S., Li, C., Onose, N., Pirzadeh, P., Vernica, R. & Wen, J. | ASTERIX: an open source system for "Big Data" management and analysis (demo) | 2012 | Proceedings of the VLDB Endowment | article | URL
Abstract: At UC Irvine, we are building a next generation parallel database system, called ASTERIX, as our approach to addressing today's "Big Data" management challenges. ASTERIX aims to combine time-tested principles from parallel database systems with those of the Web-scale computing community, such as fault tolerance for long running jobs. In this demo, we present a whirlwind tour of ASTERIX, highlighting a few of its key features. We will demonstrate examples of our data definition language to model semi-structured data, and examples of interesting queries using our declarative query language. In particular, we will show the capabilities of ASTERIX for answering geo-spatial queries and fuzzy queries, as well as ASTERIX' data feed construct for continuously ingesting data.
BibTeX:
@article{alsubaiee2012asterix,
  author = {Alsubaiee, Sattam and Altowim, Yasser and Altwaijry, Hotham and Behm, Alexander and Borkar, Vinayak and Bu, Yingyi and Carey, Michael and Grover, Raman and Heilbron, Zachary and Kim, Young-Seok and Li, Chen and Onose, Nicola and Pirzadeh, Pouria and Vernica, Rares and Wen, Jian},
  title = {ASTERIX: an open source system for "Big Data" management and analysis (demo)},
  journal = {Proceedings of the VLDB Endowment},
  publisher = {VLDB Endowment},
  year = {2012},
  volume = {5},
  number = {12},
  pages = {1898--1901},
  url = {http://dl.acm.org/citation.cfm?id=2367502.2367532}
}
Muniswamy-Reddy, K.-K. & Seltzer, M. | Provenance as first class cloud data | 2010 | SIGOPS Operating Systems Review | article | DOI, URL
Abstract: Digital provenance is meta-data that describes the ancestry or history of a digital object. Most work on provenance focuses on how provenance increases the value of data to consumers. However, provenance is also valuable to storage providers. For example, provenance can provide hints on access patterns, detect anomalous behavior, and provide enhanced user search capabilities. As the next generation storage providers, cloud vendors are in the unique position to capitalize on this opportunity to incorporate provenance as a fundamental storage system primitive. To date, cloud offerings have not yet done so. We provide motivation for providers to treat provenance as first class data in the cloud and based on our experience with provenance in a local storage system, suggest a set of requirements that make provenance feasible and attractive.
BibTeX:
@article{muniswamyreddy2010provenance,
  author = {Muniswamy-Reddy, Kiran-Kumar and Seltzer, Margo},
  title = {Provenance as first class cloud data},
  journal = {SIGOPS Operating Systems Review},
  publisher = {ACM},
  year = {2010},
  volume = {43},
  number = {4},
  pages = {11--16},
  url = {http://doi.acm.org/10.1145/1713254.1713258},
  doi = {http://dx.doi.org/10.1145/1713254.1713258}
}
Liu, X., Lu, M., Ooi, B. C., Shen, Y., Wu, S. & Zhang, M. | CDAS: a crowdsourcing data analytics system | 2012 | Proceedings of the VLDB Endowment | article | URL
Abstract: Some complex problems, such as image tagging and natural language processing, are very challenging for computers, where even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution -- employing human participation -- to make good the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks. A complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively. To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core part of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine to process and monitor the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy user required accuracy, the model guides the crowdsourcing query engine for the design and processing of the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performances. When verifying the quality of the result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS, a twitter sentiment analytics job and an image tagging job. We use real Twitter and Flickr data as our queries respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve a much higher accuracy. By embedding the quality-sensitive model into crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.
BibTeX:
@article{liu2012crowdsourcing,
  author = {Liu, Xuan and Lu, Meiyu and Ooi, Beng Chin and Shen, Yanyan and Wu, Sai and Zhang, Meihui},
  title = {CDAS: a crowdsourcing data analytics system},
  journal = {Proceedings of the VLDB Endowment},
  publisher = {VLDB Endowment},
  year = {2012},
  volume = {5},
  number = {10},
  pages = {1040--1051},
  url = {http://dl.acm.org/citation.cfm?id=2336664.2336676}
}
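The quality-sensitive answering model weights workers by their historical performance. The log-odds weighting below is an illustrative assumption, not the paper's formula; it only shows how per-worker accuracies can drive answer aggregation.

Example (Python):
import math
from collections import defaultdict

def weighted_vote(answers, accuracy):
    """Aggregate crowd answers with per-worker log-odds weights derived
    from historical accuracy. Illustrative assumption only; CDAS's
    actual quality-sensitive model is more elaborate.
    answers: iterable of (worker_id, label); accuracy: worker_id -> (0, 1)."""
    scores = defaultdict(float)
    for worker, label in answers:
        a = accuracy[worker]
        scores[label] += math.log(a / (1.0 - a))
    return max(scores, key=scores.get)

# weighted_vote([("w1", "positive"), ("w2", "negative"), ("w3", "positive")],
#               {"w1": 0.9, "w2": 0.6, "w3": 0.7})  -> "positive"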
Goodwin, J., Dolbear, C. & Hart, G. | Geographical Linked Data: The Administrative Geography of Great Britain on the Semantic Web | 2008 | Transactions in GIS | article | DOI, URL
Abstract: Ordnance Survey, the national mapping agency of Great Britain, is investigating how semantic web technologies assist its role as a geographical information provider. A major part of this work involves the development of prototype products and datasets in RDF. This article discusses the production of an example dataset for the administrative geography of Great Britain, demonstrating the advantages of explicitly encoding topological relations between geographic entities over traditional spatial queries. We also outline how these data can be linked to other datasets on the web of linked data and some of the challenges that this raises.
BibTeX:
@article{goodwin2008geographical,
  author = {Goodwin, John and Dolbear, Catherine and Hart, Glen},
  title = {Geographical Linked Data: The Administrative Geography of Great Britain on the Semantic Web},
  journal = {Transactions in GIS},
  publisher = {Blackwell Publishing Ltd},
  year = {2008},
  volume = {12},
  pages = {19--30},
  url = {http://dx.doi.org/10.1111/j.1467-9671.2008.01133.x},
  doi = {http://dx.doi.org/10.1111/j.1467-9671.2008.01133.x}
}
Martins, B., Manguinhas, H. & Borbinha, J. | Extracting and Exploring the Geo-Temporal Semantics of Textual Resources | 2008 | Proceedings of the International Conference on Semantic Computing | inproceedings | DOI, URL
Abstract: Geo-temporal criteria are important for filtering, grouping and prioritizing information resources. This paper presents techniques for extracting semantic geo-temporal information from text, using simple text mining methods that leverage a gazetteer. A prototype system, implementing the proposed methods and capable of displaying information over maps and timelines, is described. This prototype can take input in RSS, demonstrating the application to content from many different online sources. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.
BibTeX:
@inproceedings{martins2008extracting,
  author = {Martins, B. and Manguinhas, H. and Borbinha, J.},
  title = {Extracting and Exploring the Geo-Temporal Semantics of Textual Resources},
  booktitle = {Proceedings of the International Conference on Semantic Computing},
  publisher = {IEEE Computer Society},
  year = {2008},
  pages = {1--9},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4597167},
  doi = {http://dx.doi.org/10.1109/ICSC.2008.86}
}
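The extraction methods are described as simple gazetteer-backed text mining. The toy below shows the flavor of such a pipeline; the gazetteer entries and patterns are invented, and the real system (with RSS input and map/timeline display) is considerably richer.

Example (Python):
import re

# Invented toy gazetteer: place name -> (latitude, longitude).
GAZETTEER = {"lisbon": (38.72, -9.14), "porto": (41.15, -8.61)}
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")   # crude year matcher

def geo_temporal_tags(text):
    """Crude gazetteer-based geo-temporal tagging, in the spirit of the
    paper's approach but far simpler than the actual system."""
    places = [(tok, GAZETTEER[tok.lower()])
              for tok in re.findall(r"[A-Z][a-z]+", text)
              if tok.lower() in GAZETTEER]
    years = YEAR.findall(text)
    return places, years

# geo_temporal_tags("The 1755 earthquake devastated Lisbon.")
# -> ([('Lisbon', (38.72, -9.14))], ['1755'])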
Joachims, T., Granka, L., Pan, B., Hembrooke, H. & Gay, G. | Accurately interpreting clickthrough data as implicit feedback | 2005 | Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval | inproceedings | DOI, URL
Abstract: This paper examines the reliability of implicit feedback generated from clickthrough data in WWW search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average.
BibTeX:
@inproceedings{joachims2005accurately,
  author = {Joachims, Thorsten and Granka, Laura and Pan, Bing and Hembrooke, Helene and Gay, Geri},
  title = {Accurately interpreting clickthrough data as implicit feedback},
  booktitle = {Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval},
  publisher = {ACM},
  year = {2005},
  pages = {154--161},
  url = {http://doi.acm.org/10.1145/1076034.1076063},
  doi = {http://dx.doi.org/10.1145/1076034.1076063}
}
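One click interpretation strategy in this line of work, "click > skip above", treats a clicked result as preferred over every unclicked result ranked above it, matching the paper's finding that clicks encode relative rather than absolute relevance. A minimal sketch:

Example (Python):
def preference_pairs(ranking, clicked):
    """Derive relative relevance judgments from clicks using the
    'click > skip above' heuristic: a clicked document is preferred
    over every unclicked document ranked above it."""
    pairs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            pairs.extend((doc, other)
                         for other in ranking[:i] if other not in clicked)
    return pairs

# preference_pairs(["d1", "d2", "d3"], {"d3"}) -> [("d3", "d1"), ("d3", "d2")]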
Joachims, T. | Optimizing search engines using clickthrough data | 2002 | Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining | inproceedings | DOI, URL
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
BibTeX:
@inproceedings{joachims2002optimizing,
  author = {Joachims, Thorsten},
  title = {Optimizing search engines using clickthrough data},
  booktitle = {Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining},
  publisher = {ACM},
  year = {2002},
  pages = {133--142},
  url = {http://doi.acm.org/10.1145/775047.775067},
  doi = {http://dx.doi.org/10.1145/775047.775067}
}
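The paper trains a Ranking SVM on such clickthrough-derived preference pairs. The sketch below substitutes a perceptron-style update for the SVM optimization, so it is only a simplified stand-in that learns a linear weight vector w with w·x(preferred) > w·x(other).

Example (Python):
def train_pairwise_ranker(pairs, features, epochs=10, lr=0.1):
    """Perceptron-style stand-in for the paper's Ranking SVM: enforce
    w . x(better) > w . x(worse) for each preference pair. The actual
    method solves a regularized SVM optimization instead.
    pairs: iterable of (better_doc, worse_doc); features: doc -> vector."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(features[better], features[worse])]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:  # violated pair
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w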
Pereira Nunes, B., Kawase, R., Dietze, S., Taibi, D., Casanova, M. A. & Nejdl, W. | Can Entities be Friends? | 2012 | Proceedings of the Web of Linked Entities Workshop in conjunction with the 11th International Semantic Web Conference | inproceedings | URL
Abstract: The richness of the (Semantic) Web lies in its ability to link related resources as well as data across the Web. However, while relations within particular datasets are often well defined, links between disparate datasets and corpora of Web resources are rare. The increasingly widespread use of cross-domain reference datasets, such as Freebase and DBpedia for annotating and enriching datasets as well as document corpora, opens up opportunities to exploit their inherent semantics to uncover semantic relationships between disparate resources. In this paper, we present an approach to uncover relationships between disparate entities by analyzing the graphs of used reference datasets. We adapt a relationship assessment methodology from social network theory to measure the connectivity between entities in reference datasets and exploit these measures to identify correlated Web resources. Finally, we present an evaluation of our approach using the publicly available datasets Bibsonomy and USAToday.
BibTeX:
@inproceedings{pereiranunes2012entities,
  author = {Pereira Nunes, Bernardo and Kawase, Ricardo and Dietze, Stefan and Taibi, Davide and Casanova, Marco Antonio and Nejdl, Wolfgang},
  title = {Can Entities be Friends?},
  booktitle = {Proceedings of the Web of Linked Entities Workshop in conjunction with the 11th International Semantic Web Conference},
  year = {2012},
  volume = {906},
  pages = {45--57},
  url = {http://ceur-ws.org/Vol-906/paper6.pdf}
}
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R. & Ives, Z. | DBpedia: A Nucleus for a Web of Open Data | 2007 | The Semantic Web | incollection | DOI, URL
Abstract: DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
BibTeX:
@incollection{auer2007dbpedia,
  author = {Auer, Sören and Bizer, Christian and Kobilarov, Georgi and Lehmann, Jens and Cyganiak, Richard and Ives, Zachary},
  title = {DBpedia: A Nucleus for a Web of Open Data},
  booktitle = {The Semantic Web},
  publisher = {Springer},
  year = {2007},
  volume = {4825},
  pages = {722--735},
  url = {http://dx.doi.org/10.1007/978-3-540-76298-0_52},
  doi = {http://dx.doi.org/10.1007/978-3-540-76298-0_52}
}
Suchanek, F. M., Kasneci, G. & Weikum, G. | YAGO: a core of semantic knowledge | 2007 | Proceedings of the 16th international conference on World Wide Web | inproceedings | DOI, URL
Abstract: We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains more than 1 million entities and 5 million facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as hasWonPrize). The facts have been automatically extracted from Wikipedia and unified with WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships - and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information extraction techniques.
BibTeX:
@inproceedings{suchanek2007semantic,
  author = {Suchanek, Fabian M. and Kasneci, Gjergji and Weikum, Gerhard},
  title = {YAGO: a core of semantic knowledge},
  booktitle = {Proceedings of the 16th international conference on World Wide Web},
  publisher = {ACM},
  year = {2007},
  pages = {697--706},
  url = {http://doi.acm.org/10.1145/1242572.1242667},
  doi = {http://dx.doi.org/10.1145/1242572.1242667}
}
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R. & Hellmann, S. | DBpedia - A crystallization point for the Web of Data | 2009 | Web Semantics: Science, Services and Agents on the World Wide Web | article | DOI, URL
Abstract: The DBpedia project is a community effort to extract structured information from Wikipedia and to make this information accessible on the Web. The resulting DBpedia knowledge base currently describes over 2.6 million entities. For each of these entities, DBpedia defines a globally unique identifier that can be dereferenced over the Web into a rich RDF description of the entity, including human-readable definitions in 30 languages, relationships to other resources, classifications in four concept hierarchies, various facts as well as data-level links to other Web data sources describing the entity. Over the last year, an increasing number of data publishers have begun to set data-level links to DBpedia resources, making DBpedia a central interlinking hub for the emerging Web of Data. Currently, the Web of interlinked data sources around DBpedia provides approximately 4.7 billion pieces of information and covers domains such as geographic information, people, companies, films, music, genes, drugs, books, and scientific publications. This article describes the extraction of the DBpedia knowledge base, the current status of interlinking DBpedia with other data sources on the Web, and gives an overview of applications that facilitate the Web of Data around DBpedia.
BibTeX:
@article{bizer2009dbpedia,
  author = {Bizer, Christian and Lehmann, Jens and Kobilarov, Georgi and Auer, Sören and Becker, Christian and Cyganiak, Richard and Hellmann, Sebastian},
  title = {DBpedia - A crystallization point for the Web of Data},
  journal = {Web Semantics: Science, Services and Agents on the World Wide Web},
  year = {2009},
  volume = {7},
  number = {3},
  pages = {154--165},
  url = {http://www.sciencedirect.com/science/article/pii/S1570826809000225},
  doi = {http://dx.doi.org/10.1016/j.websem.2009.07.002}
}
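Each DBpedia identifier described in the abstract is dereferenceable and queryable over SPARQL. The sketch below sends a SELECT query to the public endpoint http://dbpedia.org/sparql; the endpoint, the dbo: vocabulary, and the Berlin example resource are real, though availability and vocabulary details may change over time.

Example (Python):
import json
import urllib.parse
import urllib.request

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Berlin> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""

def dbpedia_select(query, endpoint="http://dbpedia.org/sparql"):
    """Run a SPARQL SELECT against DBpedia and return the JSON bindings."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    with urllib.request.urlopen(endpoint + "?" + params) as resp:
        return json.load(resp)["results"]["bindings"]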
Karger, D. | Standards opportunities around data-bearing Web pages | 2013 | Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences | article | DOI, URL
Abstract: The evolving Web has seen ever-growing use of structured data, thanks to the way it enhances information authoring, querying, visualization and sharing. To date, however, most structured data authoring and management tools have been oriented towards programmers and Web developers. End users have been left behind, unable to leverage structured data for information management and communication as well as professionals. In this paper, I will argue that many of the benefits of structured data management can be provided to end users as well. I will describe an approach and tools that allow end users to define their own schemas (without knowing what a schema is), manage data and author (not program) interactive Web visualizations of that data using the Web tools with which they are already familiar, such as plain Web pages, blogs, wikis and WYSIWYG document editors. I will describe our experience deploying these tools and some lessons relevant to their future evolution.
BibTeX:
@article{karger2013standards,
  author = {Karger, David},
  title = {Standards opportunities around data-bearing Web pages},
  journal = {Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences},
  year = {2013},
  volume = {371},
  number = {1987},
  url = {http://rsta.royalsocietypublishing.org/content/371/1987/20120381.abstract},
  doi = {http://dx.doi.org/10.1098/rsta.2012.0381}
}
Berners-Lee, T. & O’Hara, K. | The read–write Linked Data Web | 2013 | Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences | article | DOI, URL
Abstract: This paper discusses issues that will affect the future development of the Web, either increasing its power and utility, or alternatively suppressing its development. It argues for the importance of the continued development of the Linked Data Web, and describes the use of linked open data as an important component of that. Second, the paper defends the Web as a read–write medium, and goes on to consider how the read–write Linked Data Web could be achieved.
BibTeX:
@article{bernerslee2013readwrite,
  author = {Berners-Lee, Tim and O’Hara, Kieron},
  title = {The read–write Linked Data Web},
  journal = {Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences},
  year = {2013},
  volume = {371},
  number = {1987},
  url = {http://rsta.royalsocietypublishing.org/content/371/1987/20120513.abstract},
  doi = {http://dx.doi.org/10.1098/rsta.2012.0513}
}
Rula, A., Palmonari, M., Harth, A., Stadtmüller, S. & Maurino, A. | On the Diversity and Availability of Temporal Information in Linked Open Data | 2012 | The Semantic Web – ISWC 2012 | incollection | DOI, URL
Abstract: An increasing amount of data is published and consumed on the Web according to the Linked Data paradigm. In consideration of both publishers and consumers, the temporal dimension of data is important. In this paper we investigate the characterisation and availability of temporal information in Linked Data at large scale. Based on an abstract definition of temporal information we conduct experiments to evaluate the availability of such information using the data from the 2011 Billion Triple Challenge (BTC) dataset. Focusing in particular on the representation of temporal meta-information, i.e., temporal information associated with RDF statements and graphs, we investigate the approaches proposed in the literature, performing both a quantitative and a qualitative analysis and proposing guidelines for data consumers and publishers. Our experiments show that the amount of temporal information available in the LOD cloud is still very small; several different models have been used on different datasets, with a prevalence of approaches based on the annotation of RDF documents.
BibTeX:
@incollection{rula2012diversity,
  author = {Rula, Anisa and Palmonari, Matteo and Harth, Andreas and Stadtmüller, Steffen and Maurino, Andrea},
  title = {On the Diversity and Availability of Temporal Information in Linked Open Data},
  booktitle = {The Semantic Web – ISWC 2012},
  publisher = {Springer},
  year = {2012},
  volume = {7649},
  pages = {492--507},
  url = {http://dx.doi.org/10.1007/978-3-642-35176-1_31},
  doi = {http://dx.doi.org/10.1007/978-3-642-35176-1_31}
}
Van de Sompel, H., Sanderson, R., Nelson, M. L., Balakireva, L. L., Shankar, H. & Ainsworth, S. | An HTTP-Based Versioning Mechanism for Linked Data | 2010 | Proceedings of Linked Data on the Web (LDOW2010) | inproceedings | URL
Abstract: Dereferencing a URI returns a representation of the current state of the resource identified by that URI. But, on the Web representations of prior states of a resource are also available, for example, as resource versions in Content Management Systems or archival resources in Web Archives such as the Internet Archive. This paper introduces a resource versioning mechanism that is fully based on HTTP and uses datetime as a global version indicator. The approach allows "follow your nose" style navigation both from the current time-generic resource to associated time-specific version resources as well as among version resources. The proposed versioning mechanism is congruent with the Architecture of the World Wide Web, and is based on the Memento framework that extends HTTP with transparent content negotiation in the datetime dimension. The paper shows how the versioning approach applies to Linked Data, and by means of a demonstrator built for DBpedia, it also illustrates how it can be used to conduct a time-series analysis across versions of Linked Data descriptions.
BibTeX:
@inproceedings{vandesompel2010httpbased,
  author = {Van de Sompel, Herbert and Sanderson, Robert and Nelson, Michael L. and Balakireva, Lyudmila L. and Shankar, Harihar and Ainsworth, Scott},
  title = {An HTTP-Based Versioning Mechanism for Linked Data},
  booktitle = {Proceedings of Linked Data on the Web (LDOW2010)},
  publisher = {arXiv},
  year = {2010},
  number = {1003.3661},
  url = {http://arxiv.org/abs/1003.3661}
}
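The versioning mechanism is the Memento pattern: content negotiation in the datetime dimension via the Accept-Datetime header against a TimeGate. The sketch below issues such a request; the Time Travel aggregator URL shown is an assumed example endpoint, not part of the paper.

Example (Python):
import urllib.request

def get_memento(uri, when="Thu, 01 Apr 2010 00:00:00 GMT",
                timegate="http://timetravel.mementoweb.org/timegate/"):
    """Datetime content negotiation per the Memento framework: ask a
    TimeGate for the version of `uri` closest to `when`. The aggregator
    URL above is an assumed example endpoint."""
    req = urllib.request.Request(timegate + uri,
                                 headers={"Accept-Datetime": when})
    with urllib.request.urlopen(req) as resp:
        # The TimeGate redirects to a time-specific memento; urllib
        # follows the redirect, so resp.url names the selected version.
        return resp.url, resp.headers.get("Memento-Datetime")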
Bechhofer, S., Buchan, I., De Roure, D., Missier, P., Ainsworth, J., Bhagat, J., Couch, P., Cruickshank, D., Delderfield, M., Dunlop, I., Gamble, M., Michaelides, D., Owen, S., Newman, D., Sufi, S. & Goble, C. | Why linked data is not enough for scientists | 2013 | Future Generation Computer Systems | article | DOI, URL
Abstract: Scientific data represents a significant portion of the linked open data cloud and scientists stand to benefit from the data fusion capability this will afford. Publishing linked data into the cloud, however, does not ensure the required reusability. Publishing has requirements of provenance, quality, credit, attribution and methods to provide the reproducibility that enables validation of results. In this paper we make the case for a scientific data publication model on top of linked data and introduce the notion of Research Objects as first class citizens for sharing and publishing.
BibTeX:
@article{bechhofer2013linked,
  author = {Bechhofer, Sean and Buchan, Iain and De Roure, David and Missier, Paolo and Ainsworth, John and Bhagat, Jiten and Couch, Philip and Cruickshank, Don and Delderfield, Mark and Dunlop, Ian and Gamble, Matthew and Michaelides, Danius and Owen, Stuart and Newman, David and Sufi, Shoaib and Goble, Carole},
  title = {Why linked data is not enough for scientists},
  journal = {Future Generation Computer Systems},
  year = {2013},
  volume = {29},
  number = {2},
  pages = {599--611},
  url = {http://www.sciencedirect.com/science/article/pii/S0167739X11001439},
  doi = {http://dx.doi.org/10.1016/j.future.2011.08.004}
}

Created on 03/05/2024 by JabRef export filters via the social publication management platform PUMA