This page provides a large hyperlink graph for public download. The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, this graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft. Below we provide instructions on how to download the graph as well as basic statistics about its topology.
T. Tran, N. Tran, A. Teka Hadgu, and R. Jäschke. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, (September 2015)
J. Teevan, S. Dumais, D. Liebling, and R. Hughes. UIST '09: Proceedings of the 22nd annual ACM symposium on User interface software and technology, pages 237--246. New York, NY, USA, ACM, (2009)
B. Berendt, A. Hotho, and G. Stumme. Web Semantics: Science, Services and Agents on the World Wide Web 8 (2-3): 95--96 (2010). Bridging the Gap--Data Mining and Social Network Analysis for Integrating Semantic Web and Web 2.0; The Future of Knowledge Dissemination: The Elsevier Grand Challenge for the Life Sciences.
B. Krause, R. Jäschke, A. Hotho, and G. Stumme. HT '08: Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, pages 157--166. New York, NY, USA, ACM, (2008)