Researchers at Google annotated English-language Web pages from the ClueWeb09 and ClueWeb12 corpora. The annotation process was automatic and hence imperfect. Nevertheless, the annotations are of generally high quality, because the annotators aimed for high precision (and, by necessity, accepted lower recall). For each entity recognized with high confidence, the annotations give the beginning and end byte offsets of the entity mention in the input text, its Freebase identifier (mid), and two confidence scores, each computed differently (see below).
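Because the offsets are byte offsets rather than character offsets, the mention text should be sliced from the raw document bytes before decoding. The following is a minimal sketch of working with such records. It assumes a hypothetical one-record-per-line, tab-separated layout of `begin`, `end`, `mid`, and the two confidence scores (the released files may order or include fields differently), an exclusive end offset, and a placeholder mid. The names `Annotation`, `parse_line`, `mention_text`, and `confident` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Annotation:
    """One entity annotation (field layout is an assumption, not the official format)."""
    begin: int     # byte offset where the mention starts
    end: int       # byte offset where the mention ends (assumed exclusive)
    mid: str       # Freebase machine identifier
    conf_a: float  # first confidence score
    conf_b: float  # second confidence score


def parse_line(line: str) -> Annotation:
    """Parse one hypothetical tab-separated record: begin, end, mid, conf_a, conf_b."""
    begin, end, mid, conf_a, conf_b = line.rstrip("\n").split("\t")
    return Annotation(int(begin), int(end), mid, float(conf_a), float(conf_b))


def mention_text(doc: bytes, ann: Annotation, encoding: str = "utf-8") -> str:
    """Slice the raw bytes first, then decode: offsets count bytes, not characters."""
    return doc[ann.begin:ann.end].decode(encoding)


def confident(anns: Iterable[Annotation], threshold: float) -> List[Annotation]:
    """Keep annotations whose first score meets the threshold (choice of score is assumed)."""
    return [a for a in anns if a.conf_a >= threshold]


if __name__ == "__main__":
    doc = "Café in Paris".encode("utf-8")  # "é" occupies two bytes in UTF-8
    ann = parse_line("9\t14\t/m/hypothetical\t0.98\t0.75\n")
    print(mention_text(doc, ann))             # Paris
    print(len(confident([ann], 0.9)))         # 1
```

The multi-byte "é" in the example is the point of the design: slicing the decoded string with the same offsets would be off by one and would return the wrong span.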
You might consider using this data in conjunction with the recently released Freebase annotations of several TREC query sets.