
1. The Description Logic Handbook: Theory, Implementation and Applications. 2nd Edition Cambridge University Press 2007.

2. Aberer Karl, Cudré-Mauroux Philippe, Hauswirth Manfred. The Chatty Web: Emergent semantics through gossiping. In: 12th World Wide Web Conference. 2003.

3. Abiteboul Serge, Benjelloun Omar, Cautis Bogdan, Manolescu Ioana, Milo Tova, Preda Nicoleta. Lazy query evaluation for Active XML. In: SIGMOD. June 2004.

4. Abiteboul Serge, Benjelloun Omar, Milo Tova. The Active XML project: An overview. VLDB J. 2008;17.

5. Abiteboul Serge, Bonifati Angela, Cobena Gregory, Manolescu Ioana, Milo Tova. Dynamic XML documents with distribution and replication. In: SIGMOD. June 2003.

6. Abiteboul Serge, Duschka Oliver. Complexity of answering queries using materialized views. In: PODS. 1998.

7. Abiteboul Serge, Hull Richard, Vianu Victor. Foundations of Databases Addison-Wesley 1995.

8. Abouzeid Azza, Bajda-Pawlikowski Kamil, Abadi Daniel J, Rasin Alexander, Silberschatz Avi. HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads. PVLDB. 2009;2.

9. Adelberg B. NoDoSE – A tool for semi-automatically extracting semi-structured data from text documents. In: SIGMOD. 1998.

10. Adjiman P, Chatalic Philippe, Goasdoué François, Rousset Marie-Christine, Simon Laurent. Distributed reasoning in a peer-to-peer setting. In: ECAI. 2004:945–946.

11. Adjiman Philippe, Goasdoué François, Rousset Marie-Christine. SomeRDFS in the semantic web. J Data Semantics. 2007;8:158–181.

12. Afrati Foto, Li Chen, Mitra Prasenjit. Rewriting queries using views in the presence of arithmetic comparisons. Theoretical Computer Science. 2006;368(1-2):88–123.

13. Afrati Foto, Li Chen, Ullman Jeffrey. Generating efficient plans for queries using views. In: Proceedings of the ACM SIGMOD Conference. 2001:319–330.

14. Afrati Foto N, Chirkova Rada. Selecting and using views to compute aggregate queries. J Comput Syst Sci. 2011;77.

15. Afrati Foto N, Kolaitis Phokion G. Answering aggregate queries in data exchange. In: PODS. 2008.

16. Afrati Foto N, Li Chen, Mitra Prasenjit. On containment of conjunctive queries with arithmetic comparisons. In: EDBT. 2004:459–476.

17. Aggarwal Charu, ed. Managing and Mining Uncertain Data Kluwer Academic Publishers 2009.

18. Agrawal Sanjay, Chaudhuri Surajit, Das Gautam. DBXplorer: A system for keyword-based search over relational databases. In: ICDE. 2002.

19. Ahmed Rafi, De Smedt Philippe, Du Weimin, et al. The Pegasus heterogeneous multidatabase system. IEEE Computer December 1991:19–26.

20. Al-Khalifa Shurug, Jagadish HV, Koudas Nick, Srivastava Divesh, Wu Yuqing. Structural joins: A primitive for efficient XML query pattern matching. In: ICDE. 2002.

21. Alexe Bogdan, Chiticariu Laura, Miller Renée J, Tan Wang Chiew. Muse: Mapping understanding and design by example. In: ICDE. 2008:10–19.

22. Alexe Bogdan, Hernández Mauricio, Popa Lucian, Tan Wang-Chiew. MapMerge: Correlating independent schema mappings. Proc VLDB Endow. September 2010;3.

23. Alexe Bogdan, Kolaitis Phokion G, Tan Wang Chiew. Characterizing schema mappings via data examples. In: PODS. 2010:261–272.

24. Alexe Bogdan, Cate Balder ten, Kolaitis Phokion G, Tan Wang Chiew. Designing and refining schema mappings via data examples. In: SIGMOD Conference. 2011:133–144.

25. Algergawy Alsayed, Massmann Sabine, Rahm Erhard. A clustering-based approach for large-scale ontology matching. In: ADBIS. 2011:415–428.

26. Altinel Mehmet, Franklin Michael J. Efficient filtering of XML documents for selective dissemination of information. In: VLDB. 2000.

27. Amsterdamer Yael, Davidson Susan B, Deutch Daniel, Milo Tova, Stoyanovich Julia, Tannen Val. Putting lipstick on pig: Enabling database-style workflow provenance. PVLDB. 2011;5.

28. Amsterdamer Yael, Deutch Daniel, Milo Tova, Tannen Val. On provenance minimization. In: PODS. 2011.

29. Amsterdamer Yael, Deutch Daniel, Tannen Val. Provenance for aggregate queries. In: PODS. 2011.

30. An Yuan, Borgida Alexander, Miller Renée J, Mylopoulos John. A semantic approach to discovering schema mapping expressions. In: ICDE. 2007:206–215.

31. Ananthakrishna Rohit, Chaudhuri Surajit, Ganti Venkatesh. Eliminating fuzzy duplicates in data warehouses. In: VLDB. 2002.

32. Antova Lyublena, Koch Christoph, Olteanu Dan. 10^(10^6) worlds and beyond: Efficient representation and processing of incomplete information. In: ICDE. 2007.

33. Arasu A, Garcia-Molina H. Extracting structured data from web pages. In: SIGMOD. 2003.

34. Arasu Arvind, Ganti Venkatesh, Kaushik Raghav. Efficient exact set-similarity joins. In: VLDB. 2006:918–929.

35. Arasu Arvind, Götz Michaela, Kaushik Raghav. On active learning of record matching packages. In: SIGMOD Conference. 2010:783–794.

36. Arenas Marcelo, Barceló Pablo, Libkin Leonid, Murlak Filip. Relational and XML Data Exchange. Synthesis Lectures on Data Management. 2011.

37. Arens Yigal, Chee Chin Y, Hsu Chun-Nan, Knoblock Craig A. Retrieving and integrating data from multiple information sources. International Journal on Intelligent and Cooperative Information Systems 1994.

38. Arens Yigal, Knoblock Craig A, Shen Wei-Min. Query reformulation for dynamic information integration. International Journal on Intelligent and Cooperative Information Systems. June 1996;6.

39. Arocena Gustavo, Mendelzon Alberto. WebOQL: Restructuring documents, databases and webs. In: Proceedings of the International Conference on Data Engineering (ICDE). 1998.

40. Atzeni P, Torlone R. Management of multiple models in an extensible database design tool. In: Proc EDBT. 1996:79–95.

41. Atzeni Paolo, Mecca Giansalvatore, Merialdo Paolo. To weave the web. In: Proceedings of the International Conference on Very Large Databases (VLDB). 1997.

42. Auer Sören, Bizer Christian, Kobilarov Georgi, Lehmann Jens, Cyganiak Richard, Ives Zachary G. DBpedia: A nucleus for a web of open data. In: ISWC/ASWC. 2007.

43. Aumueller David, Do Hong Hai, Massmann Sabine, Rahm Erhard. Schema and ontology matching with COMA++. In: SIGMOD Conference. 2005:906–908.

44. Avnur Ron, Hellerstein Joseph M. Eddies: Continuously adaptive query processing. In: SIGMOD. 2000.

45. Babcock Brian, Babu Shivnath, Datar Mayur, Motwani Rajeev. Chain: Operator scheduling for memory minimization in data stream systems. In: SIGMOD. 2003.

46. Babcock Brian, Chaudhuri Surajit. Towards a robust query optimizer: A principled and practical approach. In: SIGMOD. 2005.

47. Babcock Brian, Datar Mayur, Motwani Rajeev. Load shedding for aggregation queries over data streams. In: ICDE. 2004.

48. Babu Shivnath, Bizarro Pedro. Adaptive query processing in the looking glass. In: CIDR. 2005.

49. Babu Shivnath, Motwani Rajeev, Munagala Kamesh, Nishizawa Itaru, Widom Jennifer. Adaptive ordering of pipelined stream filters. In: SIGMOD. 2004.

50. Babu Shivnath, Srivastava Utkarsh, Widom Jennifer. Exploiting k-constraints to reduce memory overhead in continuous queries over streams. Technical Report, Stanford University; 2002.

51. Baid Akanksha, Rae Ian, Doan AnHai, Naughton Jeffrey F. Toward industrial-strength keyword search systems over relational data. In: ICDE. 2010.

52. Baid Akanksha, Rae Ian, Li Jiexing, Doan AnHai, Naughton Jeffrey F. Toward scalable keyword search over relational data. PVLDB. 2010;3(1):140–149.

53. Bancilhon François, Spyratos Nicolas. Update semantics of relational views. TODS. 1981;6.

54. Banko Michele, Cafarella Michael J, Soderland Stephen, Broadhead Matthew, Etzioni Oren. Open information extraction from the web. In: Veloso Manuela M, ed. IJCAI. 2007:2670–2676.

55. Barbosa Luciano, Freire Juliana. Siphoning hidden-web data through keyword-based interfaces. In: SBBD. 2004.

56. Baumgartner Robert, Flesca Sergio, Gottlob Georg. Declarative information extraction, Web crawling, and recursive wrapping with Lixto. In: Proc of the 6th Int Conf on Logic Programming and Nonmonotonic Reasoning. 2001.

57. Baumgartner Robert, Flesca Sergio, Gottlob Georg. Visual Web information extraction with Lixto. In: VLDB. 2001.

58. Bavoil Louis, Callahan Steven P, Crossno Patricia J, et al. VisTrails: Enabling interactive multiple-view visualizations. IEEE Visualization 2005.

59. Bayardo Roberto J, Ma Yiming, Srikant Ramakrishnan. Scaling up all pairs similarity search. In: WWW. 2007:131–140.

60. Beeri Catriel, Levy Alon Y, Rousset Marie-Christine. Rewriting queries using views in description logics. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 1997:99–108.

61. Bellahsene Z, Bonifati A, Rahm E, eds. Schema Matching and Mapping Springer 2011.

62. Bellahsene Zohra, Duchateau Fabien. Tuning for schema matching. In: Schema Matching and Mapping. 2011:293–316.

63. Bello Randall, Dias Karl, Downing Alan, et al. Materialized views in Oracle. In: Proceedings of the International Conference on Very Large Databases (VLDB). 1998:659–664.

64. Benedikt Michael, Gottlob Georg. The impact of virtual views on containment. PVLDB. 2010;3(1):297–308.

65. Benjelloun Omar, Garcia-Molina Hector, Gong Heng, et al. D-Swoosh: A family of algorithms for generic, distributed entity resolution. In: ICDCS. 2007:37.

66. Benjelloun Omar, Garcia-Molina Hector, Menestrina David, Su Qi, Whang Steven Euijong, Widom Jennifer. Swoosh: A generic approach to entity resolution. VLDB J. 2009;18(1):255–276.

67. Benjelloun Omar, Das Sarma Anish, Halevy Alon Y, Widom Jennifer. ULDBs: Databases with uncertainty and lineage. In: VLDB. 2006.

68. Bergamaschi Sonia, Castano Silvana, Vincini Maurizio. Semantic integration of semistructured and structured data sources. SIGMOD Record. 1999;28(1):54–59.

69. Bergamaschi Sonia, Domnori Elton, Guerra Francesco, Lado Raquel Trillo, Velegrakis Yannis. Keyword search over relational databases: A metadata approach. In: Proceedings of the 2011 International Conference on Management of Data (SIGMOD ’11). New York, NY, USA; 2011.

70. Bergman Michael K. The deep web: Surfacing hidden value. Journal of Electronic Publishing 2001.

71. Berners-Lee Tim, Hendler James, Lassila Ora. The semantic web. Scientific American May 2001.

72. Bernstein Philip A. Applying model management to classical meta-data problems. In: Proceedings of the Conference on Innovative Data Systems Research (CIDR). 2003.

73. Bernstein Philip A, Chiu Dah-Ming W. Using semi-joins to solve relational queries. J ACM. 1981;28.

74. Bernstein Philip A, Giunchiglia Fausto, Kementsietsidis Anastasios, Mylopoulos John, Serafini Luciano, Zaihrayeu Ilya. Data management for peer-to-peer computing: A vision. In: Proceedings of the WebDB Workshop. 2002.

75. Bernstein Philip A, Green Todd J, Melnik Sergey, Nash Alan. Implementing mapping composition. In: Proc of VLDB. 2006:55–66.

76. Bernstein Philip A, Halevy Alon Y, Pottinger Rachel. A vision for management of complex models. SIGMOD Record. 2000;29(4):55–63.

77. Bernstein Philip A, Melnik Sergey. Model management 2.0: Manipulating richer mappings. In: Proc of SIGMOD. 2007:1–12.

78. Bernstein Philip A, Melnik Sergey, Churchill John E. Incremental schema matching. In: VLDB. 2006:1167–1170.

79. Berti-Equille Laure, Sarma Anish Das, Dong Xin, Marian Amélie, Srivastava Divesh. Sailing the information ocean with awareness of currents: Discovery and application of source dependence. In: CIDR. 2009.

80. Bhalotia Gaurav, Hulgeri Arvind, Nakhe Charuta, Chakrabarti Soumen, Sudarshan S. Keyword searching and browsing in databases using BANKS. In: ICDE. 2002.

81. Bhattacharya I, Getoor L. A latent Dirichlet model for unsupervised entity resolution. In: Proc of the SIAM Int Conf on Data Mining (SDM). 2006.

82. Bhattacharya I, Getoor L. Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery from Data. 2007;1.

83. Bhattacharya Indrajit, Getoor Lise, Licamele Louis. Query-time entity resolution. In: KDD. 2006:529–534.

84. Bilenko M. Learnable similarity functions and their applications to clustering and record linkage. In: AAAI. 2004:981–982.

85. Bilenko M, Mooney RJ. Adaptive duplicate detection using learnable string similarity measures. In: Proc of the ACM Int Conf on Knowledge Discovery and Data Mining (KDD). 2003:39–48.

86. Bilenko M, Mooney RJ, Cohen WW, Ravikumar PD, Fienberg SE. Adaptive name matching in information integration. IEEE Intelligent Systems. 2003;18(5):16–23.

87. Biton Olivier, Boulakia Sarah Cohen, Davidson Susan B, Hara Carmem S. Querying and managing provenance through user views in scientific workflows. In: ICDE. 2008.

88. Bizer Christian, Heath Tom, Berners-Lee Tim. Linked data – The story so far. Int J Semantic Web Inf Syst. 2009;5(3):1–22.

89. Blakeley José A, Coburn Neil, Larson Per-Åke. Updating derived relations: Detecting irrelevant and autonomously computable updates. TODS. 1989;14.

90. Bloom Burton H. Space/time trade-offs in hash coding with allowable errors. CACM. July 1970;13.

91. Blunschi Lukas, Dittrich Jens-Peter, Girard Olivier, Karakashian Shant Kirakos, Vaz Salles Marcos Antonio. The iMemex personal dataspace management system (demo). In: CIDR. 2007.

92. Bohannon Philip, Elnahrawy Eiman, Fan Wenfei, Flaster Michael. Putting context into schema matching. In: VLDB. 2006:307–318.

93. Borgida Alex. Description logics in data management. IEEE Trans on Know and Data Engineering. 1995;7(5):671–682.

94. Borkar VR, Deshmukh K, Sarawagi S. Automatic segmentation of text into structured records. In: SIGMOD. 2001.

95. Boulos J, Dalvi N, Mandhani B, Mathur S, Re C, Suciu D. MYSTIQ: A system for finding more answers by using probabilities. In: Proc of ACM SIGMOD. 2005.

96. Brundage Michael. XQuery: The XML Query Language February 2004.

97. Bruno Nicolas, Koudas Nick, Srivastava Divesh. Holistic twig joins: Optimal XML pattern matching. In: SIGMOD Conference. 2002.

98. Buneman Peter, Khanna Sanjeev, Tan Wang Chiew. Why and where: A characterization of data provenance. In: ICDT. 2001.

99. Buneman Peter, Kosky Anthony, Davidson Susan. Theoretical aspects of schema merging. In: Proc of EDBT. 1992:152–167.

100. Burdick Douglas, Hernández Mauricio A, Ho Howard, et al. Extracting, linking and integrating data from public sources: A financial case study. IEEE Data Eng Bull. 2011;34(3):60–67.

101. Cafarella Michael J, Halevy Alon, Zhang Yang, Wang Daisy Zhe, Wu Eugene. Uncovering the Relational Web. In: WebDB. 2008.

102. Cafarella Michael J, Halevy Alon, Zhang Yang, Wang Daisy Zhe, Wu Eugene. WebTables: Exploring the power of tables on the web. In: VLDB. 2008.

103. Cai D, Yu S, Wen J, Ma W. Extracting content structure for Web pages based on visual representation. In: Proc of the 5th Asian-Pacific Web Conference (APWeb). 2003.

104. Calì Andrea, Calvanese Diego, De Giacomo Giuseppe, Lenzerini Maurizio. Data integration under integrity constraints. In: Proceedings of CAiSE. 2002:262–279.

105. Califf ME, Mooney RJ. Relational learning of pattern-match rules for information extraction. In: AAAI. 1999.

106. Callan James P, Connell Margaret E. Query-based sampling of text databases. ACM Transactions on Information Systems. 2001;19(2):97–130.

107. Calvanese D, De Giacomo G, Lenzerini M. Answering queries using views over description logics. In: Proceedings of AAAI. 2000:386–391.

108. Calvanese D, De Giacomo G, Lenzerini M, Vardi M. View-based query processing for regular path queries with inverse. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 2000:58–66.

109. Carey Michael J, Kossmann Donald. On saying “enough already!” in SQL. In: SIGMOD. 1997.

110. Carlson Andrew, Betteridge Justin, Kisiel Bryan, Settles Burr, Hruschka Jr Estevam R, Mitchell Tom M. Toward an architecture for never-ending language learning. In: Fox Maria, Poole David, eds. AAAI. AAAI Press 2010.

111. Castano S, De Antonellis V. A discovery-based approach to database ontology design. Distributed and Parallel Databases – Special Issue on Ontologies and Databases. 1999;7.

112. Catarci T, Lenzerini M. Representing and using interschema knowledge in cooperative information systems. Journal of Intelligent and Cooperative Information Systems 1993:55–62.

113. Chai Xiaoyong, Vuong Ba-Quy, Doan AnHai, Naughton Jeffrey F. Efficiently incorporating user feedback into information extraction and integration programs. In: SIGMOD. 2009.

114. Chandra AK, Merlin PM. Optimal implementation of conjunctive queries in relational databases. In: Proceedings of the Ninth Annual ACM Symposium on Theory of Computing. 1977:77–90.

115. Chang C, Kayed M, Girgis MR, Shaalan KF. A survey of web information extraction systems. IEEE Trans Knowl Data Eng. 2006;18(10):1411–1428.

116. Chang C, Lui S. IEPAD: Information extraction based on pattern discovery. In: WWW. 2001.

117. Chapman Adriane, Jagadish HV. Why not? In: SIGMOD Conference. 2009.

118. Chapman Sam. Sam’s string metrics. 2006. Available at [email protected]/stringmetrics.html.

119. Chatterjee A, Segev A. Data manipulation in heterogeneous databases. SIGMOD Record. 1991;20(4):64–68.

120. Chaudhuri S, Ganjam K, Ganti V, Motwani R. Robust and efficient fuzzy match for online data cleaning. In: SIGMOD. 2003.

121. Chaudhuri Surajit. An overview of query optimization in relational systems. In: PODS. 1998.

122. Chaudhuri Surajit, Dayal Umeshwar. An overview of data warehousing and OLAP technology. SIGMOD Record. 1997;26.

123. Chaudhuri Surajit, Dayal Umeshwar, Narasayya Vivek R. An overview of business intelligence technology. Commun ACM. 2011;54(8):88–98.

124. Chaudhuri Surajit, Ganti Venkatesh, Kaushik Raghav. A primitive operator for similarity joins in data cleaning. In: ICDE. 2006:5.

125. Chaudhuri Surajit, Krishnamurthy Ravi, Potamianos Spyros, Shim Kyuseok. Optimizing queries with materialized views. In: Proceedings of the International Conference on Data Engineering (ICDE). 1995:190–200.

126. Chaudhuri Surajit, Sarma Anish Das, Ganti Venkatesh, Kaushik Raghav. Leveraging aggregate constraints for deduplication. In: SIGMOD Conference. 2007:437–448.

127. Chaudhuri Surajit, Vardi Moshe. Optimizing real conjunctive queries. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 1993:59–70.

128. Chawathe Sudarshan, Garcia-Molina Hector, Hammer Joachim, et al. The TSIMMIS project: Integration of heterogeneous information sources. In: Proceedings of IPSJ. October 1994.

129. Chawathe Sudarshan S, Garcia-Molina Hector. Meaningful change detection in structured data. In: SIGMOD. 1997:26–37.

130. Chekuri Chandra, Rajaraman Anand. Conjunctive query containment revisited. Theor Comput Sci. 2000;239(2):211–229.

131. Chen Yi, Davidson Susan B, Zheng Yifeng. ViteX: A streaming XPath processing system. In: ICDE. 2005.

132. Cheney James, Acar Umut A, Ahmed Amal. Provenance traces. CoRR 2008; abs/0812.0564.

133. Cheney James, Chiticariu Laura, Tan Wang Chiew. Provenance in databases: Why, how, and where. Foundations and Trends in Databases. 2009;1.

134. Chirkova Rada, Halevy Alon Y, Suciu Dan. A formal perspective on the view selection problem. VLDB J. 2002;11.

135. Chiticariu L, Kolaitis PG, Popa L. Interactive generation of integrated schemas. In: Proc of SIGMOD. 2008.

136. Chiticariu Laura, Tan Wang Chiew, Vijayvargiya Gaurav. DBNotes: A post-it system for relational databases based on provenance. In: SIGMOD. 2005.

137. Chu Eric, Baid Akanksha, Chai Xiaoyong, Doan AnHai, Naughton Jeffrey F. Combining keyword search and forms for ad hoc querying of databases. In: SIGMOD Conference. 2009.

138. Chu Francis C, Halpern Joseph Y, Gehrke Johannes. Least expected cost query optimization: What can we expect? In: PODS. 2002.

139. Chuang S, Chang KC, Zhai C. Collaborative wrapping: A turbo framework for Web data extraction. In: ICDE. 2007.

140. Clifton Chris, Housman E, Rosenthal Arnon. Experience with a combined approach to attribute-matching across heterogeneous databases. In: DS-7. 1997.

141. Cochinwala M, Kurien V, Lalk G, Shasha D. Efficient data reconciliation. Inf Sci. 2001;137(1-4):1–15.

142. Cohen Sara. Containment of aggregate queries. SIGMOD Record. 2005;34(1):77–85.

143. Cohen W. A mini-course on record linkage and matching. 2004.

144. Cohen W, Richman J. Learning to match and cluster large high-dimensional data sets for data integration. In: Proc of the ACM Int Conf on Knowledge Discovery and Data Mining (KDD). 2002:475–480.

145. Cohen WW. Data integration using similarity joins and a word-based information representation language. ACM Trans Inf Syst. 2000;18(3):288–321.

146. Cohen WW. Record linkage tutorial: Distance metrics for text. 2001. PPT slides.

147. Cohen WW, Hurst M, Jensen LS. A flexible learning system for wrapping tables and lists in HTML documents. In: WWW. 2002.

148. Cohen WW, Ravikumar PD, Fienberg SE. A comparison of string distance metrics for name-matching tasks. In: IIWeb. 2003.

149. Colby Latha S, Griffin Timothy, Libkin Leonid, Mumick Inderpal Singh, Trickey Howard. Algorithms for deferred view maintenance. In: SIGMOD. 1996.

150. Cole Richard L, Graefe Goetz. Optimization of dynamic query evaluation plans. In: SIGMOD. 1994.

151. Craven Mark, DiPasquo Dan, Freitag Dayne, et al. Learning to extract symbolic knowledge from the world-wide web. In: Proceedings of the AAAI Fifteenth National Conference on Artificial Intelligence. 1998.

152. Crescenzi V, Mecca G. Grammars have exceptions. Inf Syst. 1998;23(8):539–565.

153. Crescenzi V, Mecca G. Automatic information extraction from large websites. J ACM. 2004;51(5):731–779.

154. Crescenzi V, Mecca G, Merialdo P. RoadRunner: Towards automatic data extraction from large web sites. In: VLDB. 2001.

155. Cui Yingwei. Lineage Tracing in Data Warehouses. PhD thesis, Stanford University; 2001.

156. Culotta A, McCallum A. Joint deduplication of multiple record types in relational data. In: Proc of the ACM Int Conf on Information and Knowledge Management (CIKM). 2005:257–258.

157. Curino Carlo, Moon Hyun Jin, Deutsch Alin, Zaniolo Carlo. Update rewriting and integrity constraint maintenance in a schema evolution support system: Prism++. PVLDB. 2010;4.

158. Dalvi N, Kumar R, Soliman MA. Automatic wrappers for large scale web extraction. PVLDB. 2011;4(4):219–230.

159. Dalvi NN, Bohannon P, Sha F. Robust Web extraction: An approach based on a probabilistic tree-edit model. In: SIGMOD. 2009.

160. Dalvi Nilesh, Suciu Dan. Efficient query evaluation on probabilistic databases. In: VLDB. 2004.

161. Daniels Dean. Query compilation in a distributed database system. Technical Report RJ 3423, IBM; 1982.

162. Das Sarma A, Dong L, Halevy A. Bootstrapping pay-as-you-go data integration systems. In: Proc of SIGMOD. 2008.

163. Dasgupta Arjun, Jin Xin, Jewell Bradley, Zhang Nan, Das Gautam. Unbiased estimation of size and other aggregates over hidden web databases. In: SIGMOD Conference. 2010:855–866.

164. Davidson Susan B, Khanna Sanjeev, Milo Tova, Panigrahi Debmalya, Roy Sudeepa. Provenance views for module privacy. In: PODS. 2011.

165. Dayal U. Processing queries over generalized hierarchies in a multidatabase systems. In: Proc of the VLDB Conf. 1983:342–353.

166. Dayal Umeshwar, Bernstein Philip A. On the correct translation of update operations on relational views. TODS. 1982;7.

167. de S Mesquita Filipe, da Silva Altigran Soares, de Moura Edleno Silva, Calado Pável, Laender Alberto HF. Labrador: Efficiently publishing relational databases on the web by using keyword-based query interfaces. Inf Process Manage. 2007;43(4):983–1004.

168. DeMichiel LG. Resolving database incompatibility: An approach to performing relational operations over mismatched domains. IEEE Transactions on Knowledge and Data Engineering 1989.

169. DeRose Pedro, Chai Xiaoyong, Gao Byron J, et al. Building community wikipedias: A machine-human partnership approach. In: ICDE. 2008:646–655.

170. DeRose Pedro, Shen Warren, Chen Fei, Doan AnHai, Ramakrishnan Raghu. Building structured web community portals: A top-down, compositional, and incremental approach. In: Proc of VLDB. 2007.

171. Deshpande Amol, Hellerstein Joseph M. Lifting the burden of history from adaptive query processing. In: VLDB. 2004.

172. Deshpande Amol, Ives Zachary, Raman Vijayshankar. Adaptive query processing. Foundations and Trends in Databases 2007.

173. Deutsch Alin, Fernández Mary F, Florescu Daniela, Levy Alon Y, Suciu Dan. XML-QL. In: QL. 1998.

174. Deutsch Alin, Tannen Val. MARS: A system for publishing XML from mixed and redundant storage. In: VLDB. 2003.

175. Dey D. Entity matching in heterogeneous databases: A logistic regression approach. Decision Support Systems. 2008;44(3):740–747.

176. Dey D, Sarkar S, De P. A distance-based approach to entity reconciliation in heterogeneous databases. IEEE Trans Knowl Data Eng. 2002;14(3):567–582.

177. Diao Yanlei, Fischer Peter M, Franklin Michael J, To Raymond. YFilter: Efficient and scalable filtering of XML documents. In: ICDE. 2002.

178. Do Hong Hai, Rahm Erhard. COMA – A system for flexible combination of schema matching approaches. In: VLDB. 2002.

179. Doan A, Halevy AY. Semantic integration research in the database community: A brief survey. AI Magazine. 2005;26(1):83–94.

180. Doan A, Noy NF, Halevy AY. Introduction to the special issue on semantic integration. SIGMOD Record. 2004;33(4):11–13.

181. Doan AnHai, Domingos Pedro, Halevy Alon Y. Reconciling schemas of disparate data sources: A machine learning approach. In: Proceedings of the ACM SIGMOD Conference. 2001.

182. Doan AnHai, Lu Ying, Lee Yoonkyong, Han Jiawei. Profile-based object matching for information integration. IEEE Intelligent Systems. 2003;18(5):54–59.

183. Doan Anhai, Madhavan Jayant, Domingos Pedro, Halevy Alon. Learning to map between ontologies on the semantic web. In: 11th World Wide Web Conference. 2002.

184. Doan AnHai, Naughton Jeffrey F, Ramakrishnan Raghu, et al. Information extraction challenges in managing unstructured data. SIGMOD Record December 2008.

185. Doan AnHai, Ramakrishnan Raghu, Chen Fei, et al. Community information management. IEEE Data Eng Bull. 2006;29(1):64–72.

186. Doan AnHai, Ramakrishnan Raghu, Halevy Alon Y. Crowdsourcing systems on the world-wide web. Commun ACM. 2011;54(4):86–96.

187. Domingos Pedro, Pazzani Michael. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning. 1997;29:103–130.

188. Dong X, Halevy AY, Madhavan J. Reference reconciliation in complex information spaces. In: Proc of the SIGMOD Conf. 2005:85–96.

189. Dong X, Halevy AY, Yu C. Data integration with uncertainty. In: Proc of VLDB. 2007.

190. Dong Xin, Berti-Equille Laure, Hu Yifan, Srivastava Divesh. Global detection of complex copying relationships between sources. PVLDB. 2010;3(1):1358–1369.

191. Dong Xin, Halevy Alon. A platform for personal information management and integration. In: Proc of CIDR. 2005.

192. Donini Francesco M, Lenzerini Maurizio, Nardi Daniele, Nutt Werner. The complexity of concept languages. In: Proceedings of KR-91. 1991.

193. Donini Francesco M, Lenzerini Maurizio, Nardi Daniele, Schaerf Andrea. A hybrid system with datalog and concept languages. In: Ardizzone E, Gaglio S, Sorbello F, eds. Trends in Artificial Intelligence. Volume LNAI 549. Springer Verlag 1991:88–97.

194. Dontcheva Mira, Drucker Steven M, Salesin David, Cohen Michael F. Relations, cards, and search templates: User-guided web data integration and layout. In: UIST. 2007:61–70.

195. Doorenbos RB, Etzioni O, Weld DS. A scalable comparison-shopping agent for the World-Wide Web. In: Agents. 1997.

196. Durbin R, Eddy S, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids Cambridge University Press 1999.

197. Duschka Oliver, Genesereth Michael, Levy Alon. Recursive query plans for data integration. Journal of Logic Programming, special issue on Logic Based Heterogeneous Information Systems. 2000;43.

198. Duschka Oliver M, Genesereth Michael R. Answering recursive queries using views. In: PODS. 1997.

199. Duschka Oliver M, Genesereth Michael R. Query planning in Infomaster. In: Proceedings of the ACM Symposium on Applied Computing. 1997:109–111.

200. Duschka Oliver M, Levy Alon Y. Recursive plans for information gathering. In: Proc of the 15th Int Joint Conf on Artificial Intelligence (IJCAI). 1997:778–784.

201. Ehrig Marc, Staab Steffen, Sure York. Bootstrapping ontology alignment methods with APFEL. In: International Semantic Web Conference. 2005:186–200.

202. Elkan Charles. A decision procedure for conjunctive query disjointness. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 1989.

203. Elkan Charles. Independence of logic database queries and updates. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 1990:154–160.

204. Elmagarmid AK, Ipeirotis PG, Verykios VS. Duplicate record detection: A survey. IEEE Trans Knowl Data Eng. 2007;19(1):1–16.

205. Elmagarmid AK, Ipeirotis PG, Verykios VS. Duplicate record detection: A survey. IEEE Transactions on Knowledge and Data Engineering. 2007;19(1):1–16.

206. Elmeleegy Hazem, Elmagarmid Ahmed K, Lee Jaewoo. Leveraging query logs for schema mapping generation in U-MAP. In: SIGMOD Conference. 2011:121–132.

207. Elmeleegy Hazem, Madhavan Jayant, Halevy Alon Y. Harvesting relational tables from lists on the web. PVLDB. 2009;2(1):1078–1089.

208. Elmeleegy Hazem, Ouzzani Mourad, Elmagarmid Ahmed K. Usage-based schema matching. In: ICDE. 2008:20–29.

209. Eltabakh Mohamed Y, Aref Walid G, Elmagarmid Ahmed K, Ouzzani Mourad, Silva Yasin N. Supporting annotations on relations. In: EDBT. 2009.

210. Embley DW, Campbell DM, Jiang YS, et al. Conceptual-model-based data extraction from multiple-record web pages. Data Knowl Eng. 1999;31(3):227–251.

211. Embley DW, Jiang YS, Ng Y. Record-boundary discovery in web documents. In: SIGMOD. 1999.

212. Embley David W, Jackman David, Xu Li. Multifaceted exploitation of metadata for attribute match discovery in information integration. In: Workshop on Information Integration on the Web. 2001:110–117.

213. Etzioni O, Golden K, Weld D. Sound and efficient closed-world reasoning for planning. Artificial Intelligence. January 1997;89(1–2):113–148.

214. Euzenat Jérôme, Shvaiko Pavel. Ontology Matching Springer 2007.

215. Fagin Ronald. Inverting schema mappings. In: Proc of PODS. 2006:50–59.

216. Fagin Ronald, Kolaitis Phokion, Miller Renée J, Popa Lucian. Data exchange: Semantics and query answering. TCS. 2005;336:89–124.

217. Fagin Ronald, Kolaitis Phokion, Popa Lucian. Data exchange: Getting to the core. ACM Transactions on Database Systems. 2005;30(1):174–210.

218. Fagin Ronald, Kolaitis Phokion G, Popa Lucian. Composing schema mappings: Second-order dependencies to the rescue. ACM Transactions on Database Systems. 2005;30(4):994–1055.

219. Fagin Ronald, Kolaitis Phokion G, Popa Lucian, Tan Wang-Chiew. Quasi-inverses of schema mappings. In: Proc of PODS. 2007:123–132.

220. Fagin Ronald, Kolaitis Phokion G, Popa Lucian, Tan Wang Chiew. Schema mapping evolution through composition and inversion. In: Schema Matching and Mapping. 2011:191–222.

221. Fagin Ronald, Lotem Amnon, Naor Moni. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences. June 2003;66.

222. Falconer Sean M, Storey Margaret-Anne D. A cognitive support framework for ontology mapping. In: ISWC/ASWC. 2007:114–127.

223. Fan Wenfei, Geerts Floris. Capturing missing tuples and missing values. In: PODS. 2010:169–178.

224. Fellegi IP, Sunter AB. A theory for record linkage. Journal of the American Statistical Association. 1969;64(328):1183–1210.

225. Fernandez Mary, Florescu Daniela, Kang Jaewoo, Levy Alon, Suciu Dan. Catching the boat with Strudel: Experiences with a web-site management system. In: Proceedings of the ACM SIGMOD Conference. 1998.

226. Finger Jonathan, Polyzotis Neoklis. Robust and efficient algorithms for rank join evaluation. In: SIGMOD. 2009.

227. Florescu Daniela, Koller Daphne, Levy Alon. Using probabilistic information in data integration. In: VLDB. 1997.

228. Florescu Daniela, Levy Alon, Manolescu Ioana, Suciu Dan. Query optimization in the presence of limited access patterns. In: Proceedings of the ACM SIGMOD Conference. 1999.

229. Florescu Daniela, Levy Alon, Mendelzon Alberto. Database techniques for the world-wide web: A survey. SIGMOD Record. September 1998;27(3):59–74.

230. Florescu Daniela, Raschid Louiqa, Valduriez Patrick. Using heterogeneous equivalences for query rewriting in multidatabase systems. In: Proceedings of the Int Conf on Cooperative Information Systems (COOPIS). 1995.

231. Foster J Nathan, Green Todd J, Tannen Val. Annotated XML: Queries and provenance. In: PODS. 2008.

232. Francis Paul, Jamin Sugih, Jin Cheng, et al. Idmaps: A global internet host distance estimation service. IEEE/ACM Trans Netw. 2001;9.

233. Franklin Michael, Halevy Alon, Maier David. From databases to dataspaces: A new abstraction for information management. SIGMOD Rec. 2005;34.

234. Friedman M, Weld D. Efficient execution of information gathering plans. In: Proc of the 15th Int Joint Conf on Artificial Intelligence (IJCAI). 1997.

235. Friedman Marc, Levy Alon, Millstein Todd. Navigational Plans for Data Integration. In: Proceedings of the National Conference on Artificial Intelligence (AAAI). 1999.

236. Fuhr N, Rölleke T. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems. 1997;14.

237. Fuxman Ariel, Hernández Mauricio A, Ho CT Howard, Miller Renée J, Papotti Paolo, Popa Lucian. Nested mappings: Schema mapping reloaded. In: VLDB. 2006.

238. Gal A. Why is schema matching tough and what can we do about it? SIGMOD Record. 2007;35(4):2–5.

239. Gal A, Modica G, Jamil H, Eyal A. Automatic ontology matching using application semantics. AI Magazine. 2005;26(1):21–31.

240. Gal Avigdor. Managing uncertainty in schema matching with top-k schema mappings. Journal of Data Semantics. 2006;VI:90–114.

241. Gal Avigdor. Uncertain Schema Matching Synthesis Lectures on Data Management 2011.

242. Gal Avigdor, Anaby-Tavor Ateret, Trombetta Alberto, Montesi Danilo. A framework for modeling and evaluating automatic semantic reconciliation. VLDB J. 2005:50–67.

243. Ganesh M, Srivastava J, Richardson T. Mining entity-identification rules for database integration. In: Proc of the ACM Int Conf on Knowledge Discovery and Data Mining (KDD). 1996:291–294.

244. Garcia-Molina Hector, Papakonstantinou Yannis, Quass Dallan, et al. The TSIMMIS project: Integration of heterogeneous information sources. Journal of Intelligent Information Systems. March 1997;8.

245. Garcia-Molina Hector, Ullman Jeffrey D, Widom Jennifer. Database Systems: The Complete Book Prentice Hall 2002.

246. Gatterbauer Wolfgang, Balazinska Magdalena, Khoussainova Nodira, Suciu Dan. Believe it or not: Adding belief annotations to databases. PVLDB. 2009;2.

247. Gatterbauer Wolfgang, Bohunsky Paul, Herzog Marcus, Krüpl Bernhard, Pollak Bernhard. Towards domain-independent information extraction from web tables. In: WWW. 2007:71–80.

248. Getoor L, Miller R. Data and metadata alignment. Tutorial, the Alberto Mendelzon Workshop on the Foundations of Databases and the Web. 2007.

249. Giles CL, Bollacker KD, Lawrence S. CiteSeer: An automatic citation indexing system. In: Proc of the ACM Int Conf on Digital Libraries. 1998:89–98.

250. Gill LE. OX-LINK: The Oxford medical record linkage system. In: Proc of the Int Record Linkage Workshop and Exposition. 1997.

251. Giunchiglia Fausto, Shvaiko Pavel, Yatskevich Mikalai. S-match: An algorithm and an implementation of semantic matching. In: ESWS. 2004:61–75.

252. Glavic Boris, Alonso Gustavo. Perm: Processing provenance and data on the same data model through query rewriting. In: ICDE. 2009.

253. Glavic Boris, Alonso Gustavo. Provenance for nested subqueries. In: EDBT. 2009.

254. Glavic Boris, Alonso Gustavo, Miller Renée J, Haas Laura M. Tramp: Understanding the behavior of schema mappings through provenance. PVLDB. 2010;3.

255. Goasdoué François, Rousset Marie-Christine. Querying distributed data through distributed ontologies: A simple but scalable approach. IEEE Intelligent Systems. 2003;18(5):60–65.

256. Gold EM. Language identification in the limit. Information and Control. 1967;10(5):447–474.

257. Gold EM. Complexity of automaton identification from given data. Information and Control. 1978;37(3):302–320.

258. Goldman Roy, McHugh Jason, Widom Jennifer. From semistructured data to XML: Migrating the Lore data model and query language. In: WebDB ’99. 1999.

259. Goldstein Jonathan, Larson Per-Ake. Optimizing queries using materialized views: A practical, scalable solution. In: Proceedings of the ACM SIGMOD Conference. 2001:331–342.

260. Gonzalez Hector, Halevy Alon Y, Jensen Christian S, et al. Google fusion tables: Data management, integration and collaboration in the cloud. In: SoCC. 2010.

261. Gonzalez Hector, Halevy Alon Y, Jensen Christian S, et al. Google fusion tables: Web-centered data management and collaboration. In: SIGMOD. 2010.

262. Gottlob Georg, Koch Christoph, Baumgartner Robert, Herzog Marcus, Flesca Sergio. The Lixto data extraction project — Back and forth between theory and practice. In: PODS. 2004.

263. Graefe Goetz. Query evaluation techniques for large databases. ACM Computing Surveys. June 1993;25.

264. Gravano L, Ipeirotis PG, Koudas N, Srivastava D. Text joins in an RDBMS for web data integration. In: WWW. 2003.

265. Gravano Luis, Ipeirotis Panagiotis G, Jagadish HV, Koudas Nick, Muthukrishnan S, Srivastava Divesh. Approximate string joins in a database (almost) for free. In: VLDB. 2001:491–500.

266. Gravano Luis, Ipeirotis Panagiotis G, Koudas Nick, Srivastava Divesh. Text joins in an RDBMS for web data integration. In: WWW. 2003.

267. Green Todd J. Containment of conjunctive queries on annotated relations. In: ICDT. 2009.

268. Green Todd J, Karvounarakis Grigoris, Ives Zachary G, Tannen Val. Update exchange with mappings and provenance. In: VLDB. 2007. Amended version available as Univ. of Pennsylvania report MS-CIS-07-26.

269. Green Todd J, Karvounarakis Grigoris, Tannen Val. Provenance semirings. In: PODS. 2007.

270. Green Todd J, Miklau Gerome, Onizuka Makoto, Suciu Dan. Processing XML streams with deterministic automata and stream indexes. February 2002.

271. Green Todd J, Tannen Val. Models for incomplete and probabilistic information. In: International Workshop on Incompleteness and Inconsistency in databases. March 2006.

272. Grimsmo Nils, Bjørklund Truls A, Hetland Magnus Lie. Fast optimal twig joins. PVLDB. September 2010;3.

273. Grumbach S, Mecca G. In search of the lost schema. In: ICDT. 1999.

274. Gulhane Pankaj, Madaan Amit, Mehta Rupesh R, et al. Web-scale information extraction with vertex. In: ICDE. 2011:1209–1220.

275. Gupta Ashish, Mumick Inderpal Singh, Subrahmanian VS. Maintaining views incrementally. In: SIGMOD. 1993.

276. Gupta Himanshu. Selection of views to materialize in a data warehouse. In: Database Theory ICDT ’97. Lecture Notes in Computer Science, vol 1186. 1997.

277. Gupta Himanshu, Mumick Inderpal Singh. Selection of views to materialize under a maintenance cost constraint. In: ICDT. 1999.

278. Gupta Nitin, Kot Lucja, Roy Sudip, Bender Gabriel, Gehrke Johannes, Koch Christoph. Entangled queries: Enabling declarative data-driven coordination. In: SIGMOD Conference. 2011.

279. Gupta Nitin, Nikolic Milos, Roy Sudip, et al. Entangled transactions. PVLDB. 2011;4.

280. Gusfield Dan. Algorithms on Strings, Trees, and Sequences Cambridge University Press 1999.

281. Haas Laura M, Kossmann Donald, Wimmers Edward L, Yang Jun. Optimizing queries across diverse data sources. In: VLDB. 1997.

282. Hajic Jan, Carberry Sandra, Clark Stephen, eds. ACL 2010, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, July 11–16, 2010, Uppsala, Sweden. The Association for Computer Linguistics 2010.

283. Halevy Alon, Ives Zachary, Madhavan Jayant, Mork Peter, Suciu Dan, Tatarinov Igor. The Piazza peer data management system. TKDE. July 2004;16.

284. Halevy Alon Y. Answering queries using views: A survey. VLDB J. 2001;10.

285. Halevy Alon Y, Ashish Naveen, Bitton Dina, et al. Enterprise information integration: Successes, challenges and controversies. In: SIGMOD Conference. 2005:778–787.

286. Halevy Alon Y, Franklin Michael J, Maier David. Principles of dataspace systems. In: PODS. 2006.

287. Halevy Alon Y, Ives Zachary G, Mork Peter, Tatarinov Igor. Piazza: Data management infrastructure for semantic web applications. In: 12th World Wide Web Conference. May 2003.

288. Halevy Alon Y, Ives Zachary G, Suciu Dan, Tatarinov Igor. Schema mediation in peer data management systems. In: ICDE. March 2003.

289. Hamdi Fayçal, Safar Brigitte, Reynaud Chantal, Zargayouna Haïfa. Alignment-based partitioning of large-scale ontologies. In: EGC (best of volume). 2009:251–269.

290. Hammer J, Garcia-Molina H, Nestorov S, Yerneni R, Breunig MM, Vassalos V. Template-based wrappers in the TSIMMIS system. In: SIGMOD. 1997.

291. Hammer J, McHugh J, Garcia-Molina H. Semistructured data: The TSIMMIS experience. In: Proc of the First East-European Symposium on Advances in Databases and Information Systems (ADBIS). 1997.

292. Hammer Joachim, Garcia-Molina Hector, Nestorov Svetlozar, Yerneni Ramana, Breunig Markus M, Vassalos Vasilis. Template-based wrappers in the TSIMMIS system (system demonstration). In: Proceedings of the ACM SIGMOD Conference. 1998.

293. He B, Chang KC. Statistical schema matching across web query interfaces. In: Proc of SIGMOD. 2003.

294. He Bin, Chang Kevin. Automatic complex schema matching across web query interfaces: A correlation mining approach. TODS. 2006;31.

295. He Bin, Chang Kevin Chen-Chuan. Statistical schema integration across the deep web. In: Proc of SIGMOD. 2003.

296. He Bin, Chang Kevin Chen-Chuan, Han Jiawei. Discovering complex matchings across web query interfaces: A correlation mining approach. In: KDD. 2004:148–157.

297. He Bin, Patel Mitesh, Zhang Zeng, Chang Kevin Chen-Chuan. Accessing the deep Web: A survey. Communications of the ACM. 2007;50(5):95–101.

298. He Hao, Wang Haixun, Yang Jun, Yu Philip S. Blinks: Ranked keyword searches on graphs. In: SIGMOD. 2007.

299. Hearst Marti A. Automatic acquisition of hyponyms from large text corpora. In: COLING. 1992:539–545.

300. Hernández MA, Stolfo SJ. The merge/purge problem for large databases. In: Proc of SIGMOD. 1995.

301. Hernández MA, Stolfo SJ. Real-world data is dirty: Data cleansing and the merge/purge problem. Data Mining and Knowledge Discovery. 1998;2:9–37.

302. Hernandez Mauricio A, Miller Renée J, Haas Laura M. Clio: A semi-automatic tool for schema mapping. In: SIGMOD. 2001.

303. Hoffmann Raphael, Zhang Congle, Weld Daniel S. Learning 5000 relational extractors. In: Hajic et al. [282]. 2010:286–295.

304. Hristidis Vagelis, Papakonstantinou Yannis, Balmin Andrey. Keyword proximity search on XML graphs. In: ICDE. 2003.

305. Hsu C, Dung M. Generating finite-state transducers for semi-structured data extraction from the web. Inf Syst. 1998;23(8):521–538.

306. Hu Wei, Qu Yuzhong, Cheng Gong. Matching large ontologies: A divide-and-conquer approach. Data Knowl Eng. 2008;67(1):140–160.

307. Huang Jiansheng, Chen Ting, Doan AnHai, Naughton Jeffrey F. On the provenance of non-answers to queries over extracted data. PVLDB. 2008;1.

308. Huck G, Fankhauser P, Aberer K, Neuhold EJ. Jedi: Extracting and synthesizing information from the Web. In: CoopIS. 1998.

309. Huebsch Ryan, Chun Brent N, Hellerstein Joseph M, et al. The architecture of PIER: An Internet-scale query processor. In: CIDR. 2005.

310. Ilyas Ihab F, Aref Walid G, Elmagarmid Ahmed K. Supporting top-k join queries in relational databases. In: VLDB. 2003.

311. Ilyas Ihab F, Aref Walid G, Elmagarmid Ahmed K, Elmongui Hicham G, Shah Rahul, Vitter Jeffrey Scott. Adaptive rank-aware query optimization in relational databases. ACM Trans Database Syst. 2006;31.

312. Ilyas Ihab F, Soliman Mohamed. Probabilistic Ranking Techniques in Relational Databases Synthesis Lectures on Data Management 2011.

313. Imielinski Tomasz, Lipski Witold. Incomplete information in relational databases. JACM. 1984;31.

314. IBM Inc. IBM AlphaWorks QED Wiki.

315. Microsoft Inc. Popfly. 2008.

316. Yahoo Inc. Pipes.

317. Infochimps: Smart data for apps & analytics. 2011.

318. Ioannidis Yannis E. Query optimization. ACM Comput Surv. 1996;28.

319. Ioannidis Yannis E, Ng Raymond T, Shim Kyusheok, Sellis Timos K. Parametric query optimization. VLDB J. 1997;6.

320. Ioannidis Yannis E, Ramakrishnan Raghu. Containment of conjunctive queries: Beyond relations as sets. ACM Transactions on Database Systems. 1995;20(3):288–324.

321. Ipeirotis Panagiotis G, Gravano Luis. Distributed search over the hidden web: Hierarchical database sampling and selection. In: VLDB. 2002:394–405.

322. Irmak U, Suel T. Interactive wrapper generation with minimal user effort. In: WWW. 2006.

323. Ives Zachary, Florescu Daniela, Friedman Marc, Levy Alon, Weld Dan. An adaptive query execution engine for data integration. In: Proceedings of the ACM SIGMOD Conference. 1999:299–310.

324. Ives Zachary G, Green Todd J, Karvounarakis Grigoris, et al. The ORCHESTRA collaborative data sharing system. SIGMOD Rec. 2008.

325. Ives Zachary G, Halevy Alon Y, Weld Daniel S. An XML query engine for network-bound data. VLDB J. December 2002;11.

326. Ives Zachary G, Halevy Alon Y, Weld Daniel S. Adapting to source properties in processing data integration queries. In: SIGMOD. June 2004.

327. Ives Zachary G, Knoblock Craig A, Minton Steven, et al. Interactive data integration through smart copy & paste. In: Proceedings of the Conference on Innovative Data Systems Research (CIDR). 2009.

328. Ives Zachary G, Taylor Nicholas E. Sideways information passing for push query processing. In: ICDE. 2008.

329. Jaccard P. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles. 1901;37:547–579.

330. Jampani Ravi, Xu Fei, Wu Mingxi, Perez Luis Leopoldo, Jermaine Chris, Haas Peter J. The Monte Carlo database system: Stochastic analysis close to the data. ACM Trans Database Syst. 2011;36.

331. Jaro MA. Unimatch: A record linkage system: User’s manual. Technical Report, U.S. Bureau of the Census, Washington D.C. 1976.

332. Jayram TS, Kolaitis Phokion, Vee Erik. The containment problem for real conjunctive queries with inequalities. In: Proc of PODS. 2006:80–89.

333. Jeffery S, Franklin M, Halevy A. Pay-as-you-go user feedback for dataspace systems. In: Proc of SIGMOD. 2008.

334. Jin Wen, Patel Jignesh M. Efficient and generic evaluation of ranked queries. In: SIGMOD Conference. 2011.

335. Josifovski Vanja, Fontoura Marcus, Barta Attila. Querying XML streams. The VLDB Journal. 2005;14.

336. Kabra Navin, DeWitt David J. Efficient mid-query re-optimization of sub-optimal query execution plans. In: SIGMOD. 1998.

337. Kacholia Varun, Pandit Shashank, Chakrabarti Soumen, Sudarshan S, Desai Rushi, Karambelkar Hrishikesh. Bidirectional expansion for keyword search on graph databases. In: VLDB. 2005.

338. Kalashnikov DV, Mehrotra S, Chen Z. Exploiting relationships for domain-independent data cleaning. In: Proc of the SDM Conf. 2005.

339. Kang Jaewoo, Naughton Jeffrey F. On schema matching with opaque column names and data values. In: SIGMOD Conference. 2003:205–216.

340. Kanne Carl-Christian, Moerkotte Guido. Efficient storage of XML data. In: ICDE. 2000.

341. Kantere Verena, Manoubi Maher, Kiringa Iluju, Sellis Timos K, Mylopoulos John. Peer coordination through distributed triggers. PVLDB. 2010;3.

342. Karvounarakis Grigoris, Ives Zachary G. Bidirectional mappings for data and update exchange. In: WebDB. 2008.

343. Karvounarakis Grigoris, Ives Zachary G. Querying data provenance. In: SIGMOD. 2010.

344. Kasneci Gjergji, Ramanath Maya, Sozio Mauro, Suchanek Fabian M, Weikum Gerhard. Star: Steiner-tree approximation in relationship graphs. In: ICDE. 2009.

345. Keller Arthur M. Algorithms for translating view updates to database updates for views involving selections, projections, and joins. In: SIGMOD. 1985.

346. Kementsietsidis Anastasios, Arenas Marcelo, Miller Renée J. Mapping data in peer-to-peer systems: Semantics and algorithmic issues. In: SIGMOD. June 2003.

347. Klug A. On conjunctive queries containing inequalities. Journal of the ACM. 1988;35(1):146–160.

348. Koller D, Friedman N. Probabilistic Graphical Models The MIT Press 2009.

349. Konstantinidis George, Ambite José Luis. Scalable query rewriting: A graph-based approach. In: SIGMOD Conference. 2011:97–108.

350. Kossmann Donald. The state of the art in distributed query processing. ACM Computing Surveys. 2000;32.

351. Koudas N. Special issue on data quality. IEEE Data Engineering Bulletin. 2006;29.

352. Koudas N, Marathe A, Srivastava D. Flexible string matching against large databases in practice. In: VLDB. 2004.

353. Koudas N, Sarawagi S, Srivastava D. Record linkage: Similarity measures and algorithms. Tutorial, the ACM SIGMOD Conference. 2006.

354. Koudas Nick, Marathe Amit, Srivastava Divesh. Flexible string matching against large databases in practice. In: VLDB. 2004:1078–1086.

355. Koudas Nick, Srivastava Divesh. Approximate joins: Concepts and techniques. In: VLDB. 2005:1363.

356. Kruger Andries, Giles C Lee, Coetzee Frans, et al. Deadliner: Building a new niche search engine. In: CIKM. 2000:272–281.

357. Kushmerick N. Wrapper induction for information extraction. PhD thesis, University of Washington. 1997.

358. Kushmerick N. Wrapper induction: Efficiency and expressiveness. Artif Intell. 2000;118(1-2):15–68.

359. Kushmerick N. Wrapper verification. World Wide Web. 2000;3(2):79–94.

360. Kushmerick Nick, Doorenbos Robert, Weld Daniel. Wrapper induction for information extraction. In: IJCAI. 1997.

361. Kwok Chung T, Weld Daniel S. Planning to gather information. In: Proc of the 13th National Conf on Artificial Intelligence (AAAI). 1996:32–39.

362. Laender AHF, Ribeiro-Neto BA, da Silva AS, Teixeira JS. A brief survey of web data extraction tools. SIGMOD Record. 2002;31(2):84–93.

363. Lafferty JD, McCallum A, Pereira F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proc of the Int Conf on Machine Learning (ICML). 2001:282–289.

364. Lakshmanan Laks VS, Leone Nicola, Ross Robert, Subrahmanian VS. Probview: A flexible probabilistic database system. ACM Trans Database Syst. 1997;22.

365. Lambrecht Eric, Kambhampati Subbarao, Gnanaprakasam Senthil. Optimizing recursive information gathering plans. In: Proc of the 16th Int Joint Conf on Artificial Intelligence (IJCAI). 1999:1204–1211.

366. Landers T, Rosenberg R. An overview of Multibase. In: Proceedings of the Second International Symposium on Distributed Databases. 1982:153–183.

367. Lattes Veronique, Rousset Marie-Christine. The use of the CARIN language and algorithms for information integration: The PICSEL project. In: Proceedings of the ECAI-98 Workshop on Intelligent Information Integration. 1998.

368. Lawrence Michael K, Pottinger Rachel, Staub-French Sheryl. Data coordination: Supporting contingent updates. PVLDB. 2011;4.

369. Lee Amy J, Koeller Andreas, Nica Anisoara, Rundensteiner Elke A. Data warehouse evolution: Trade-offs between quality and cost of query rewritings. In: ICDE. 1999.

370. Lee Dongwon. Weighted exact set similarity join. Tutorial presentation. 2009.

371. Lee Y, Sayyadian M, Doan A, Rosenthal A. eTuner: Tuning schema matching software using synthetic scenarios. VLDB J. 2007;16(1):97–122.

372. Lee Yoonkyong, Doan AnHai, Dhamankar Robin, Halevy Alon Y, Domingos Pedro. iMAP: Discovering complex mappings between database schemas. In: Proc of SIGMOD. 2004:383–394.

373. Lenzerini Maurizio. Data integration: A theoretical perspective. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 2002.

374. Lerman K, Getoor L, Minton S, Knoblock CA. Using the structure of Web sites for automatic segmentation of tables. In: SIGMOD. 2004.

375. Lerman K, Minton S, Knoblock CA. Wrapper maintenance: A machine learning approach. J Artif Intell Res (JAIR). 2003;18:149–181.

376. Levenshtein V. Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR. 1965;163(4):845–848. Original in Russian; translation in Soviet Physics Doklady. 1966;10(8):707–710.

377. Levy Alon, Rousset Marie-Christine. Combining Horn rules and description logics in CARIN. Artificial Intelligence. September 1998;104:165–209.

378. Levy Alon Y. Obtaining complete answers from incomplete databases. In: Proceedings of the International Conference on Very Large Databases (VLDB). 1996:402–412.

379. Levy Alon Y. Logic-based techniques in data integration. In: Minker Jack, ed. Logic-Based Artificial Intelligence. Dordrecht: Kluwer Academic Publishers; 2000:575–595.

380. Levy Alon Y, Rajaraman Anand, Ordille Joann J. Query answering algorithms for information agents. In: Proc of the 13th National Conf on Artificial Intelligence (AAAI). 1996.

381. Levy Alon Y, Rajaraman Anand, Ordille Joann J. Querying heterogeneous information sources using source descriptions. In: VLDB. 1996.

382. Levy Alon Y, Sagiv Yehoshua. Queries independent of updates. In: Proceedings of the International Conference on Very Large Databases (VLDB). 1993:171–181.

383. Li Chengkai, Chang Kevin Chen-Chuan, Ilyas Ihab F, Song Sumin. RankSQL: Query algebra and optimization for relational top-k queries. In: SIGMOD. 2005.

384. Li W, Clifton C. Semantic integration in heterogeneous databases using neural networks. In: VLDB. 1994:1–12.

385. Li X, Morie P, Roth D. Robust reading: Identification and tracing of ambiguous names. In: Proc of the HLT-NAACL Conf. 2004:17–24.

386. Li X, Morie P, Roth D. Semantic integration in text: From ambiguous names to identifiable entities. AI Magazine. 2005;26(1):45–58. Doan A, Noy N, Halevy A, eds.

387. Li Xian, Lebo Timothy, McGuinness Deborah L. Provenance-based strategies to develop trust in semantic web applications. In: IPAW. 2010.

388. Li Y, Terrell A, Patel JM. WHAM: A high-throughput sequence alignment method. In: SIGMOD Conference. 2011:445–456.

389. Libkin Leonid. Incomplete information and certain answers in general data models. In: PODS. 2011:59–70.

390. Lim EP, Srivastava J, Prabhakar S, Richardson J. Entity identification in database integration. In: Proc of the 5th Int Conf on Data Engineering (ICDE-93). 1993:294–301.

391. Limaye Girija, Sarawagi Sunita, Chakrabarti Soumen. Annotating and searching web tables using entities, types and relationships. PVLDB. 2010;3(1):1338–1347.

392. Liu B. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data Data-Centric Systems and Applications Springer 2007.

393. Liu B, Grossman RL, Zhai Y. Mining data records in Web pages. In: KDD. 2003.

394. Liu L, Pu C, Han W. XWRAP: An XML-enabled wrapper construction system for Web information sources. In: Proc of the IEEE Intl Conf on Data Engineering (ICDE). 2000.

395. Liu Mengmeng, Taylor Nicholas E, Zhou Wenchao, Ives Zachary G, Loo Boon Thau. Recursive computation of regions and connectivity in networks. In: ICDE. 2009.

396. Liu Xiufeng, Thomsen Christian, Pedersen Torben Bach. ETLMR: A highly scalable dimensional ETL framework based on MapReduce. In: Proceedings of the 13th International Conference on Data Warehousing and Knowledge Discovery. 2011.

397. Loo Boon Thau, Hellerstein Joseph M, Stoica Ion, Ramakrishnan Raghu. Declarative routing: Extensible routing with declarative queries. In: SIGCOMM. 2005.

398. Lu James J, Moerkotte Guido, Schue Joachim, Subrahmanian VS. Efficient maintenance of materialized mediated views. In: SIGMOD. 1995.

399. Lu Meiyu, Agrawal Divyakant, Dai Bing Tian, Tung Anthony KH. Schema-as-you-go: On probabilistic tagging and querying of wide tables. In: SIGMOD Conference. 2011:181–192.

400. Ludäscher Bertram, Altintas Ilkay, Berkley Chad, et al. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience. 2006.

401. Ludäscher Bertram, Himmeröder Rainer, Lausen Georg, May Wolfgang, Schlepphorst Christian. Managing semistructured data with FLORID: A deductive object-oriented perspective. Information Systems. 1998;23.

402. Luo Qiong, Krishnamurthy Sailesh, Mohan C, et al. Middle-tier database caching for e-business. In: SIGMOD. 2002.

403. Luo Yi, Wang Wei, Lin Xuemin. Spark: A keyword search engine on relational databases. In: ICDE. 2008.

404. Mackert Lothar F, Lohman Guy M. R* optimizer validation and performance evaluation for distributed queries. In: VLDB. 1986.

405. Mackert Lothar F, Lohman Guy M. R* optimizer validation and performance evaluation for local queries. In: SIGMOD. 1986.

406. Madhavan Jayant, Bernstein Philip A, Doan AnHai, Halevy Alon Y. Corpus-based schema matching. In: Proc of ICDE. 2005:57–68.

407. Madhavan Jayant, Bernstein Philip A, Rahm Erhard. Generic schema matching with Cupid. In: VLDB. 2001.

408. Madhavan Jayant, Halevy Alon. Composing mappings among data sources. In: Proc of VLDB. 2003.

409. Madhavan Jayant, Jeffery Shawn, Cohen Shirley, et al. Web-scale data integration: You can only afford to pay as you go. In: CIDR. 2007.

410. Madhavan Jayant, Ko David, Kot Lucja, Ganapathy Vignesh, Rasmussen Alex, Halevy Alon. Google’s deep-web crawl. In: Proc of VLDB. 2008:1241–1252.

411. Magnani M, Montesi D. Uncertainty in data integration: Current approaches and open problems. In: VLDB Workshop on Management of Uncertain Data. 2007:18–32.

412. Magnani M, Rizopoulos N, McBrien P, Montesi D. Schema integration based on uncertain semantic mappings. Lecture Notes in Computer Science 2005:31–46.

413. Mahmoud Hatem A, Aboulnaga Ashraf. Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems. In: SIGMOD Conference. 2010:411–422.

414. Manning CD, Raghavan P, Schütze H. Introduction to Information Retrieval Cambridge University Press 2008.

415. Marian Amélie, Bruno Nicolas, Gravano Luis. Evaluating top-k queries over web-accessible databases. ACM Trans Database Syst. 2004;29.

416. McCallum A, Wellner B. Conditional models of identity uncertainty with application to noun coreference. In: Proc of the Conf on Advances in Neural Information Processing Systems (NIPS). 2004.

417. McCallum Andrew, Nigam Kamal, Rennie Jason, Seymore Kristie. A machine learning approach to building domain-specific search engines. In: IJCAI. 1999:662–667.

418. McCallum Andrew K, Nigam Kamal, Ungar Lyle H. Efficient clustering of high-dimensional data sets with application to reference matching. In: KDD. 2000.

419. McCann R, AlShebli BK, Le Q, Nguyen H, Vu L, Doan A. Mapping maintenance for data integration systems. In: VLDB. 2005.

420. McCann Robert, Doan AnHai, Varadarajan Vanitha, Kramnik Alexander, Zhai ChengXiang. Building data integration systems: A mass collaboration approach. In: WebDB. 2003:25–30.

421. McCann Robert, Shen Warren, Doan AnHai. Matching schemas in online communities: A web 2.0 approach. In: ICDE. 2008:110–119.

422. McDowell Luke, Etzioni Oren, Halevy Alon, et al. Enticing ordinary people onto the semantic web via instant gratification. In: Proceedings of the Second International Conference on the Semantic Web. October 2003.

423. Meek C, Patel JM, Kasetty S. OASIS: An online and accurate technique for local-alignment searches on biological sequences. In: VLDB. 2003:910–921.

424. Meliou Alexandra, Gatterbauer Wolfgang, Moore Katherine F, Suciu Dan. The complexity of causality and responsibility for query answers and non-answers. PVLDB. 2010;4.

425. Meliou Alexandra, Gatterbauer Wolfgang, Nath Suman, Suciu Dan. Tracing data errors with view-conditioned causality. In: SIGMOD. 2011.

426. Melnik Sergey, Bernstein Philip A, Halevy Alon Y, Rahm Erhard. Supporting executable mappings in model management. In: Proc of SIGMOD. 2005:167–178.

427. Melnik Sergey, Garcia-Molina Hector, Rahm Erhard. Similarity flooding: A versatile graph matching algorithm. In: Proceedings of the 18th International Conference on Data Engineering (ICDE). 2002.

428. Melnik Sergey, Rahm Erhard, Bernstein Phil. Rondo: A programming platform for generic model management. In: Proc of SIGMOD. 2003.

429. Meng X, Hu D, Li C. Schema-guided wrapper maintenance for Web-data extraction. In: WIDM. 2003.

430. Miklau Gerome, Suciu Dan. Containment and equivalence for a fragment of XPath. J ACM. 2004;51.

431. Miller George A. WordNet: A lexical database for English. In: HLT. 1994.

432. Miller Renée J, Haas Laura M, Hernandez Mauricio. Schema matching as query discovery. In: VLDB. 2000.

433. Milo Tova, Abiteboul Serge, Amann Bernd, Benjelloun Omar, Ngoc Frederic Dang. Exchanging intensional XML data. In: Proc of SIGMOD. 2003:289–300.

434. Milo Tova, Zohar Sagit. Using schema matching to simplify heterogeneous data translation. In: Proceedings of the International Conference on Very Large Databases (VLDB). 1998.

435. Mintz Mike, Bills Steven, Snow Rion, Jurafsky Daniel. Distant supervision for relation extraction without labeled data. In: Su Keh-Yih, Su Jian, Wiebe Janyce, eds. ACL/AFNLP. The Association for Computer Linguistics 2009:1003–1011.

436. Missier Paolo, Sahoo Satya Sanket, Zhao Jun, Goble Carole A, Sheth Amit P. Janus: From workflows to semantic provenance and linked open data. In: IPAW. 2010.

437. Mistry Hoshi, Roy Prasan, Sudarshan S, Ramamritham Krithi. Materialized view selection and maintenance using multi-query optimization. In: SIGMOD. 2001.

438. Mitchell Tom M. Machine Learning McGraw Hill 1997.

439. Mitra Prasenjit. An algorithm for answering queries efficiently using views. In: ADC. 2001:99–106.

440. Mitra Prasenjit, Noy Natalya F, Jaiswal Anuj R. Omen: A probabilistic ontology mapping tool. In: International Semantic Web Conference. 2005:537–547.

441. Mohapatra R, Rajaraman K, Sung SY. Efficient wrapper reinduction from dynamic Web sources. In: Web Intelligence. 2004.

442. Monge AE, Elkan C. The field matching problem: Algorithms and applications. In: KDD. 1996.

443. Monge AE, Elkan CP. An efficient domain-independent algorithm for detecting approximately duplicate database records. In: Proc of the Second ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD-97). 1997:23–29.

444. Moon Hyun Jin, Curino Carlo, Deutsch Alin, Hou Chien-Yi, Zaniolo Carlo. Managing and querying transaction-time databases under schema evolution. PVLDB. 2008;1.

445. Mork Peter, Bernstein Philip A, Melnik Sergey. Teaching a schema translator to produce o/r views. In: Proceedings of Entity Relationship Conference. 2007:102–119.

446. Motro Amihai. Integrity = validity + completeness. ACM Transactions on Database Systems. December 1989;14(4):480–502.

447. Mumick Inderpal Singh, Quass Dallan, Mumick Barinderpal Singh. Maintenance of data cubes and summary tables in a warehouse. In: SIGMOD. 1997.

448. Muniswamy-Reddy Kiran-Kumar, Holland David A, Braun Uri, Seltzer Margo I. Provenance-aware storage systems. In: USENIX Annual Technical Conference, General Track. 2006.

449. Muslea I, Minton S, Knoblock CA. A hierarchical approach to wrapper induction. In: Agents. 1999.

450. Muslea I, Minton S, Knoblock CA. Hierarchical wrapper induction for semistructured information sources. Autonomous Agents and Multi-Agent Systems. 2001;4(1/2):93–114.

451. Muslea I, Minton S, Knoblock CA. Active learning with strong and weak views: A case study on wrapper induction. In: IJCAI. 2003.

452. Nash A, Bernstein P, Melnik S. Composition of mappings given by embedded dependencies. ACM Transactions on Database Systems. 2007;32.

453. Naumann F, Herschel M. An Introduction to Duplicate Detection. In: Ozsu M Tamer, ed. Synthesis Lectures on Data Management. Morgan & Claypool 2010.

454. Naumann Felix, Freytag Johann Christoph, Leser Ulf. Completeness of integrated information sources. Inf Syst. 2004;29(7):583–615.

455. Navarro G. A guided tour to approximate string matching. ACM Comput Surv. 2001;33(1):31–88.

456. Needleman S, Wunsch C. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology. 1970;48(3):443–453.

457. Neven Frank, Schwentick Thomas. XPath containment in the presence of disjunction, DTDs, and variables. In: ICDT. 2003.

458. Newcombe HB, Kennedy JM, Axford S, James A. Automatic linkage of vital records. Science. 1959;130(3381):954–959.

459. Ng WS, Ooi BC, Tan K-L, Zhou A. Peerdb: A p2p-based system for distributed data sharing. In: ICDE. 2003.

460. Nguyen Hoa, Fuxman Ariel, Paparizos Stelios, Freire Juliana, Agrawal Rakesh. Synthesizing products for online catalogs. PVLDB. 2011;4(7):409–418.

461. Nie Zaiqing, Wen Ji-Rong, Ma Wei-Ying. Object-level vertical search. In: CIDR. 2007:235–246.

462. Nottelmann H, Straccia U. Information retrieval and machine learning for probabilistic schema matching. Information Processing and Management. 2007;43(3):552–576.

463. Noy NF, Doan A, Halevy AY. Semantic integration. AI Magazine. 2005;26(1):7–10.

464. Noy Natalya F, Musen Mark A. PROMPT: Algorithm and tool for automated ontology merging and alignment. In: Proceedings of the National Conference on Artificial Intelligence (AAAI). 2000.

465. Noy Natalya Fridman, Musen Mark A. Smart: Automated support for ontology merging and alignment. In: Proceedings of the Knowledge Acquisition Workshop. 1999.

466. Noy Natalya Fridman. Semantic integration: A survey of ontology-based approaches. SIGMOD Record. 2004;33(4):65–70.

467. Ntoulas Alexandros, Zerfos Petros, Cho Junghoo. Downloading textual hidden web content through keyword queries. In: JCDL. 2005:100–109.

468. Oinn T, Greenwood M, Addis M, et al. Taverna: Lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience. 2006;18.

469. On B, Koudas N, Lee D, Srivastava D. Group linkage. In: ICDE. 2007.

470. Open provenance model. 2008.

471. Ozsu M Tamer, Valduriez Patrick. Principles of Distributed Database Systems Springer 2011.

472. Palopoli Luigi, Saccà Domenico, Terracina G, Ursino Domenico. A unified graph-based framework for deriving nominal interscheme properties, type conflicts and object cluster similarities. In: Proceedings of CoopIS. 1999.

473. Papotti Paolo, Crescenzi Valter, Merialdo Paolo, Bronzi Mirko, Blanco Lorenzo. Redundancy-driven web data extraction and integration. In: WebDB. 2010.

474. Parameswaran A, Dalvi N, Garcia-Molina H, Rastogi R. Optimal schemes for robust web extraction. In: VLDB. 2011.

475. Pasula H, Marthi B, Milch B, Russell S, Shpitser I. Identity uncertainty and citation matching. In: Proc of the NIPS Conf. 2002:1401–1408.

476. Peng Feng, Chawathe Sudarshan S. Xsq: A streaming xpath engine. ACM Trans Database Syst. 2005;30.

477. Philips L. Hanging on the metaphone. Computer Language Magazine. 1990;7(12):39–44.

478. Philips L. The double metaphone search algorithm. C/C++ Users Journal. 2000;18.

479. Pinheiro JC, Sun DX. Methods for linking and mining massive heterogeneous databases. In: Proc of the ACM Int Conf on Knowledge Discovery and Data Mining (KDD). 1998:309–313.

480. Popa Lucian, Tannen Val. An equational chase for path conjunctive queries, constraints and views. In: Proceedings of the International Conference on Database Theory (ICDT). 1999.

481. Pottinger Rachel, Bernstein Philip A. Merging models based on given correspondences. In: Proc of VLDB. 2003:862–873.

482. Pottinger Rachel, Halevy Alon. MiniCon: A scalable algorithm for answering queries using views. VLDB Journal. 2001.

483. Pound Jeffrey, Ilyas Ihab F, Weddell Grant E. Expressive and flexible access to web-extracted data: A keyword-based structured query language. In: SIGMOD Conference. 2010:423–434.

484. Pu C. Key equivalence in heterogeneous databases. In: Proc of the 1st Int Workshop on Inter-operability in Multidatabase Systems. 1991.

485. Puhlmann Sven, Weis Melanie, Naumann Felix. Xml duplicate detection using sorted neighborhoods. In: EDBT. 2006:773–791.

486. Quass Dallan, Widom Jennifer. On-line warehouse view maintenance. In: SIGMOD.

487. Rahm Erhard, Bernstein Philip A. A survey of approaches to automatic schema matching. VLDB Journal. 2001;10(4):334–350.

488. Rajaraman Anand, Sagiv Yehoshua, Ullman Jeffrey D. Answering queries using templates with binding patterns. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 1995:105–112.

489. Ramakrishnan Raghu, Gehrke Johannes. Database Management Systems McGraw Hill 2000.

490. Raman Vijayshankar, Deshpande Amol, Hellerstein Joseph M. Using state modules for adaptive query processing. In: ICDE. 2003.

491. Raman Vijayshankar, Hellerstein Joseph M. Potter’s wheel: An interactive data cleaning system. In: VLDB. 2001:381–390.

492. Ramesh Aditya, Sudarshan S, Joshi Purva. Keyword search on form results. PVLDB. 2011;4.

493. Raposo J, Pan A, Álvarez M, Hidalgo J. Automatically maintaining wrappers for semi-structured Web sources. Data Knowl Eng. 2007;61(2):331–358.

494. Ravikumar PD, Cohen W. A hierarchical graphical model for record linkage. In: Proc of the Conf on Uncertainty in Artificial Intelligence (UAI). 2004:454–461.

495. Razniewski Simon, Nutt Werner. Completeness of queries over incomplete databases. PVLDB. 2011;4(11):749–760.

496. Ré Christopher, Dalvi Nilesh N, Suciu Dan. Efficient top-k query evaluation on probabilistic data. In: ICDE. 2007.

497. Ristad ES, Yianilos PN. Learning string-edit distance. IEEE Trans Pattern Anal Mach Intell. 1998;20(5):522–532.

498. Roth Mary Tork, Ozcan Fatma, Haas Laura M. Cost models do matter: Providing cost information for diverse data sources in a federated system. In: VLDB. 1999.

499. Rundensteiner Elke A, Ding Luping, Sutherland Timothy M, Zhu Yali, Pielech Bradford, Mehta Nishant. Cape: Continuous query engine with heterogeneous-grained adaptivity. In: VLDB. 2004.

500. Russell RC. U.S. Patent 1,261,167. 1918.

501. Russell RC. U.S. Patent 1,435,663. 1922.

502. Sagiv Y, Yannakakis M. Equivalence among relational expressions with the union and difference operators. Journal of the ACM. 1980;27(4):633–655.

503. Sahuguet A, Azavant F. Web ecology: Recycling HTML pages as XML documents using W4F. In: WebDB (Informal Proceedings). 1999.

504. Vaz Salles Marcos Antonio, Dittrich Jens-Peter, Karakashian Shant Kirakos, Girard Olivier René, Blunschi Lukas. iTrails: Pay-as-you-go information integration in dataspaces. In: VLDB. 2007.

505. Saraiya Yatin. Subtree-elimination algorithms in deductive databases. PhD thesis, Stanford University, Stanford, California. 1991.

506. Sarawagi S. Information extraction. Foundations and Trends in Databases. 2008;1(3):261–377.

507. Sarawagi S, Bhamidipaty A. Interactive deduplication using active learning. In: Proc of the ACM Int Conf on Knowledge Discovery and Data Mining (KDD). 2002:269–278.

508. Sarawagi Sunita, Kirpal Alok. Efficient set joins on similarity predicates. In: SIGMOD Conference. 2004:743–754.

509. Sarkas Nikos, Paparizos Stelios, Tsaparas Panayiotis. Structured annotations of web queries. In: SIGMOD Conference. 2010.

510. Sarma Anish Das, Dong Xin, Halevy Alon Y. Bootstrapping pay-as-you-go data integration systems. In: SIGMOD Conference. 2008:861–874.

511. Sayyadian M, Lee Y, Doan A, Rosenthal A. Tuning schema matching software using synthetic scenarios. In: VLDB. 2005:994–1005.

512. Sayyadian Mayssam, LeKhac Hieu, Doan AnHai, Gravano Luis. Efficient keyword search across heterogeneous relational databases. In: ICDE. 2007.

513. Scheidegger Carlos Eduardo, Vo Huy T, Koop David, Freire Juliana, Silva Cláudio T. Querying and re-using workflows with vistrails. In: SIGMOD Conference. 2008.

514. Schnaitter Karl, Polyzotis Neoklis. Evaluating rank joins with optimal cost. In: PODS. 2008.

515. Schnaitter Karl, Spiegel Joshua, Polyzotis Neoklis. Depth estimation for ranking query optimization. In: VLDB. 2007.

516. Schwentick Thomas. XPath query containment. SIGMOD Record. 2004;33.

517. Segoufin Luc, Vianu Victor. Validating streaming XML documents. In: PODS. 2002.

518. Sen Prithviraj, Deshpande Amol. Representing and querying correlated tuples in probabilistic databases. In: ICDE. 2007.

519. Shen Warren, DeRose Pedro, McCann Robert, Doan AnHai, Ramakrishnan Raghu. Toward best-effort information extraction. In: SIGMOD. 2008.

520. Shen Warren, DeRose Pedro, Vu Long, Doan AnHai, Ramakrishnan Raghu. Source-aware entity matching: A compositional approach. In: ICDE. 2007:196–205.

521. Shen Warren, Li Xin, Doan AnHai. Constraint-based entity matching. In: AAAI. 2005:862–867.

522. Silva Cláudio T, Anderson Erik W, Santos Emanuele, Freire Juliana. Using vistrails and provenance for teaching scientific visualization. Comput Graph Forum. 2011;30.

523. Simitsis Alkis, Koutrika Georgia, Ioannidis Yannis. Précis: From unstructured keywords as queries to structured databases as answers. The VLDB Journal. 2008;17.

524. Singla P, Domingos P. Object identification with attribute-mediated dependences. In: Proc of the PKDD Conf. 2005:297–308.

525. Smith John Miles, Bernstein Philip A, Dayal Umeshwar, et al. MULTIBASE – Integrating heterogeneous distributed database systems. In: Proceedings of 1981 National Computer Conference. 1981.

526. Smith T, Waterman M. Identification of common molecular subsequences. Journal of Molecular Biology. 1981;147(1):195–197.

527. Socrata: The social data cloud company. 2011.

528. Soderland S. Learning information extraction rules for semi-structured and free text. Machine Learning. 1999;34(1-3):233–272.

529. Soderland S, Fisher D, Aseltine J, Lehnert WG. Crystal: Inducing a conceptual dictionary. In: IJCAI. 1995.

530. Stonebraker Michael. The Design and Implementation of Distributed INGRES. Boston, MA, USA. 1986.

531. Stonebraker Michael, Abadi Daniel J, DeWitt David J, et al. MapReduce and parallel DBMSs: Friends or foes?. Commun ACM. 2010;53.

532. Stonebraker Michael, Aoki Paul M, Litwin Witold, et al. Mariposa: A wide-area distributed database system. VLDB J. 1996;5.

533. Subrahmanian VS, Adali S, Brink A, Emery R, Lu J, Rajput A, Rogers T, Ross R, Ward C. HERMES: A heterogeneous reasoning and mediator system. Technical Report, University of Maryland. 1995.

534. Suchanek Fabian M, Kasneci Gjergji, Weikum Gerhard. Yago: A large ontology from wikipedia and wordnet. J Web Sem. 2008;6.

535. Suciu Dan, Olteanu Dan, Ré Christopher, Koch Christoph. Probabilistic Databases Synthesis Lectures on Data Management 2011.

536. Tableau software. 2011.

537. Taft RL. Name search techniques. Technical Report, special report no. 1, New York State Identification and Intelligence System, Albany, N.Y. 1970.

538. Talukdar Partha Pratim, Ives Zachary G, Pereira Fernando. Automatically incorporating new sources in keyword search-based data integration. In: SIGMOD. 2010.

539. Talukdar Partha Pratim, Jacob Marie, Mehmood Muhammad Salman, et al. Learning to create data-integrating queries. In: VLDB. 2008.

540. Tata Sandeep, Lohman Guy M. Sqak: Doing more with keywords. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008.

541. Tatarinov Igor, Halevy Alon. Efficient query reformulation in peer data management systems. In: Proc of SIGMOD. 2004.

542. Tatarinov Igor, Viglas Stratis, Beyer Kevin S, Shanmugasundaram Jayavel, Shekita Eugene J, Zhang Chun. Storing and querying ordered XML using a relational database system. In: SIGMOD. 2002.

543. Tatbul Nesime, Cetintemel Ugur, Zdonik Stanley B, Cherniack Mitch, Stonebraker Michael. Load shedding in a data stream manager. In: VLDB. 2003.

544. Taylor Nicholas E, Ives Zachary G. Reconciling while tolerating disagreement in collaborative data sharing. In: SIGMOD. 2006.

545. Tejada S, Knoblock CA, Minton S. Learning object identification rules for information integration. Inf Syst. 2001;26(8):607–633.

546. Thor Andreas, Rahm Erhard. MOMA – A mapping-based object matching system. In: CIDR. 2007:247–258.

547. Tian Feng, DeWitt David J. Tuple routing strategies for distributed eddies. In: VLDB. 2003.

548. Tian Y, Tata S, Hankins RA, Patel JM. Practical methods for constructing suffix trees. VLDB J. 2005;14(3):281–299.

549. Ting Kai-Ming, Witten Ian H. Issues in stacked generalization. Journal of Artificial Intelligence Research. 1999;10:271–289.

550. Toda Guilherme A, Cortez Eli, da Silva Altigran S, de Moura Edleno. A probabilistic approach for automatically filling form-based web interfaces. PVLDB. 2011;4(3):151–160.

551. Tsatalos Odysseas G, Solomon Marvin H, Ioannidis Yannis E. The GMAP: A versatile tool for physical data independence. In: Proceedings of the International Conference on Very Large Databases (VLDB). 1994:367–378.

552. Tu Yi-Cheng, Liu Song, Prabhakar Sunil, Yao Bin, Schroeder William. Using control theory for load shedding in data stream management. In: ICDE. 2007.

553. Tuchinda Rattapoom, Szekely Pedro, Knoblock Craig. Building mashups by example. In: Proceedings of CHI. 2008:139–148.

554. Ullman Jeffrey D. Principles of Database and Knowledge-Base Systems, Volumes I, II. Computer Science Press, Rockville MD. 1989.

555. Ullman Jeffrey D. Information Integration using Logical Views. In: Proceedings of the International Conference on Database Theory (ICDT). 1997.

556. Urhan Tolga, Franklin Michael J, Amsaleg Laurent. Cost based query scrambling for initial delays. In: SIGMOD. 1998.

557. van der Meyden Ron. The complexity of querying indefinite data about linearly ordered domains. In: Proceedings of the ACM Symposium on Principles of Database Systems (PODS). 1992:331–345.

558. Venetis Petros, Halevy Alon Y, Madhavan Jayant, et al. Recovering semantics of tables on the web. PVLDB. 2011;4(9):528–538.

559. Vernica Rares, Carey Michael J, Li Chen. Efficient parallel set-similarity joins using MapReduce. In: SIGMOD Conference. 2010:495–506.

560. Wang Daisy Zhe, Franklin Michael J, Garofalakis Minos N, Hellerstein Joseph M, Wick Michael L. Hybrid in-database inference for declarative information extraction. In: SIGMOD Conference. 2011.

561. Wang Daisy Zhe, Michelakis Eirinaios, Garofalakis Minos N, Hellerstein Joseph M. BayesStore: Managing large, uncertain data repositories with probabilistic graphical models. PVLDB. 2008;1.

562. Wang J, Lochovsky FH. Data extraction and label assignment for Web databases. In: WWW. 2003.

563. Wang Wei. Similarity join algorithms: An introduction. Tutorial presentation. 2008.

564. Wang YR, Madnick SE. The inter-database instance identification problem in integrating autonomous systems. In: Proc of the 5th Int Conf on Data Engineering (ICDE-89). 1989:46–55.

565. Wang Yalin, Hu Jianying. A machine learning based approach for table detection on the web. In: WWW. 2002:242–250.

566. Waterman M, Smith T, Beyer W. Some biological sequence metrics. Advances in Math. 1976;20(4):367–387.

567. Weis Melanie, Naumann Felix. Detecting duplicates in complex XML data. In: ICDE. 2006:109.

568. Whang Steven Euijong, Garcia-Molina Hector. Developments in generic entity resolution. IEEE Data Eng Bull. 2011;34(3):51–59.

569. Wick Michael L, McCallum Andrew, Miklau Gerome. Scalable probabilistic databases with factor graphs and MCMC. PVLDB. 2010;3.

570. Wick Michael L, Rohanimanesh Khashayar, Schultz Karl, McCallum Andrew. A unified approach for schema matching, coreference and canonicalization. In: KDD. 2008:722–730.

571. Wiederhold Gio. Mediators in the architecture of future information systems. In: IEEE Computer. March 1992:38–49.

572. Winkler WE. Improved decision rules in the Fellegi-Sunter model of record linkage. Technical Report, Statistical Research Report Series RR93/12, U.S. Bureau of the Census. 1993.

573. Winkler WE. The state of record linkage and current research problems. Technical Report, Statistical Research Report Series RR99/04, U.S. Bureau of the Census. 1999.

574. Winkler WE. Methods for record linkage and Bayesian networks. Technical Report, Statistical Research Report Series RRS2002/05, U.S. Bureau of the Census. 2002.

575. Winkler WE, Thibaudeau Y. An application of the Fellegi-Sunter model of record linkage to the 1990 U.S. decennial census. Technical Report, Statistical Research Report Series RR91/09, U.S. Bureau of the Census, Washington, D.C. 1991.

576. Wolpert David. Stacked generalization. Neural Networks. 1992;5:241–259.

577. Wong Jeffrey, Hong Jason I. Making mashups with marmite: Towards end-user programming for the web. In: CHI. 2007:1435–1444.

578. Wu Fei, Weld Daniel S. Autonomously semantifying wikipedia. In: Silva Mário J, Laender Alberto HF, Baeza-Yates Ricardo A, McGuinness Deborah L, Olstad Bjørn, Olsen Øystein Haug, Falcão André O, eds. CIKM. ACM 2007:41–50.

579. Wu Fei, Weld Daniel S. Automatically refining the wikipedia infobox ontology. In: Huai Jinpeng, Chen Robin, Hon Hsiao-Wuen, Liu Yunhao, Ma Wei-Ying, Tomkins Andrew, Zhang Xiaodong, eds. WWW. ACM 2008:635–644.

580. Wu Fei, Weld Daniel S. Open information extraction using wikipedia. In: Hajic et al. [282], pages 118–127.

581. Wu Wensheng, Yu Clement T, Doan AnHai, Meng Weiyi. An interactive clustering-based approach to integrating source query interfaces on the deep web. In: SIGMOD Conference. 2004:95–106.

582. Xiao Chuan, Wang Wei, Lin Xuemin, Yu Jeffrey Xu. Efficient similarity joins for near duplicate detection. In: WWW. 2008:131–140.

583. Xin Dong, He Yeye, Ganti Venkatesh. Keyword++: A framework to improve keyword search over entity databases. PVLDB. 2010;3.

584. Yagoub Khaled, Florescu Daniela, Issarny Valerie, Valduriez Patrick. Caching strategies for data-intensive web sites. In: Proceedings of the International Conference on Very Large Databases (VLDB). 2000:188–199.

585. Yang Beverly, Garcia-Molina Hector. Improving search in peer-to-peer networks. In: ICDCS. 2002:5–14.

586. Yang HZ, Larson PA. Query transformation for PSJ-queries. In: Proceedings of the International Conference on Very Large Databases (VLDB). 1987:245–254.

587. Yu Jeffrey Xu, Lu Qin, Chang Lijun. Keyword Search in Databases Synthesis Lectures on Data Management 2010.

588. Zaharioudakis Markos, Cochrane Roberta, Lapis George, Pirahesh Hamid, Urata Monica. Answering complex SQL queries using automatic summary tables. In: Proceedings of the ACM SIGMOD Conference. 2000:105–116.

589. Zhai Y, Liu B. Web data extraction based on partial tree alignment. In: WWW. 2005.

590. Zhang Y, Tang N, Boncz PA. Efficient distribution of full-fledged XQuery. In: ICDE. April 2009:565–576.

591. Zhao Jun, Sahoo Satya Sanket, Missier Paolo, Sheth Amit P, Goble Carole A. Extending semantic provenance into the web of data. IEEE Internet Computing. 2011;15.

592. Zhou Gang, Hull Richard, King Roger, Franchitti Jean-Claude. Data integration and warehousing using H2O. IEEE Data Eng Bull. 1995;18.

593. Zhou Wenchao, Fei Qiong, Narayan Arjun, Haeberlen Andreas, Loo Boon Thau, Sherr Micah. Secure network provenance. In: SOSP. 2011.

594. Zhou Wenchao, Fei Qiong, Sun Shengzhi, et al. NetTrails: A declarative platform for maintaining and querying provenance in distributed systems. In: SIGMOD Conference. 2011.

595. Zhou Wenchao, Sherr Micah, Tao Tao, Li Xiaozhou, Loo Boon Thau, Mao Yun. Efficient querying and maintenance of network provenance at internet-scale. In: SIGMOD. 2010.

596. Zhuge Yue, Garcia-Molina Hector, Wiener Janet L. Multiple view consistency for data warehousing. In: ICDE. 1997.

