"Semantic integration" is a term used in several contexts across different areas of computer design, programming, management and administration. In general, it refers to aggregating information from one or more disparate sources for the purpose of creating some system in which the information is organized in a way that makes sense to a user. Semantic integration frequently deals with defining and establishing metadata connections, or relationships, between different parts of the different data sources so they can be logically structured. This could involve creating relational connections between two separate databases, building a graph of how portions of different websites relate to one another, or integrating factual data from an unknown, arbitrary format into a concise record structure. Many practical applications for a fully implemented semantic integration system exist, including research libraries or networks, more organic search engine algorithms that can extrapolate context from a search and, ultimately — through the use of metadata publishing — seamless integration of different computer systems for data exchange.
The ultimate goal of semantic integration in most cases is to associate information dynamically. In a very simple example, this could mean associating fields in one database with fields in another even though they are not exact matches, such as relating a field named "size" to a field named "height". This association could be made through user-defined rules that explicitly link the two, or by algorithms that compare the numerical data in the fields and determine a probable match. The words "size" and "height" then become metadata terms that external semantic integration systems can use to find the information for a user without knowing how any single system stores the data.
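The two matching approaches described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the rule table `USER_RULES`, the helper names, the sample data and the 0.8 threshold are all assumptions made for the example, and comparing only the mean and spread of each column is a deliberately crude stand-in for a real statistical matcher.

```python
from statistics import mean, pstdev

# Hypothetical user-defined rules: (source field, target field) pairs
# that a person has explicitly linked.
USER_RULES = {("size", "height")}


def closeness(x, y):
    """Return 1.0 when x and y are equal, falling toward 0.0 as they diverge."""
    largest = max(abs(x), abs(y))
    if largest == 0:
        return 1.0
    return max(0.0, 1.0 - abs(x - y) / largest)


def numeric_similarity(a, b):
    """Compare two numeric columns by their mean and population std deviation."""
    return (closeness(mean(a), mean(b)) + closeness(pstdev(a), pstdev(b))) / 2


def match_fields(source, target, threshold=0.8):
    """Map each source field to a target field, by rule first, then by statistics."""
    matches = {}
    for s_name, s_values in source.items():
        # 1. An explicit user-defined rule always wins.
        ruled = [t_name for t_name in target if (s_name, t_name) in USER_RULES]
        if ruled:
            matches[s_name] = (ruled[0], "rule")
            continue
        # 2. Otherwise pick the statistically closest column, if close enough.
        scored = [
            (numeric_similarity(s_values, t_values), t_name)
            for t_name, t_values in target.items()
        ]
        best_score, best_name = max(scored, default=(0.0, None))
        if best_name is not None and best_score >= threshold:
            matches[s_name] = (best_name, "statistics")
    return matches


# Made-up sample columns from two hypothetical databases.
source_db = {"size": [170, 182, 165, 175], "mass": [70, 85, 60, 72]}
target_db = {"height": [168, 180, 177, 171], "weight": [68, 90, 63, 75]}
result = match_fields(source_db, target_db)
```

With this sample data, "size" is linked to "height" by the explicit rule, while "mass" is linked to "weight" only because the two columns have similar statistics. A source field with no sufficiently similar counterpart is simply left out of the result.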
In complex semantic integration systems, such as those designed for research, publishing and sharing metadata is a key part of operation. Metadata can be extracted from documents to form large relational data structures that assist in queries. For example, research papers on any topic can be integrated into a system that measures and records the frequency of words; those word counts can then support user searches, allowing related topics to be listed from any source without the need for source-specific conversions.
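As a rough sketch of the word-frequency idea, the following builds a small index from document text and uses it to rank documents for a query. The function names, the sample documents and the scoring (a plain sum of word counts) are assumptions chosen for brevity; a production system would likely add stop-word removal, stemming and a weighting scheme.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Return a mapping of word -> {document id: occurrences of that word}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index


def search(index, query):
    """Rank document ids by the total frequency of the query words."""
    scores = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]


# Made-up documents standing in for papers from different sources.
papers = {
    "paper_a": "Graph databases store graph data as nodes and edges.",
    "paper_b": "Relational databases store rows in tables.",
    "paper_c": "Cooking pasta requires boiling water.",
}
index = build_index(papers)
ranked = search(index, "graph databases")
```

Here `ranked` lists `paper_a` ahead of `paper_b`, and `paper_c` does not appear at all because it shares no words with the query. The source format of each document never matters to the search, only the extracted word counts.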
One of the challenges facing designers of semantic integration systems is how to aggregate the data. Relying on people to classify data and relate items from various sources is time-consuming and, ultimately, heavily dependent on the individual experience of the person doing the work. When algorithms make the associations automatically, certain relationships might be overlooked because of some minor difference that the algorithm is unable to resolve. One method of implementing semantic integration on a large scale combines learning-based algorithms with human-managed rules and, in some cases, actual human decision-making during the process.
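One way to picture this combination is a triage loop: human-managed rules are consulted first, an automatic score handles the clear cases, and anything uncertain is passed to a person whose decision is remembered. The sketch below is only illustrative. The class name `FieldMatcher`, the thresholds and the use of a string-similarity score from Python's `difflib` in place of a trained model are all assumptions, and remembering past decisions is a very simple stand-in for genuine learning.

```python
from difflib import SequenceMatcher


class FieldMatcher:
    """Toy triage of field-name pairs: rules, then score, then human review."""

    def __init__(self, rules=None, accept=0.85, reject=0.4):
        self.rules = dict(rules or {})  # human-managed rules: pair -> bool
        self.learned = {}               # decisions recorded from human review
        self.accept = accept            # at or above this score: automatic match
        self.reject = reject            # at or below this score: automatic non-match

    def score(self, a, b):
        """Stand-in for a learned model: plain string similarity in [0, 1]."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def classify(self, a, b):
        pair = (a, b)
        # Explicit rules and earlier human decisions override the score.
        for known in (self.rules, self.learned):
            if pair in known:
                return "match" if known[pair] else "no_match"
        value = self.score(a, b)
        if value >= self.accept:
            return "match"
        if value <= self.reject:
            return "no_match"
        return "needs_review"  # uncertain: hand the pair to a person

    def record_decision(self, a, b, is_match):
        """Store a reviewer's decision so the same pair is not asked again."""
        self.learned[(a, b)] = is_match


matcher = FieldMatcher(rules={("size", "height"): True})
first = matcher.classify("weight", "height")   # similar spelling, unclear meaning
matcher.record_decision("weight", "height", False)
second = matcher.classify("weight", "height")  # now answered from memory
```

In this run the pair "weight" and "height" looks similar enough to be plausible but not enough to accept automatically, so it is first returned as `needs_review`; after the reviewer rejects it, the same pair comes back as `no_match`. The pair "size" and "height" is accepted purely because of the human-managed rule, since the names themselves have little in common.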