Faceted Classification

Keywords are typically very poor at systematizing data. Because words carry multiple meanings in different contexts, are interpreted differently by different people, and have synonyms, using keywords to describe object properties is often unreliable and inaccurate. Internet users know how long it can take to find a suitable combination of keywords for a search after excluding every conceivable collateral, metaphorical, jargon and specialized meaning. In addition, keywords cannot practically be checked for consistency, since they are not structured as a whole, i.e. they are not themselves systematized.

Therefore, free (i.e. arbitrarily selectable) keywords are useful only for interactive reference searches. Their use should always be complemented by other, more effective data systematization techniques, or at least accompanied by visual analysis of the search results. The most frequently used tags tend to be neither the highest nor the lowest in a hierarchy.

To date there has been little quantitative analysis of folksonomy tags. The
analyses that have been conducted have at least confirmed that for each information
resource (URL), a very few tags are used with high frequency, and there are a large
number of tags with a very low shared usage — a “long tail” on the curve. This
phenomenon is consistent with the use of uncontrolled vocabularies for classification.

Modern information technologies (IT), both “paper” and electronic, make intensive use of various classification methods to store, search for, retrieve and compare data. In books, lists and catalogues, contents are arranged by chapters, sections and paragraphs.

Hearst defines facets as “a set of meaningful labels organized in such a way as to reflect the concepts relevant to a domain.” LaBarre defines facets as representing “the categories, properties, attributes, characteristics, relations, functions or concepts that are central to the set of documents or entities being organized and which are of particular interest to the user group” (2010, p. 58).

In faceted classification, there are facets, subfacets (also called arrays) and facet values. Using subfacets turns a faceted classification into a hierarchical faceted classification.

Broughton defines faceted classification as “…adequate object description (labeling [sic] the items to support subject retrieval), providing search tools that support browsing, navigation and retrieval, and, to a more limited extent, the presentation of results” (2006, p. 50). Broughton states that faceted classification helps to: synthesize the complexity of a subject; provide a consistent, logical and regular syntax and structure which can be used by computers; be used in a user interface on a computer or on the Internet; be easily converted into a thesaurus or subject headings; and provide a tool for browsing (2006).

Tree-structured information architectures

They impose a strictly predefined sequence of criteria and do not allow any of them to be bypassed. They support AND, but not OR or NOT. They are not easy to modify. Tree-structured information architectures are therefore convenient only for building the simplest classifications.

Hierarchical faceted metadata

Hierarchical faceted metadata has been shown to be a promising middle ground, able to satisfy the needs of a wide range of users with different mental models and vocabularies [See Yee, K.P., Swearingen, K., Li, K., & Hearst, M., (2003). Faceted metadata for image searching and browsing, In Proceedings of SIGCHI Conference on Human Factors in Computing Systems, CHI 2003, (pp. 401-408). New York: ACM Press. Available April 21, 2007, at http://flamenco.berkeley.edu/papers/flamenco-chi03.pdf or at http://portal.acm.org]. Facets are orthogonal categories of terms (here tags) within a metadata system. Each facet has a name, and it addresses a different conceptual dimension or feature type relevant to the collection, such as activities, components, geographical locations, forms or languages. Facets can be flat or hierarchical. A faceted search interface requires that each object in the collection be classified using one or more tags/terms (or foci, as they are technically called in faceting) from one or more different facets.

In a hierarchical faceted navigation tool, choosing a term that has sub-terms from one of the facets is equivalent to performing a disjunction (Boolean OR) over all the terms subordinate to the selected one. For example, choosing “navigation design” from the Themes facet would provide a search over all the navigation types listed in the facet, such as “breadcrumbs,” unless the user chose to narrow the search. When the user chooses terms (tags) from different facets such as Themes and Forms, however, systems typically automatically conjoin them (Boolean AND), for example “breadcrumbs” AND “case study.” The complete search thus includes a disjunction of all the terms selected from the same facet conjoined with all the tags selected from other facets. In this kind of interface, users can navigate multiple faceted hierarchies at the same time. Usability studies show how this approach is preferred over single hierarchies because users feel in control without getting lost [See Yee, K.P., Swearingen, K., Li, K., & Hearst, M., (2003). Faceted metadata for image searching and browsing, In Proceedings of SIGCHI Conference on Human Factors in Computing Systems, CHI 2003, (pp. 401-408). New York: ACM Press. Available April 21, 2007, at http://flamenco.berkeley.edu/papers/flamenco-chi03.pdf or at http://portal.acm.org and English, J, Hearst, M., Sinha, R., Swearingen, K., and Yee, P. (2002b). Flexible search and browsing using faceted metadata. Unpublished manuscript. Available April 21, 2007, at http://flamenco.berkeley.edu/papers/flamenco02.pdf.].
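The query logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of any particular faceted search system: the `FACETS` and `ITEMS` data, the extra "menus" and "tutorial" terms, and the function names `expand` and `search` are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical facet hierarchies: each term maps to its direct sub-terms.
FACETS: Dict[str, Dict[str, List[str]]] = {
    "Themes": {
        "navigation design": ["breadcrumbs", "menus"],
        "breadcrumbs": [],
        "menus": [],
    },
    "Forms": {
        "case study": [],
        "tutorial": [],
    },
}

# Hypothetical collection: each item carries tags from one or more facets.
ITEMS: Dict[str, Dict[str, Set[str]]] = {
    "doc1": {"Themes": {"breadcrumbs"}, "Forms": {"case study"}},
    "doc2": {"Themes": {"menus"}, "Forms": {"tutorial"}},
    "doc3": {"Themes": {"breadcrumbs"}, "Forms": {"tutorial"}},
}


def expand(facet: str, term: str) -> Set[str]:
    """Return the term plus every term subordinate to it in the facet."""
    result = {term}
    for child in FACETS[facet].get(term, []):
        result |= expand(facet, child)
    return result


def search(items: Dict[str, Dict[str, Set[str]]],
           selection: Dict[str, Set[str]]) -> List[str]:
    """Boolean OR within a facet (sub-terms included), AND across facets."""
    hits = []
    for name, tags in items.items():
        matched_all_facets = True
        for facet, chosen in selection.items():
            accepted: Set[str] = set()
            for term in chosen:
                accepted |= expand(facet, term)
            # The item must carry at least one accepted term in this facet.
            if not (tags.get(facet, set()) & accepted):
                matched_all_facets = False
                break
        if matched_all_facets:
            hits.append(name)
    return sorted(hits)
```

Selecting only "navigation design" in Themes matches every item tagged with any of its sub-terms, while adding "case study" from Forms narrows the result to items that satisfy both facets.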

For these reasons, faceted metadata can be used to support navigation along several dimensions simultaneously, allowing seamless integration between browsing and free text searching and an easy alternation between refining (zooming in) and broadening (zooming out) [see Flexible search and browsing using faceted metadata manuscript above]. The major benefits resulting from this approach include a strong reduction in mental effort, since the interface favors recognition over recall; and better support for exploration, discovery and iterative query refinement [see Hearst, M.A. (2006, April). Clustering versus faceted categories for information exploration. Communications of the ACM, 49(4), 59-61, at http://flamenco.Berkeley.edu/papers/cacm06.pdf and at http://portal.acm.org].

Again, usability studies attest that hierarchical faceted interfaces are preferred over simpler keyword-based search interfaces, and they document that such interfaces can be easily understood by the average user [see Yee and Li (2003), Faceted metadata for image searching and browsing, above] if iteratively designed and tested to address usability issues [see English, J., Hearst, M., Sinha, R., Swearingen K., and Yee, P., (2002a). Hierarchical faceted metadata in site search interfaces, In Conference on Human Factors in Computing Systems, CHI ’02: Extended abstracts on human factors in computing systems (pp. 628-539). Available at http://flamenco.berkeley.edu/papers/chi02_short_paper.pdf or at http://portal.acm.org].

Faceted Classification serves up multiple ‘pure’ classification schemes

Rosenfeld has been quoted as saying, “Faceted Classification serves up multiple ‘pure’ classification schemes rather than a … taxonomy.”

In faceted approaches, the entire classification is broken into several distinct aspects (facets), each described by a separate tree. In interactive multi-criteria search and other operations, object parameters can be specialized using all the facets simultaneously and in arbitrary order.

Faceted knowledge representation schemes do not impose a strict sequence of specializations and do not require the database developer to list them all explicitly. That makes facets much more flexible and functional than trees. Nevertheless, facets are also fraught with serious shortcomings.

Everything in the Lawi Project can have a unique identifier, since URIs give us a way to create a globally unique ID for anything we need to point to.

Disadvantages of facet classification

“The strength of a faceted classification lies in the fundamental categories, which should express the important attributes of the entries being classified” (Kwasnik, B., 1999, The role of classification in knowledge representation and discovery. Library Trends 48, 1: page 41). For something to be found in a catalog or on the shelf, it needs a proper classification. For the material to be found, the cataloger needs to understand the context, the content and the end user of that information. If any of these cues is missed, the information will be overlooked and never properly used.

Kwasnik (1999) then observes, “Most faceted classification do not do a good job of connecting the various facets in a meaningful way.”

Lastly, Kwasnik (1999) notes the difficulty of visualization using faceted classification: “A hierarchy or a tree, and especially a paradigm, can be visually displayed in such a way that the entities and their relationships are made evident.” Facets are not necessarily connected to each other. Two subjects can share the same time facet, for example, yet have nothing else in common.

In standard faceted classifications, combinations of parameters defining categories are formed using only logical AND, which limits the possibilities for data organization and comparison.

Classification of large, diverse and complex data sets

Many fields of application require the support of complex comparison of data in addition to search.

Also, a knowledge representation system should deal not only with individual objects but also with categories of objects defined by combinations of properties, just as in interactive search. The system should be able to process complex combinations of properties which describe categories by 1) intersections of features, such as “ball bearing supplier of enterprise No.1 AND enterprise No.2”, 2) their unions, such as “interface subsystems accepting input from keyboard OR mouse”, and 3) negations, such as “tranquilizers NOT counter-indicated for acute kidney conditions”. Naturally, all these operations should be performed directly on category descriptions, without searching through lists of individual objects.
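One way to picture such category descriptions is as small expression trees that are combined with AND, OR and NOT before any object is consulted. The sketch below is only an assumption about how this could look in Python; the class names `Has`, `And`, `Or`, `Not`, the `matches` function, and the property strings are invented for illustration and do not come from any described system.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Has:
    """Primary assertion: the object has the named property."""
    prop: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Not:
    inner: "Expr"


Expr = Union[Has, And, Or, Not]


def matches(expr: Expr, props: FrozenSet[str]) -> bool:
    """Check whether a set of object properties satisfies a description."""
    if isinstance(expr, Has):
        return expr.prop in props
    if isinstance(expr, And):
        return matches(expr.left, props) and matches(expr.right, props)
    if isinstance(expr, Or):
        return matches(expr.left, props) or matches(expr.right, props)
    if isinstance(expr, Not):
        return not matches(expr.inner, props)
    raise TypeError(f"unknown expression: {expr!r}")


# The three kinds of category from the text, with hypothetical property names.
supplier = And(Has("supplies enterprise No.1"), Has("supplies enterprise No.2"))
input_subsystem = Or(Has("keyboard input"), Has("mouse input"))
safe_tranquilizer = And(
    Has("tranquilizer"),
    Not(Has("counter-indicated for acute kidney conditions")),
)
```

Because descriptions are plain values, a new category such as `And(supplier, Not(input_subsystem))` is built by composing existing descriptions; `matches` is needed only when an individual object is finally tested against a category.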

Furthermore, categories defined by consistent combinations of properties should be ordered by the «general-specific» relationship, just like chapters, sections and paragraphs in a book. However, unlike chapters, sections, etc., the categories should be allowed to have more than one direct general (also called parent) category, which makes it possible to classify objects by a number of different aspects simultaneously. Such an ordering, known as polyhierarchical classification, is necessary.

Ordering categories in the form of a polyhierarchical classification and supporting complex category descriptions that use the logical operations AND, OR and NOT are mandatory for highly automated knowledge representation systems. These more demanding requirements considerably complicate the already complex problems of analytical and programming support, so an additional degree of flexibility is required from the information architecture.

Lawi Faceted classification scheme

“Integrating bottom-up and top-down classification in a social taxonomy system”

The Lawi faceted classification scheme is a working prototype of a semantic collaborative tagging tool conceived for bookmarking information architecture resources. It aims to show how the flat keyword space of user-generated tags can be effectively mixed with a richer faceted classification scheme to improve the system’s information architecture.

Facets constitute an adaptive classification system capable of representing both knowledge in movement (like that of collaborative environments) and several mental models at the same time. The blend of tags and facets strengthens the information scent and berrypicking capabilities of the system.

The Lawi scheme provides a much more flexible switching logic. Just like facets, in a multi-criteria search it offers several simultaneously accessible menus, each corresponding to a criterion relevant at the current specialization stage. As properties are further specialized, the menus may be used in any order. But unlike facets, our classification scheme automatically recognizes which criteria are relevant at each specialization stage, i.e. for a given combination of properties. Importantly, the scheme intrinsically supports switching logic of unlimited complexity, whereas with facets one has to resort to additional descriptions and constructions such as meta-facets.

In a non-automated Lawi classification scheme, the sequence of specializations starts from a single menu asking about the issue’s broad category. For example, once «International Law» is selected, the new relevant search criteria «Author», «Language», and «Jurisdiction» appear. Any of these three menus may be used for further specification. Once a new taxonomy in Language is selected, two new criteria may, for example, become relevant because of the choice just made. They are shown together with the two previous criteria that are still relevant and unused, «Author» and «Jurisdiction». These four menus may now be used in any order to further specify properties. Specialization may continue until the relevant criteria are exhausted, which means the item’s properties are completely described.
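The menu sequence in this example can be mimicked with a small table that records, for each criterion, the choice that makes it relevant. The following Python sketch is purely illustrative and is not the Lawi implementation: the `CRITERIA` table, the function `relevant_menus`, the value "Spanish", and the two follow-up criteria «Translator» and «Dialect» are hypothetical names chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

# For each criterion: the (criterion, value) choice that makes it relevant.
# None means the criterion applies to the whole universe of objects.
# "Translator" and "Dialect" are hypothetical follow-up criteria.
CRITERIA: Dict[str, Optional[Tuple[str, str]]] = {
    "Category": None,
    "Author": ("Category", "International Law"),
    "Language": ("Category", "International Law"),
    "Jurisdiction": ("Category", "International Law"),
    "Translator": ("Language", "Spanish"),
    "Dialect": ("Language", "Spanish"),
}


def relevant_menus(selected: Dict[str, str]) -> List[str]:
    """Return the criteria that are relevant and not yet answered."""
    menus = []
    for criterion, condition in CRITERIA.items():
        if criterion in selected:
            continue  # already specified, so no menu is needed
        if condition is None or selected.get(condition[0]) == condition[1]:
            menus.append(criterion)
    return menus
```

With nothing selected only the broad category menu is offered; after «International Law» is chosen three menus appear, and choosing a language then yields four menus that can be answered in any order, as in the walk-through above.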

In the Lawi automated knowledge representation system, properties do not have to be entered manually as in the example above. Instead, they can be automatically generated and processed by the system in complex combinations.

The idea of automatically recognizing and simultaneously presenting all relevant criteria is not new in itself. Many developers of classifications, knowledge representation systems, databases and other information-managing applications attempt to implement this logic to the maximum extent. However, all present-day approaches rely on programming support for the switching logic, i.e. they fail to ease extension and modification of the scheme.

The main advantage of our knowledge representation technique is that switching logic of unlimited complexity is implemented using simple and uniform structures. The configuration of the database and the volume of managing code do not depend on the number of search criteria and selection options, or on the complexity of the conditions determining their relevance and consistency. Thus, our information architecture allows data systematization to be refined and the diversity of classified objects to be extended without additional expense for developing managing software or descriptions. That is why it dramatically simplifies the development and maintenance of information-managing software and allows the creation of more sophisticated intelligent knowledge representation systems than other approaches do.

Compared to trees, the Lawi classification scheme differs as follows:

◾ a category may introduce several new relevant criteria rather than one, i.e. a statement describing that category may be further specialized by an assertion from any of several simultaneously applicable new criteria;
◾ a category may have several parents, i.e. the statement describing a category may be obtained by several sequences of specializations (“paths”) rather than one;
◾ category-describing statements may be formed from primary assertions using not only logical AND, as in trees and facets, but also complex combinations of AND, OR and NOT, i.e. intersections, unions and negations.

The system of classification criteria (answers, menus) is chosen in such a way that the domain of applicability of each criterion is either
1. the entire set of classified objects (the “universe”), or
2. a category described by a combination of primary assertions from more general criteria of the same classification system (a combination of answers to more general questions, selections from upper-level menus).

The generating hierarchy implicitly describes the full set of classification categories as the set of all valid statements allowed in this grammar. Therefore, unlike in trees and facets (within individual facets), it becomes unnecessary to list categories explicitly. This makes the whole classification structure very simple and concise. When designing a database for a particular application, the designer has to define only the grammar of data systematization and avoids a host of secondary problems: exactly which categories are required, how to rank unrelated criteria by importance, how to implement relationships and meta-descriptions, and more.

The scheme requires storage of only a tiny fraction of the categories, forming all other necessary categories dynamically at run time.

Instead of saying that any given taxonomy “is” or “is not” the same as another taxonomy, the Lawi Project is able to recommend related taxonomies by saying “A lot of people who tagged this ‘Labor Law’ also tagged it ‘Labour Law’.” The Lawi Project tries to move from a binary choice between saying two taxonomies are the same or different to the Venn diagram option of “kind of is/somewhat is/sort of is/overlaps to this degree”.
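The “overlaps to this degree” idea can be illustrated with a simple set-overlap score. The text does not say which measure the Lawi Project uses, so the sketch below assumes the Jaccard index (shared resources divided by all resources carrying either tag); the `TAGGED` data, the resource identifiers, and the function names `overlap` and `related` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data: for each tag, the set of resources that carry it.
TAGGED: Dict[str, Set[str]] = {
    "Labor Law": {"r1", "r2", "r3", "r4"},
    "Labour Law": {"r2", "r3", "r4", "r5"},
    "Tax Law": {"r5", "r6"},
}


def overlap(tagged: Dict[str, Set[str]], a: str, b: str) -> float:
    """Jaccard index of the resource sets carrying tags a and b (0.0 to 1.0)."""
    set_a = tagged.get(a, set())
    set_b = tagged.get(b, set())
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def related(tagged: Dict[str, Set[str]], tag: str,
            threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Other tags whose overlap with `tag` reaches the threshold, best first."""
    scores = [(other, overlap(tagged, tag, other))
              for other in tagged if other != tag]
    return sorted(
        [(other, score) for other, score in scores if score >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )
```

A score between 0 and 1 replaces the binary “same or different” decision: tags that share most of their resources can be recommended as related, while tags with little or no overlap are left apart.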

Some facets will be user-generated, with the exception of the language facet, which will use a predefined list of languages in ISO 639-2 notation.

Language and publication date are actual facets, but are primarily used as simple filtering tools because of their special, flat nature, since they cannot currently be part of a tag hierarchy.

Hierarchical Classification Scheme

A hierarchical system requires only one overarching organizing principle, whereas a faceted classification scheme uses a number of separate hierarchies (facets) concurrently. A faceted system can provide more information because each item is defined by multiple characteristics.

Although faceted systems are often more voluminous, they also facilitate the discovery of gaps in the system, an advantage when designers strive to capture all possible variations of the classified items.

Under a hierarchical classification system, the same code may be assigned, for example, to three related workers. Under a faceted system with three facets (work performed, educational level, and industry), each of these three workers would have a different code.
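The difference between the two coding approaches can be made concrete with a toy example. Everything in the sketch below is invented for illustration: the three workers, the occupation name, and all code values are hypothetical and do not correspond to any real occupational classification.

```python
from typing import Dict, List

# Three related workers who share one occupation (hypothetical data).
WORKERS: List[Dict[str, str]] = [
    {"occupation": "accountant", "work": "auditing",
     "education": "bachelor", "industry": "banking"},
    {"occupation": "accountant", "work": "bookkeeping",
     "education": "diploma", "industry": "retail"},
    {"occupation": "accountant", "work": "tax advice",
     "education": "master", "industry": "consulting"},
]

# Hierarchical scheme: one code per place in a single hierarchy.
HIERARCHICAL_CODES: Dict[str, str] = {"accountant": "H-01"}

# Faceted scheme: an independent code list for each facet.
FACET_CODES: Dict[str, Dict[str, str]] = {
    "work": {"auditing": "W1", "bookkeeping": "W2", "tax advice": "W3"},
    "education": {"diploma": "E1", "bachelor": "E2", "master": "E3"},
    "industry": {"banking": "I1", "retail": "I2", "consulting": "I3"},
}
FACET_ORDER = ("work", "education", "industry")


def hierarchical_code(worker: Dict[str, str]) -> str:
    """Code determined by the single organizing principle (occupation)."""
    return HIERARCHICAL_CODES[worker["occupation"]]


def faceted_code(worker: Dict[str, str]) -> str:
    """Code synthesized from one value per facet."""
    return "-".join(FACET_CODES[f][worker[f]] for f in FACET_ORDER)
```

In this toy data all three workers receive the same hierarchical code, while the faceted codes differ because each one records the worker's value in every facet.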

Hierarchical systems can be either polyhierarchical or mono-hierarchical, depending on whether or not the classified items appear in multiple locations. In a polyhierarchical system, content is reused, so an occupation can be found in multiple places. For example, an occupation requiring expertise in both business and sales (such as fundraising) can be found in two places in the structure, under both sales and business. Further, if an individual’s occupation were “fundraising manager,” that category could be found in three locations: sales, business, and management.
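A polyhierarchy of this kind is naturally stored as a graph in which each category lists all of its parents. The Python sketch below uses the occupations from the example; the root category "occupations" and the names `PARENTS`, `ancestors` and `is_monohierarchical` are assumptions added for illustration.

```python
from typing import Dict, List, Set

# Each category maps to its direct parents. More than one parent makes the
# structure polyhierarchical. The root "occupations" is an assumed addition.
PARENTS: Dict[str, List[str]] = {
    "occupations": [],
    "sales": ["occupations"],
    "business": ["occupations"],
    "management": ["occupations"],
    "fundraising": ["sales", "business"],
    "fundraising manager": ["sales", "business", "management"],
}


def ancestors(category: str) -> Set[str]:
    """Collect every more general category reachable from the given one."""
    seen: Set[str] = set()
    stack = list(PARENTS[category])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def is_monohierarchical(parents: Dict[str, List[str]]) -> bool:
    """True when no category has more than one direct parent."""
    return all(len(direct) <= 1 for direct in parents.values())
```

Here "fundraising" is reachable from two places and "fundraising manager" from three, so the structure is polyhierarchical; restricting every category to a single parent would turn it back into an ordinary tree.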

Summary
Classification systems must evolve in order to facilitate the collection of meaningful data and information.