VOLUME 10 (2006): ISSUE 1. PAPER 2

Studying the Scholarly web: How disciplinary culture shapes online representations

Jenny Fry

The aim of this paper is to relate traces of the fundamental cultural characteristics of intellectual fields to representations of research activities on the scholarly web, with a particular emphasis on qualitative hyperlink analysis. It will do this by asking whether Whitley's (1984) theory of the intellectual and social organization of the sciences, popular within science and technology studies, can be operationalized in the context of the web. The data were gathered and analyzed based upon the notion of a websphere (Schneider and Foot, 2002) embedded within the wider web presence of two intellectual fields: corpus-based linguistics and argumentation theory. These two case studies share the same parent discipline of linguistics; however, the goals, cultural identity and web presence of the scholarly communities that inhabit them differ in many ways. The embedded webspheres were constructed based upon interview data, Microsoft Site Analyst and search engine data. The findings show that the production of the web within corpus-based linguistics and argumentation theory has quite different and distinct characteristics. Within corpus-based linguistics we observe locally produced, de-centralized webspheres: local in the sense that these pages and sites are organized at the project level rather than at that of the field, and de-centralized in that the organizing imperative of the websphere runs along the dimension of 'functional dependence' rather than 'strategic dependence'. This means that there is less need to demonstrate the significance of the research problems, techniques or outcomes to the science system at large. In argumentation theory, on the other hand, production of the web is centralized at the field level, with pages and sites being organized on the basis of the professional activities of research schools.
The organizing imperative is centralized along the dimension of 'strategic dependence', where agenda-setting activities and reputation-building strategies are prominent in the websphere. The findings also have consequences for the application of the websphere concept in the context of studying the scholarly web.

Keywords: Link analysis, websphere, virtual methods, scholarly communication, research practice, Google, case study, qualitative research

1 Introduction

The aim of this paper is to relate traces of the fundamental cultural characteristics of specialist fields to representations of research activities on the scholarly web, with a particular emphasis on qualitative hyperlink analysis [1]. It will do this by asking whether Whitley's (1984) theory of the intellectual and social organization of the sciences, popular within science and technology studies, can be operationalized in the context of the web. Whitley's theory is based on two central concepts: the degree of 'mutual dependence' between scientists and scientific fields in making competent and significant contributions to the body of knowledge; and the degree of 'task uncertainty' in the coordination and interpretation of research techniques and outcomes. The two concepts are interrelated and relative, which makes them challenging to apply empirically because they cannot be measured in absolute terms. By exploring the 'offline' and 'online' behavior of specialist communities, the paper facilitates the development of an explanatory framework for understanding the intellectual and social phenomena that underlie linking patterns in the scholarly web. As Harries et al.
(2004) and Thelwall (2005) argue, despite the current interest in linking patterns and many statistical results indicating that web links can be associated significantly with scientific prominence or productivity, no causative connection has been claimed and only tentative attempts have been made at providing a theoretical framework with which to make sense of the results in a meaningful way. Harries et al. (2004) urge a response to this situation, given that the lack of understanding of why web links are created is an obstacle to the newly emerging field of webometrics. Thelwall (2005) points out the limitations of large-scale link analysis in answering questions about the social motivations underlying link creation and advocates triangulation of quantitative and qualitative methods. This paper attempts to address the current social theory lacunae in webometrics by uniquely combining techniques from virtual ethnography (Hine, 2000; Marcus, 1995) with the websphere concept developed by Schneider and Foot (2002). Schneider and Foot developed the websphere concept in a number of non-academic contexts, in particular the study of online political campaigns. They define a websphere "as a collection of dynamically defined digital resources spanning multiple websites deemed relevant or related to a central theme or object". In this study the 'central theme' has been the epistemic boundaries of two specialist fields of intellectual enquiry. The specialist fields were argumentation theory, an interdisciplinary area located at the interstices between philosophy, linguistics and communication studies; and corpus-based linguistics, which is also an interdisciplinary field, combining areas from computational linguistics, such as natural language processing, with traditional linguistics fields such as socio-linguistics, contrastive linguistics and morphology.
Corpus-based linguistics is characterized by a high degree of collaborative work, which is project-based, often at the national level, and its outcomes are oriented towards the development of products such as corpora (Becher, 1989). The corpus-based linguistics case study was based on a major national corpus project in the Netherlands: the Corpus of Spoken Dutch (CGN). Argumentation theory, on the other hand, is characterized by research schools rather than projects, and is theory driven (Becher, 1989). Reflecting this, the argumentation theory case study centres around what researchers in the field refer to as the 'Amsterdam School', which is a research group based in the Department of Speech Communication, Argumentation Theory and Rhetoric at the University of Amsterdam. Though these two case studies share the same parent discipline of linguistics, the goals, cultural identity and web presence of the scholarly communities that inhabit them differ in many ways. The embedded webspheres were constructed based upon interviews with the two project managers of the CGN project and a member of the 'Amsterdam School', data generated using Microsoft Site Analyst, and search engine data (see appendices I and II for the interview schedules). The findings point towards potentially scalable qualitative indicators that capture what Cronin et al. (1998) describe as liminal traces of peer esteem, influence over reputation and resources, and legitimization of research problems, approaches and techniques.

2 Studying scholarly research practices on the web

Studies of scholarly communication have used a range of granularity in their units of analysis. This reflects the production of knowledge in a multitude of overlapping social and institutional configurations (e.g. research groups, projects, invisible colleges, specialist fields, laboratories, academic departments and disciplines), the boundaries of which are often blurred and not clearly delineated.
Price (1963) and Crane (1972) used the notion of research-oriented scientific communities that identify themselves around comparable research problems or methods. The concept of the scientific or disciplinary community is often defined as rather idyllic and consensus oriented, and as such it has been criticized (Swales, 1998). Price (1963) coined the concept of the 'invisible college', which is based on a social network with a capacity of around two hundred members and a core of around twenty leading research scholars. Beyond this capacity, Price argued, a research area would become saturated and groups would split off to form new specialist fields. This concept has been explored and developed by researchers in a number of disciplines. For example, Kochen (1983) used estimates of the size of the research cutting edge in mathematical modeling in order to simulate the growth of specialties. In sociology, Crane (1972) built on Price's notion of the 'invisible college' to show how knowledge is diffused through networks of scientific communities. In later work Bruckner, Ebeling and Scharnhorst (1989) used Price's (1963) theory to simulate instabilities in scientific evolutionary systems and extended Kochen's (1983) model. Moving towards a less idyllic notion of scientific communities, in which competition and division of labour are central, Bourdieu introduced the notion of fields of intellectual inquiry. In Bourdieu's (1988) definition, fields are fluid entities based on the dynamics of competition, rather than consensus as portrayed by the concept of 'communities of practice' (Wenger, 1998), whereby cohesive networked communities compete for the same problems and resources. A more institutionalized perspective of scientific communities is the notion of the discipline. Scientific disciplines are inextricably linked to pedagogical and training institutions and represent market monopolies of certain types of knowledge (Whitley, 1984).
Salter and Hearn (1996, p.23) capture many of the recurrent themes that unite attempts to define the characteristics of a discipline: a constellation of topics, perspective and methods; dominance of a prevailing approach, with critical perspectives; institutional recognition in the form of departments, journals and conferences; a self-proclaimed community of scholars; and methods of inculcating or compelling adherence to the discipline's culture. The concept of an academic discipline is not altogether straightforward. Heilbron (2004) gives a historic account of how disciplines have a different history in different countries. Lenoir (1997) points out that most novel research is not confined within the scope of a single discipline, but draws on the work of several disciplines. Disciplines are only partly identified by the existence of relevant academic departments; and it does not follow that every department represents a discipline (Becher, 1989). Within webometrics, hyperlinking studies have tended to conflate academic departments with cognate research areas. For example, Heimeriks, Hörlesberger, and Van den Besselaar (2003) studied communication and collaboration in university-industry-government relations and used the home pages of academic departments as representations of mode 2 knowledge fields (Gibbons et al, 1994), however, only one of the case study departments actually represented cognate research activities. Such conflations can lead to disappointing results in hyperlink studies and misinterpretation of the interrelationships under observation. In order to capture online representations of the cultures of knowledge production this paper focuses on a lower level of aggregation than academic departments choosing instead research projects and research groups as embedded case-studies within intellectual fields. 
Although Whitley (1984) based his theory on the comparison of disciplines, Fry (2006) has shown that it can be used to explore research practices within specialist fields, as the concepts of 'mutual dependence' and 'task uncertainty' support iteration between the internal organization (division of labour and competition) of specialist communities and the wider scientific environment of the disciplines themselves and the science system at large. For example, the strategic position of a specialist field within its parent discipline can be a major factor in determining its autonomy, coherence and research direction. Depending on the level of granularity, e.g. discipline or specialist field, scientific communities can differ in their relative degree of 'mutual dependence' or 'task uncertainty', as some specialist fields may be more or less coordinated, integrated and standardized than their parent discipline. Studies of scholarly communication within digital networks tend to privilege analysis of either the social over the cognitive or vice versa. For example, Kling and McKim (2000) focused on the influence that social structures such as recognition and reward have on shaping web based scholarly publishing and did not take characteristics of specific knowledge structures into account. Due to their interrelated character Whitley's two dimensions enable both elements of scholarly work to be treated as analytically valid. They are further divided into four overlapping analytic elements that relate either to reputational control and coordination of research strategies and intellectual priorities, 'strategic dependence' and 'strategic uncertainty', or to the coordination of competence standards, research techniques and task outcomes, 'functional dependence' and 'technical uncertainty'. 
Though this was a study of phenomena on the web, it was important to use the broader offline canvas of the scholarly cultures of each case study as context to interpret the traces unraveled through the online data gathering and analysis. As Wouters and de Vries (2004) argue, studies that focus solely on the online characteristics of scholarly communication tend to overlook intellectual, social and institutional phenomena. Internet research has illustrated that boundaries between physical worlds and virtual worlds are often blurred (Hine, 2000; Haythornthwaite and Wellman, 1998), so implying a distinction between offline and online worlds is problematic. The concept of an 'off-line' world is used here with caution. Its main purpose is to indicate that time was spent exploring the traditional distributed spaces (Marcus, 1995) of communication, such as grey literature, conferences and journal articles, in each of the two case studies.

3 Developing a websphere

Due to the qualitative augmentation of the websphere concept, the methodology did not aim at capturing a comprehensive websphere for corpus-based linguistics and argumentation theory at the field level. Hence, an embedded case-study approach was developed based on divisions of labour typical in each field, which then formed the basis for an ego-centric websphere. Both case studies were chosen on the basis of accessibility and the off-line prominence of members. The URL that collectively represented each case study on an institutional level (e.g. on a university server) was taken as the central base URL in each websphere (see the first URL in each websphere as shown in appendices III and IV). The websphere concept is premised on the production of the web from a producer's perspective.
Given that this was a study of research integration, coordination and peer esteem, it was necessary to extend the notion of production in the context of scholarly webspheres to take interconnectedness between multiple webspheres into consideration. Therefore in-links to each central base URL were also included in the data gathering and analysis. The different approaches to producing the scholarly web in the two case studies necessitated a different approach to the construction of a websphere for each. For example, a key difference between the CGN project and the 'Amsterdam School' was the 'direct' production of a websphere within the CGN case study through a central URL (http://lands.let.ru.nl/cgn) hosted by the Catholic University Nijmegen, with the content produced by the project manager as the 'official' web representation of the project. The Google.nl search engine was then used to identify inlinks cumulatively over time. Thus, the 'official' CGN project URL formed the centre of the websphere. This contrasts with the predominantly 'indirect' production of a websphere by the 'Amsterdam School', who initially declined the invitation to participate in the study on the basis that they did not perceive themselves to be producing or using the web. The institutional homepages of the group members hosted by the University of Amsterdam (http://cf.hum.uva.nl/data/afd/neerlandistiek/tar-english/) contained minimal content relating to the research activities of the 'Amsterdam School'. This base URL was initially selected as the central node for the websphere, but it contained only one outlink and was not inlinked by any other pages; as such, it was not dynamically linked to a network of external URLs. Therefore a substitute central base URL that appeared more likely to represent the group's research presence on the web was sought for the 'Amsterdam School' websphere.
The head of the 'Amsterdam School' and other members established the International Society for the Study of Argumentation (ISSA) in 1986 following the first international conference on argumentation theory, which was held at the University of Amsterdam. This information was gained during interview and, subsequently, the URL for the ISSA website (http://cf.hum.uva.nl/issa/) was used as the central base URL in the websphere. In addition to the annual conference, the 'Amsterdam School' also established an international journal, Argumentation: An international journal on reasoning (http://www.springerlink.com/openurl.asp?genre=journal&issn=0920-427X) [2], and this also plays an important role in the wider web presence of the 'Amsterdam School'. The websites of ISSA and the international journal edited by the head of the 'Amsterdam School' differ from the 'official' CGN project website in that they have not been produced by members of the 'Amsterdam School' as a direct representation of the group's research activities; rather, they represent professional and publishing activities. By constructing an ethnographically informed websphere, as well as a functional one (constrained to outlinks from the central URL), it was possible to trace representations of the 'Amsterdam School' even though its members did not perceive the group as a user or producer of the web [3].

For the purposes of this project I defined a websphere and the web presence within which it is embedded as including the following types of digital inscriptions and actions:
Defined as such, each websphere was ethnographically augmented, longitudinally constructed and validated by sending the list of URLs constituting each initial websphere to selected members of the case studies. Initial base URLs were confirmed through interviews with key members of each case study. Microsoft Site Analyst was then used to identify the outlinked pages from each central URL. Inlinks were identified through periodic searches using the Google in-link search facility, run approximately six times over a 12-month period. The results were cleaned up in order to remove self-linking and onerous URLs [5]. This revealed multiple sites for the CGN project hosted by the various project partners located at universities across the Netherlands and Flanders, all linked to the 'official' web site hosted in the Netherlands, and a password-protected intranet for the project hosted at the University of Utrecht. It also identified two reciprocal links: between the CGN project and the Netherlands National Science Foundation (NWO), the co-funder of the project, and between the CGN project and the language union in the Netherlands (Taalunie), the ultimate owner of the corpus. The wider Google.nl (Google Netherlands) web presence for each case study was also part of the analysis. This was ascertained using exact phrase searching to find URLs that were conceptually related in terms of content, but not linked via hypertext. The web presence provided a field-based context for each embedded case study. As such it provided a representation of the potential websphere for each case study, which in both cases was much larger than the actual websphere. For example, the exact phrase "corpus gesproken nederlands" (Corpus of Spoken Dutch being the English translation) was searched on Google.nl and retrieved approximately 900 URLs, whereas the number of URLs in the CGN websphere totals sixty-three.
The exact phrase "pragma dialectics" was searched in place of the 'Amsterdam School', which is a colloquial term for the group, as this was their niche research area. A search for "pragma dialectics" on Google.nl retrieved 860 URLs (please see appendices V and VI for a list of the first thirty results in each search), compared to 28 research-related URLs in the websphere. These searches were also conducted using the AltaVista search engine in case of limitations in coverage. In terms of comparison, the corpus-based linguistics case was based on a project websphere, which reflects the pattern of project-based web sites across the field of corpus-based linguistics, while the argumentation theory case was based on a research school websphere, which also reflects the wider web presence.

4 Measuring mutual dependence and task uncertainty

The relative degrees of 'mutual dependence' and 'task uncertainty' characteristic of the cultural identity of each case study were assessed through understanding gained during in-depth interviews with two participants from the CGN project and one participant from the 'Amsterdam School' (see appendices I and II for an outline of the interview questions) and a discourse analysis of a limited amount of literature produced by case study participants. Dimensions of 'mutual dependence' and 'task uncertainty' upon which the case studies were assessed included the size of the national and international community, the degree of control over major channels of dissemination, predominant genres of communication, strategic position within the parent discipline, the presence of a lay audience, pedagogical concerns (e.g. whether the field is part of an undergraduate education programme), and the role of information gatekeepers such as publishers.
Table 1, adapted from Whitley (1984), illustrates the relative degree of 'mutual dependence' across corpus-based linguistics and argumentation theory, based on what was discovered about the intellectual and social organization at the field level of each case study:

Table 1. Relative degrees of 'functional' and 'strategic dependence' between corpus-based linguistics and argumentation theory

Whitley argues that the consequence of an increasing degree of 'functional dependence' is a high degree of specialization and differentiation of problems and goals, with the development of specific procedures to deal with them. A high degree of interdependence within a specialist field leads to considerable coordination of results and specialist areas. As the degree of 'strategic dependence' increases, so does the coordination of problems, goals and procedures. Competition also increases as scientists from different sub-fields compete to convince colleagues of the centrality of their problems and goals to the science system. The offline world of corpus-based linguistics indicates that the relative degree of 'functional dependence' is higher than in argumentation theory. This is probably in no small part due to the fact that its goals are product oriented, which requires the integration of techniques. This runs counter to the intellectual pluralism characteristic of theory-oriented fields such as argumentation theory. By and large the disciplinary overlap between the various areas of intellectual endeavour that converge within the field, such as computational linguistics, socio-linguistics, phonology and lexicography, has become taken for granted, though not without controversy (Sampson, 2003; Lawler and Dry, 1998; Borsley and Ingham, 2002). Correspondingly, while in argumentation theory integrative rhetorical moves tend to use conciliatory phrasing such as 'comply' and 'towards', corpus-based linguistics texts tend to employ war-like metaphors such as 'won' and 'battle' when reflecting on the field's position within the parent discipline (Sampson, 2003). This suggests that while the relative degree of 'functional dependence' is higher in corpus-based linguistics, the relative degree of 'strategic dependence' is lower than in argumentation theory.
There are some intellectual and social frontiers around corpus-based linguistics that are more explicitly contentious and fiercely contested than others. The interface with fields in theoretical linguistics is the most controversial and visible in the general linguistics literature [6]. In the wider context of its parent discipline, levels of 'strategic dependence' have not yet been secured and can probably be considered to be much lower than in a well-established field such as high-energy physics. Since the late 1990s the literature produced by members of the 'Amsterdam School' has contained the rhetoric of integration of the two main opposing paradigms (see Van Eemeren and Houtlosser, 2001, for an exemplar of this). For the purpose of establishing the degree of 'mutual dependence' in this field, the move towards consensus in the literature has been interpreted as an indication that levels of 'functional dependence' are relatively low, but that the 'Amsterdam School' and its allies are working towards increasing the degree of 'strategic dependence', in particular between groups, organizations and institutions that represent the various schools of thought within argumentation theory. This is necessary in order to harness resources such as reputation, audience and personnel and to advance the fledgling research programme towards institutionalization and survival as a discipline. Whitley's observations of the cultural outcomes of an increasing degree of 'strategic dependence' have further resonance within the argumentation theory case study. In the early 1990s members of the 'Amsterdam School' produced a number of monographs (Van Eemeren and Grootendorst, 1994) proposing an interdisciplinary research programme within argumentation theory, with the pragma-dialectic approach at its core.
The creation of ISSA (International Society for the Study of Argumentation) in 1986 can also be perceived as functioning to place the 'Amsterdam School', and therefore the study of argumentation theory in the Netherlands, in a pivotal position within the international argumentation theory community. ISSA has also established an awards programme, further augmenting the 'Amsterdam School's' strategic positioning in terms of reputational control within the wider field. Funding of a research nucleus on "Fallacies as violations of rules for argumentative discourse" by the Netherlands Institute for Advanced Study in the Humanities and Social Sciences during the academic year 1989-1990 further endorses the impression of a close-knit international network around the 'Amsterdam School'. The argumentation experts that were part of that nucleus include the chair of the Department of Speech Communication, Argumentation Theory and Rhetoric and other current prominent members of the dialectic approach to argumentation theory, many of whom were actually in the department at the time.

Table 2. The relative degrees of 'technical uncertainty' and 'strategic uncertainty' between the two case studies

Whitley argues that the consequence of an increasing degree of 'technical uncertainty' is that the work organization of intellectual communities is characterized by heavy dependence on direct and personal control of work leading to considerable variation in working practices and lack of development of research goals at the international level. 'Strategic uncertainty' encompasses uncertainty about intellectual priorities, the significance of research topics and preferred ways of dealing with them, the likely reputational pay-off of different research strategies, and the relevance of task outcomes for collective intellectual goals (Whitley, 1984). The consequence of an increasing degree of 'strategic dependence' is a greater reliance upon a particular group of colleagues for reputations and access to materials, therefore it will be necessary to reach consensus over intellectual priorities and a hierarchy of research topics. The degree of 'strategic dependence' within corpus-based linguistics is mitigated by the relatively high degree of 'technical uncertainty'. Tasks typically involved in building a corpus include corpus design, recording, transcription, tagging, annotation and the development of exploitation software. There are many decisions to be made about the appropriate approach to each of these tasks and there are many debates in the literature about the best approach to adopt. For example, there are many possible levels of annotation of empirical data, such as syntactic, prosodic, or phonetic, each one offering different analytic possibilities to unknown future users. Many tasks necessitate the development of technical standards. Atwell et al, (2000) conducted a survey of the use of parsing standards within the field and found that there was much local variation in the use of parsing schemes. 
The move towards creation of a greater degree of 'strategic dependence' within the argumentation theory community means that simultaneously the degree of 'strategic uncertainty' will decrease.

5 Main contrasts in the content characteristics of the two webspheres

A key contrast between the webspheres of the CGN project and the 'Amsterdam School' is in size, assessed in terms of: number of URLs; the content aboutness of URLs; destination of outlinks from each central URL in terms of national (identified by the country code Top Level Domain name (ccTLD [7])), organizational (recognized in part by the Top Level Domain name (TLD [8])) and content characteristics; and origin of inlinking URLs to the central URL in terms of national, organizational and content characteristics (the lists of URLs that constitute each of the two webspheres are shown in appendices III and IV).

5.1 Characteristics of the directly and indirectly produced CGN websphere

Within the 'official' CGN website, as delineated by the URL http://lands.let.ru.nl/cgn [9], there were 55 pages containing a detailed description of the project, justification for the project design, data samples, bibliographic references, and technical and work-in-progress reports. Documentation was available in a range of file formats including PDF, PPT and PostScript. Table 3 summarizes the main dimensions of the CGN websphere:

Table 3. Summary of the CGN websphere

There were eight outlinks from the site and thirty-nine inlinks (excluding within-site links). Of the eight outlinks, four were within the .nl (the Netherlands) ccTLD. They were:
The four remaining outlinks were to ccTLD names within Europe:
During a 12 month period between August 2003 and July 2004 thirty-nine inlinking URLs to the CGN project URL were identified using the in-link search facility on Google.nl. Twenty of the inlinking URLs were from the .nl (The Netherlands) ccTLD (see appendix III for a full list of inlinking URLs and page titles). These URLs include sister pages to the 'official' CGN page produced by the other partner institutions in the project, departmental sites of participating partners, pages produced by users of the corpus, former project research assistants, the Department of Speech and Language at the Catholic University Nijmegen where the project manager for the Dutch segment of the corpus is based, annotated lists of links that direct people to language technology resources, research workshops, the Netherlands language union (Taalunie), and the Netherlands science foundation (NWO). Of the nineteen in-linking URLs from outside the .nl ccTLD name, nine were from the .be (Belgium) ccTLD. The URLs from the .be domain partially represent the participation of Belgium based partners in the project responsible for the Flemish component of the project. The URLs for the main departments and people based in Belgium involved in the development and use of the corpus, however, are not represented in the directly produced websphere. These URLs were added to the indirectly produced extended websphere by the project manager of the Flemish component of the CGN project during the validation round of the websphere construction. Five of the inlinking URLs to the CGN URL were from the TLD name .org and were hosted by two Dutch organizations: dbnl (the digital library for Dutch literature and language) (1 link); and the national language union (Taalunie) (3 links). The fifth inlink from .org originated from the organization for European based Human Language Technologies (HLT). The five remaining inlinking URLs were also hosted within Europe. 
Two were from the .de (Germany) ccTLD name: one from a page about Netherlands philology hosted by the Free University of Berlin, containing content co-written by the CGN project manager of the Netherlands component of the project and describing data in the CGN; the other hosted by the University of Leipzig centre for the study of the Dutch language. There was one inlink from the .dk domain representing the Danish Dependency Treebank website, treebanks being a technique used within the CGN project.

5.2 Characteristics of the directly and indirectly produced 'Amsterdam School' websphere

During the period of the study the central URL in the 'Amsterdam School' websphere contained only three pages (a home page describing the aims of ISSA, a page containing details of the annual conference, and a page listing recipients of the ISSA award) and two outlinks. One outlink was to an associate of the 'Amsterdam School' based in the department of communication at the University of Louisville (.edu) and the other was to the journal Argumentation: An international journal on reasoning, edited by the head of the 'Amsterdam School' and published by Kluwer, which is based in the Netherlands (.nl) (see note 10). Despite this lack of content in terms of research or teaching related information, features and document files, the ISSA URL received seventeen inlinks. In general, low inlink metrics have been reported for academic webpages (Thelwall and Harries, 2004); it is surprising, therefore, that the number of inlinks to the ISSA URL exceeded the number of pages within it. Table 4 summarizes the main dimensions of the 'Amsterdam School' websphere:

Table 4. Summary of the 'Amsterdam School' websphere
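The link counts reported so far lend themselves to a simple tally. The following is a minimal sketch in Python, not part of the toolset used in this study, that computes the share of CGN inlinks originating from the .nl ccTLD and the number of inlinks per page for the ISSA URL. The function names and the "other-europe" bucket label are illustrative assumptions; the counts themselves are those given in the text.

```python
from collections import Counter

# Inlink counts per domain for the CGN project URL, as reported in the text.
# "other-europe" is a hypothetical bucket label for the five remaining
# inlinks hosted within Europe; it is not a real domain name.
cgn_inlinks = Counter({".nl": 20, ".be": 9, ".org": 5, "other-europe": 5})


def share(counts, key):
    """Return the fraction of all links in `counts` that fall under `key`."""
    total = sum(counts.values())
    return counts[key] / total


def inlinks_per_page(inlinks, pages):
    """Return the number of inlinks received per page of the target site."""
    return inlinks / pages


cgn_total = sum(cgn_inlinks.values())      # thirty-nine inlinking URLs
cgn_nl_share = share(cgn_inlinks, ".nl")   # twenty of thirty-nine
issa_ratio = inlinks_per_page(17, 3)       # seventeen inlinks, three pages

print(f"CGN inlinks: {cgn_total}, share from .nl: {cgn_nl_share:.2f}")
print(f"ISSA inlinks per page: {issa_ratio:.2f}")
```

A ratio of this kind is only a descriptive aid. It says nothing about why a link was made, which is the question the qualitative analysis addresses.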
In contrast to the national orientation of the CGN websphere, sixteen of the inlinks in the 'Amsterdam School' websphere were from outside the .nl domain and international in nature. The single inlink from the .nl domain was on the website of a member of the informal logic community in the Netherlands who is active on the ARGTHRY discussion list (the main listserv discussion list for the argumentation theory community) and has produced a substantial personal website with pages dedicated to the study of argumentation theory. The page from which the inlink originates is about the 9th International Workshop on Non-Monotonic Reasoning, and the link referencing the 2002 ISSA conference is embedded in a list of related links. Other subject resource pages from outside the .nl domain that inlinked to the ISSA website were: "Philosophy resources on the internet: EpistemeLinks" (.com), which is a commercial site selling products related to philosophy; "Rhetoric resources on the web" (.edu), which is part of the home pages of a faculty member in the Department of English, University of Wisconsin-Madison; "Tim Gelder's Critical Thinking on the Web: Institutes, Centers and Societies" (.org); and "The reasoning Page" (.edu), which is an annotated list of online resources in the study of argumentation. The Vale Press (.com) website produced three inlinks to the ISSA URL. Vale Press is a specialist publisher in the field of critical thinking. They publish the SIC SAT series on argumentation theory, which has been a forum for the publication of monographs written by members of the 'Amsterdam School'. Outlinks to the ISSA website appeared twice on the Vale Press home page: in the calendar frame with reference to the annual ISSA conference and in the popular links frame.
These links were both embedded in close proximity to two outlinks to the Association for Informal Logic and Critical Thinking (AILACT), which has an outlink on its own URL to the ISSA journal Argumentation: An international journal on reasoning. The third outlink to the ISSA URL on the Vale Press website was embedded in a list of monograph titles. There were five inlinking URLs from the .edu domain (two of which have been described above). One of these was from the website of the Stanford Encyclopedia of Philosophy under the entry for "Informal Logic". The remaining two inlinking URLs from the .edu domain are from the personal pages of faculty in Departments of Rhetoric and English or Communication and are listed under useful resources and communication related associations and conferences respectively. There were two inlinking URLs from the .ca (Canada) domain. One was a reference from the website of the Journal of Informal Logic: Reasoning and Argumentation in Theory and Practice under a list of links to other journals and resources, which also contains an outlink to the URL of the ISSA journal. The second inlink from the .ca domain was from the Wetaskiwin Telephone company, which has a link to a paper that was published in the Proceedings of the Fifth Conference of ISSA, Amsterdam, June 25-28, 2002. A member of the 'Amsterdam School' edits an online journal entitled Argumentation, Interpretation and Rhetoric, which is published in Russian and English. The Netherlands Institute in Saint-Petersburg is the connection between the 'Amsterdam School' and Russia, and the inlinking URL from the .ru (Russia) domain was from a page in the archive referencing the journal Argumentation, Interpretation and Rhetoric. A former Master's student at the Department of Speech Communication, Argumentation Theory and Rhetoric had produced an inlink from his personal pages in the .de (Germany) domain.
From the .uk (United Kingdom) domain there was an inlink from the webpages of an ECAI 2002 (European Conference on Artificial Intelligence) pre-conference workshop on computational models of natural argumentation. The remaining inlink is from the .ch (unlisted in the ISO 3166 country code list) domain and appears to have only a trivial relationship to the activities of the 'Amsterdam School', as the inlink originates from the pages of the Center for International Health and Cooperation and is listed under a diverse range of industrial and scholarly associations. The URLs added by the case-study participant from the 'Amsterdam School' are all about the key workshops and conferences in argumentation theory in which members of the 'Amsterdam School' have participated. Given that the Kluwer webpages for the ISSA journal Argumentation: An international journal on reasoning were closely related to the ISSA webpages, in that there is a reciprocal link between them, and that gatekeepers such as publishers play a significant role in the websphere of the 'Amsterdam School', an in-link search was also conducted on the journal URL (http://www.kluweronline.com/issn/0920-427X) using Google.nl (see appendix IV for a list of the inlinking URLs). The twenty-three inlinks to the ISSA journal, all of which are from outside the .nl domain, are not discussed in detail, but provide some wider context and illustrate the different websphere that is constructed when different representations of a scholarly community are used as the unit of analysis. Figure 1 summarizes some of the main dimensions of the webspheres of each case study:

Figure 1. Comparison of key dimensions of the CGN (Corpus of Spoken Dutch) and 'Amsterdam School' webspheres

5.3 Webpresence: A potential websphere

The web presence of each case-study as represented on Google.nl provided a field-based context for each embedded case-study.
As such it provided a picture of the potential websphere for each case study, in terms of number of URLs and predominant genres, which in both cases was much larger than the actual websphere. For example, the exact phrase "corpus gesproken nederlands" was searched on Google.nl and retrieved approximately 900 URLs, whereas the number of URLs in the CGN websphere totalled sixty-three. The exact phrase "pragma dialectics" was searched in place of the 'Amsterdam School', which is a colloquial term for the group, as this is their niche research area, and this retrieved 860 hits compared to 26 URLs in the websphere. This contrast does raise a question about the extent to which the websphere concept is meaningful for studying the production of the scholarly web. The most recent searches were run on 1st May 2005 and the first 30 hits are shown in Appendices V and VI. There are some issues of comparability between the web representations of the two fields using commercially available search engines due to possible bias in coverage along national or linguistic dimensions (Vaughan and Thelwall, 2004; Bar-Ilan, 2005) and the problems associated with multilingual text querying (Moukdad, 2002). The results for the CGN project are different when the English translation of the phrase "Corpus of Spoken Dutch" is searched, but given the low representation of English in the CGN websphere it was felt more relevant to use the Dutch instantiation. In this crude way, however, we can observe that the number of URLs that contain references in their page content to either the CGN project or members of the 'Amsterdam School' is higher than the number that actually contain an outlink to the main URLs for each case study. This highlights the limitation of relying on hyperlinks alone as representing interconnectedness between actors. Manual sampling of the first 30 and last 10 results in each search set was done in order to check that hits were relevant.
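The contrast between potential and actual websphere, and the sampling rule just described, can be made concrete. The sketch below is an illustration only and was not part of the study: it computes the fraction of search hits that fall inside each websphere and picks out the first 30 and last 10 items of a result list. The function names `coverage` and `manual_sample` are assumptions, and the hit counts are the approximate figures reported above.

```python
def coverage(websphere_urls, search_hits):
    """Fraction of the search-engine web presence captured by the websphere."""
    return websphere_urls / search_hits


def manual_sample(results, head=30, tail=10):
    """Return the first `head` and last `tail` items of `results`.

    If the list is too short for the two slices to be disjoint,
    the whole list is returned once, so no item is duplicated.
    """
    if len(results) <= head + tail:
        return list(results)
    return list(results[:head]) + list(results[len(results) - tail:])


# Websphere size against approximate hits for each exact-phrase search.
cgn = coverage(63, 900)      # "corpus gesproken nederlands"
pragma = coverage(26, 860)   # "pragma dialectics"
print(f"CGN websphere covers {cgn:.1%} of hits")
print(f"'Amsterdam School' websphere covers {pragma:.1%} of hits")

# Placeholder ranks stand in for the retrieved URLs, which are not listed here.
ranks = list(range(1, 901))
sample = manual_sample(ranks)
print(len(sample), sample[0], sample[29], sample[30], sample[-1])
```

Because search engines report estimated hit counts, the resulting percentages should be read as rough orders of magnitude rather than exact proportions.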
In the manual sampling the 'aboutness' of each URL was assessed based on summary information contained in the URL structure, e.g. characteristics of producing institution, page title, and the content of the pages themselves where the URL metadata did not provide sufficient information on which to judge aboutness. In the "corpus gesproken nederlands" (CGN) web presence annotated subject lists linking to resources such as corpora, tools, people, organizations, working papers, reports and teaching materials dominated the search results. Bookstores and digital libraries representing publications produced within corpus-based linguistics appear only after the first sixty results. This contrasts with the results of the "pragma dialectics" searches, where digital surrogates of monographs dominated the URLs retrieved. The aboutness of the URLs within the "argumentation theory" search results was oriented towards monograph surrogates as represented on the web pages of publishers, electronic bookstores and on-line libraries. These included those written or edited by members of the 'Amsterdam School'.

6 Traces of 'mutual dependence' and 'task uncertainty' in web representations of intellectual fields

It is clear that the production of the web within corpus-based linguistics and argumentation theory has quite different and distinct characteristics. Within the CGN case study we observe local de-centralized production of webspheres, local in the sense that these pages and sites are organized on a project level, rather than at that of the field, and de-centralized in that the organizing imperative of the websphere runs along the dimension of 'functional dependence' rather than 'strategic dependence'. Consequently, there is an orientation to a national research audience because the field is not dominated by a single approach or hierarchy of problems.
This means that there is less need to demonstrate the significance of the research problems, techniques or outcomes to the science system at large. In argumentation theory, on the other hand, production of the web is oriented towards a field-based level, with pages and sites being organized around professional activities of research schools or communities representing a dominant paradigm. The organizing imperative is centralized along the dimension of 'strategic dependence', with agenda-setting activities and reputation-building strategies being prominent. For example, the websphere of the CGN project has a strong national orientation, with an almost negligible number of URLs originating from outside the .nl country domain. In contrast, the websphere of the 'Amsterdam School' has a strong international orientation, with the majority of the URLs originating from outside the .nl country domain. In terms of the offline scientific audience for the CGN project, research outcomes have been disseminated at international conferences and in peer-reviewed journals, and articles published by project members have been cited in ISI Web of Science indexed journals. Corpus-based linguistics is, however, dominated by English language research and hyperlinking patterns in the CGN websphere appear to reflect this. High profile English language corpora, such as the British National Corpus and the International Corpus of English, do not link to the CGN project URL from their websites. Even though documentation produced within the CGN project does discuss the British National Corpus in relation to reaching international standards in corpus development, there are no outlinks to it, or to any other national corpora.
This could be another indication of a high degree of 'strategic uncertainty' around the CGN project in that it is not necessary for other actors in corpus-based linguistics to align themselves with the CGN project in order to gain either prominence or validity within the field. The converse can be observed in the websphere of the 'Amsterdam School', where the high inlink ratio compared to the limited number of pages and content (in terms of variety of file types, a measure of content that has been used in webometrics) could demonstrate the group's political position, in that other researchers are having to acknowledge the group in order to appear credible, rather than representing an informational link or 'sitation' (Rousseau, 1997). Thelwall (2005) has found that web pages that are the target of many links are disproportionately likely to be targeted by any new links created. This practice has resonance with the pattern of esteem creation that Merton (1968) likened to the 'Matthew effect'. Commanding a prominent web presence may be important for groups in an emergent field where they need to increase the degree of 'strategic dependence', as is the case with the 'Amsterdam School'. If, however, their intellectual goals are better served by formal communication, as is often the case in theoretical areas, which necessitate lengthy explanation of concepts and arguments, then monographic-style communications (Swales, 1998) tend to be the dominant mode of communication. This means that representations on the web may be dominated by gatekeepers of formal published communication, such as libraries and publishers, rather than research oriented informal communication. This is reflected in the dominance of publishers and digital libraries in the 'Amsterdam School' websphere.
It is interesting that even though the 'Amsterdam School' have not directly produced a hyperlinked websphere, their strategic position in the offline world is reflected in an indirectly produced extended websphere, as is shown by the high inlink metric to the ISSA web pages. That this site consists of only three pages that have research content (the remainder of the site is dedicated to local sightseeing and accommodation arrangements for the ISSA conference), with no outlinks to other research groups or organizations, and that its inlinks outnumber the total number of pages and outlinks, could demonstrate the group's political position, in that other researchers are obliged to acknowledge alignment with the group in order to appear credible, rather than representing an informational link or 'sitation' (Rousseau, 1997). As with the 'Amsterdam School' websphere, the CGN project is also represented within secondary sources of information such as subject portals and news bulletins, though in more specialised fora than publishers' catalogues, relating specifically to the Dutch language, such as EuroMap, rather than the wider field of corpus-based linguistics. Both fields have a general lay audience, but because of the specificity of the language in the CGN project and the technical and esoteric language of the field itself the lay audience is likely to be restricted. When asked what were the main audiences for research produced within the group, an interviewee from the 'Amsterdam School' responded that in addition to scholars in the fields of logic, rationality, rhetoric, persuasion and dialectics, undergraduate and postgraduate students were also important, as were non-academics interested in popularized accounts of science. Informal communication is a dominant mode of communication in corpus-based linguistics, where in the technical areas of the field such as speech technology the communication model is more akin to computational science than typical humanities fields.
Much technical information is communicated in unpublished forms of communication such as work-in-progress reports, manuals and conference papers, and traces of these are highly visible on the web. This is also shaped by the higher degree of 'functional dependence' compared to argumentation theory, which means that there is a greater need for coordination of techniques and results across groups and across disciplinary and national boundaries. The prevalence of the .nl country code domain is surprising in view of this interdependence, particularly as group members had published and delivered conference papers about the development of the corpus at international conferences and in international journals. Conversely, even though the British National Corpus is referenced in publications and papers produced in the context of the project, there was no outlink to its webpages. Attention has been paid, however, to making representations of international standards prominent on the website, making alliances to standards such as EAGLES and PRAAT visible through outlinks and thus garnering authority through association with standardized practices. A corpus can be used as both tool and research object: for the developers of corpora such as the CGN, the project presents a research object that will generate research questions, but for those that use the corpus it is a tool for answering questions. Perhaps, as the CGN becomes transformed into a tool for answering research questions, the networks that form around it will place it more centrally on the international scene of corpus-based linguistics. It is also possible that its focus on the Dutch language will keep it firmly placed within a national network of actors, despite its novelty and potential technical interest as a rare example of a spoken corpus.
In the websphere of the CGN project the research of individual project members is submerged, with processes and tasks in the collective development of the corpus being given prominence over individual contributions, e.g. the project leader and developer of the official CGN site has not included any out-links to project members' home pages, including her own. Though there are several in-links to the project's URL from current and former graduate students and research assistants, there are no in-links from the pages of senior academics who have been involved in the project. The individual research activities of members of the 'Amsterdam School' are also submerged under a collective web presence. The teaching activities of 'Amsterdam School' members are more visible within both their embedded directly produced websphere and extended indirectly produced websphere than their research activities. There are a number of possible explanations as to why there are only a limited number of webpages produced as research resources in the argumentation theory webpresence. Due to the high degree of 'strategic dependence', and therefore increasing competitiveness, there may be disincentives for sharing information and data. Conversely, the lower degree of 'functional dependence' could mean that there is less of an imperative to share research resources. Relative degrees of 'mutual dependence' and 'task uncertainty' may both have a strong influence on size (in terms of number of URLs) of webspheres, because a decreasing degree of 'mutual dependence' (both functional and strategic) means that there is less of an imperative for the coordination and sharing of resources. This may be compounded by an increasing degree of 'task uncertainty', which makes it more difficult to coordinate research problems, techniques, and outcomes. These kinds of cultural differences make the development of standardized web indicators to measure productivity across fields problematic.
These findings also show that when different units of analysis are used, different levels of prominence and connectedness can be observed. In the 'Amsterdam School', taking the professional society as a base for developing a websphere does reveal traces of peer review, esteem, and power, whereas the construction of a websphere around their departmental pages would not.

7 Conclusion

As this paper has shown, differences in what is made visible by fields on the web and what remains invisible are quite revealing. Whitley's (1984) theory has been shown to be illuminating in understanding these differences. For example, in the 'Amsterdam School' case study the high in-link metric to the ISSA website can be interpreted as an increasing degree of 'strategic dependence' and its associated demands on alliance and community building in the field, whereas the CGN project does not share the same centrally strategic position within its field, which would account for the relatively low in-link metric to the project's website. In terms of page content of URLs in each websphere, the higher degree of 'strategic uncertainty' within the CGN project, which follows a general trend in the field, is reflected in the use of justificatory language for the design of the corpus and the prominent way in which outlinks to web representations of international technical standards are embedded within that narrative of justification. Particular aspects of an intellectual field's cultural identity that might provide an important contribution towards a theoretical framework for the interpretation of linking patterns and a means by which to identify appropriate units of analysis are:
In addition to yielding interpretive power of social, intellectual and institutional phenomena (see also Beaulieu and Simakova, this issue), ethnographic hyperlink studies are also valuable in identifying a more nuanced unit of analysis for studying field differences in the production and use of the web. There is a need, however, to scale up ethnographically informed hyperlink studies so that the core pedagogical or research business of intellectual fields can be automatically identified and wider patterns of behaviour observed. Low academic link metrics identified in the literature, particularly in the humanities, may be the result of methodology, rather than any kind of signifier of deficiencies in the superstructure of universities (Harries et al., 2004) or institutional research activity. In order to avoid misinterpretation of results it is necessary at the methodological level to be clear about the unit of analysis being employed, and to avoid conflating academic departments with disciplines or research oriented intellectual fields. For example, if a solely quantitative study of hyperlinking patterns in the Netherlands took academic departments as the central node, then the Amsterdam School's departmental pages (http://cf.hum.uva.nl/data/afd/neerlandistiek/tar-english) would represent the group as an isolate (due to the absence of outlinking or inlinking URLs). In network theory the extent of interconnectivity between nodes (e.g. a URL with many inlinks) is used to interpret social factors such as the extent to which a researcher or research group is critical to a scientific community, authoritative as knowledge producers or high-performers in terms of scientific output. Used as a quantitative performance indicator, therefore, the hyperlinking pattern around the departmental representation of the 'Amsterdam School' would be a gross misrepresentation of the international status and strategic role of the group within argumentation theory.
The findings presented in this paper demonstrate that different methodological approaches to the study of hyperlinks and hyperlinking patterns reveal different representations of research areas and the scholarly communities that inhabit them. In this case it was the unique combination of ethnographic techniques, e.g. interviews and discourse analysis, together with the use of commercially available tools such as the Google search engine and Microsoft Site Analyst, both based on web crawling technology, that enabled a particular augmentation of the websphere concept that made traces of the intellectual and social identity of academic specialist fields on the web more visible. In a larger-scale study than the one reported here it may not be possible to triangulate as many different kinds of techniques; therefore it is important to be aware of and understand the limitations of any single approach when interpreting results.

Appendices

Acknowledgements

This paper is based on research that I conducted in the Networked Research and Digital Information (NERDI) group at the Netherlands Institute for Scientific Information Services and was funded by the Royal Netherlands Academy for Arts and Sciences. I would like to thank my colleagues at NERDI (now the Virtual Knowledge Studio for the humanities and social sciences) for their invaluable support during this research and comments on earlier drafts. I would also like to thank the two reviewers, Professor Peter van den Besselaar and Dr. Viv Cothey, whose detailed comments led to an improved final version.

Notes

1. See Thelwall (2005) for a typology of link analysis.
2. Inlink counts to online pages of print-based journals are complicated by the fact that some publishers, such as Kluwer, provide a number of linking options.
3. My initial request for their participation being declined on the basis of this self perception.
4. See Bjorneborn and Ingwersen (2004) for a typology of links.
5.
It is possible to exclude self-referencing links in Alta Vista using the Boolean expression (see Rousseau 1997 for a discussion of using Alta Vista's advanced search facility for in-link analysis) and during the course of this study Google appears to have reconfigured its inlink algorithm and now seems to automatically exclude self-referential links from search results.
6. See Borsley, R.D. and R. Ingham (2002), 'Grow your own linguistics? On some applied linguists' views of the subject', Lingua 112, 1-6, which is an exemplar of the kind of critique that corpus-based linguistics comes under from this direction.
7. The list of country code top level domain names is available from <http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/index.html>
8. Further information regarding top level domain names is available from <http://www.dns.net/dnsrd/tld.html>
10. There were a further 36 outlinks from the ISSA pages that were not related to research, but to tourist information in Amsterdam for participants attending the annual ISSA conference.

References

Atwell, E., Demetriou, G., Hughes, J., Schiffrin, A., Souter, C., and Wilcock, S. (2000) A comparative evaluation of modern English corpus grammatical annotation schemes. ICAME Journal, 24: 7-23.
Bar-Ilan, J. (2005) How do search engines respond to some non-English queries? Journal of Information Science, 31 (1): 13-28.
Beaulieu, A., and Simakova, E. (2005) Textured connectivity: an ethnographic approach to understanding the timescape of hyperlinks. Cybermetrics (this issue).
Becher, T. (1989) Academic Tribes and Territories: Intellectual Enquiry and the Cultures of Disciplines. Buckinghamshire: Open University Press.
Borsley, R.D. and R. Ingham (2002), Grow your own linguistics? On some applied linguists' views of the subject. Lingua, 112 (1): 1-6.
Bourdieu, P. (1988) Homo Academicus, trans. Peter Collier. Stanford, CA: Stanford University Press.
Bruckner, E., Ebeling, W., and Scharnhorst, A.
(1989) Stochastic dynamics of instabilities in evolutionary systems. Systems Dynamics Review 2, 176-191.
Crane, D. (1972) Invisible colleges: Diffusion of knowledge in scientific communities. London: The University of Chicago Press.
Cronin, B., Snyder, H. W., Rosenbaum, H., Martinson, A., and Callahan, E. (1998) Invoked on the Web. Journal of the American Society for Information Science, 49 (14): 1319-1328.
Eemeren, F. and Grootendorst, R. (1994) Studies in Pragma-Dialectics. Amsterdam, The Netherlands: Sic Sat: International Centre for the Study of Argumentation.
Van Eemeren, F. H., and Grootendorst, R. (1992) Argumentation, Communication, and Fallacies: A Pragma-Dialectical Perspective. USA, NJ: Lawrence Erlbaum Associates.
Van Eemeren, F. H., and Houtlosser, P. (2001) Rhetoric in pragma-dialectics. Argumentation, interpretation and rhetoric. Issue 1.
Fry, J. (2006) Scholarly research and information practices: a domain analytic approach. Information Processing and Management. 42 (1): 299-316.
Gibbons, M., Limoges, C., Nowotny, H., et al. (1994) The new production of knowledge: The dynamics of science and research in contemporary societies. London: Sage.
Harries, G., Wilkinson, D., Price, E., Fairclough, R. & Thelwall, M. (2004). Hyperlinks as a data source for science mapping. Journal of Information Science, 30(5): 436-447.
Haythornthwaite, C. and Wellman, B. (1998) Work, Friendship and Media Use for Information Exchange in a Networked Organization. Journal of the American Society for Information Science. 49 (12): 1101-14.
Heilbron, J. (2004) A regime of disciplines: Toward a historical sociology of disciplinary knowledge, in: C., Camic and H., Joas (eds) The dialogical turn: New roles for sociology in the post-disciplinary age. United States: Rowman & Littlefield Pub Inc, 2004, 23-42.
Heimeriks, G., Horlesberger, Van den Besselaar, P. (2003) Mapping communication and collaboration in heterogeneous research networks. Scientometrics, 58 (2): 391-413.
Hine, C.
(2000) Virtual ethnography. London: Sage.
Kling, R. and McKim, G. (2000). Not just a matter of time: Field differences and the shaping of electronic media in supporting scientific communication. Journal of the American Society for Information Science. 51(14): 1306-1320.
Kochen, M. (1983) Mathematical model for the growth of two specialties. Science of Science, 3 (11): 199-217.
Lawler, J. D., and Dry, H. (eds) (1998). Using computers in Linguistics: a practical guide. London, Routledge.
Lenoir, T. (1997) Instituting science: The cultural production of scientific disciplines. Stanford, CA: Stanford University Press.
Marcus G.E. (1995) Ethnography in/of the world system: the emergence of multi-sited ethnography. Annual Review of Anthropology, 24: 95-117.
Merton, R. K. (1968) The Matthew Effect in Science. Science, 159 (3810): 56-63.
Moukdad, H. (2002) Language-based retrieval of web documents: an analysis of Arabic-recognition capabilities of two major search engines. Proceedings of the 65th American Society for Information Science and Technology Annual Meeting. November 18-21, 2002, Philadelphia, PA. Medford: Information Today, 39: 551.
Price, D. J. (1963) Little science, big science. New York: Columbia University Press.
Rousseau, R. (1997) Sitations: an exploratory study. Cybermetrics. 1 (1).
Rousseau, R. (1999) Daily time series of common single word searches in Alta Vista and NorthernLight. Cybermetrics, 2/3.
Salter, L. & Hearn, L. (1996) Outside the lines: issues in interdisciplinary research. Kingston, Montreal: McGill-Queen's University Press.
Sampson, G. (2003) Are we nearly there yet, Mum? Corpus Linguistics 2003 conference, Lancaster, March 2003.
Schneider, S. M. and K. A. Foot (2002). Online Structure for Political Action: Exploring Presidential Web Sites from the 2000 American Election. Javnost (The Public). 9(2): 43-60.
Swales, J. M. (1998) Other floors, other voices: A textography of a small university building. London: Lawrence Erlbaum Associates.
Thelwall, M. (2005, to appear). Interpreting social science link analysis research: A theoretical framework. Journal of the American Society for Information Science and Technology.
Thelwall, M. & Harries, G. (2004). Can personal web pages that link to universities yield information about the wider dissemination of research? Journal of Information Science, 30(3), 243-256.
Vaughan, L. & Thelwall, M. (2004). Search engine coverage bias: evidence and possible causes. Information Processing and Management, 40(4): 693-707.
Wenger, E. (1998) Communities of practice: learning, meaning and identity. Cambridge, U.K.: Cambridge University Press.
Whitley, R. (1984) The intellectual and social organization of the sciences. Oxford: Clarendon Press.
Wouters, P. and de Vries, R. (2004) Formally citing the web. Journal of the American Society for Information Science and Technology. 55(14): 1250-1260.

Received 15/Dec/2005