distributional hypothesis paper

Title: Distributional Semantics, Holism, and the Instability of Meaning

Abstract: Current language models are built on the so-called distributional semantic approach to linguistic meaning that has the distributional hypothesis at its core. The distributional hypothesis involves a holistic conception of word meaning: the meaning of a word depends upon its relations to other words in the model. A standard objection to meaning holism is the charge of instability: any change in the meaning properties of a linguistic system (a human speaker, for example) would lead to many changes or possibly a complete change in the entire system. When the systems in question are trying to communicate with each other, it has been argued that instability of this kind makes communication impossible (Fodor and Lepore 1992, 1996, 1999). In this article, we examine whether the instability objection poses a problem for distributional models of meaning. First, we distinguish between distinct forms of instability that these models could exhibit, and we argue that only one such form is relevant for understanding the relation between instability and communication: what we call differential instability. Differential instability is variation in the relative distances between points in a space, rather than variation in the absolute position of those points. We distinguish differential and absolute instability by constructing two of our own models, a toy model constructed from the text of two novels, and a more sophisticated model constructed using the Word2vec algorithm from a combination of Wikipedia and SEP articles. We demonstrate the two forms of instability by showing how these models change as the corpora they are constructed from increase in size.
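
The contrast between absolute and differential instability can be illustrated numerically. The following is a minimal sketch, not the authors' models: the toy vectors and the helper `pairwise_distances` are assumptions introduced for illustration. Rotating the whole space changes every absolute position but leaves all pairwise distances intact, whereas moving a single point changes the relative distances.

```python
import numpy as np

def pairwise_distances(vectors):
    """Return the matrix of Euclidean distances between all rows."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

# Hypothetical toy "word vectors" (values invented for illustration).
space = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Absolute change only: rotate the whole space by 90 degrees.
theta = np.pi / 2
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
rotated = space @ rotation.T

# Differential change: move one point relative to the others.
shifted = space.copy()
shifted[2] = [3.0, 1.0]

positions_moved = not np.allclose(space, rotated)
distances_kept = np.allclose(pairwise_distances(space),
                             pairwise_distances(rotated))
distances_changed = not np.allclose(pairwise_distances(space),
                                    pairwise_distances(shifted))
```

On this toy picture, only the second kind of change alters how the points stand to one another, which is the kind of variation the abstract calls differential.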

Distributional Theories of Meaning: Experimental Philosophy of Language

First Online: 17 June 2023

Jumbly Grindrod

Part of the book series: Logic, Argumentation & Reasoning (LARI, volume 33)

Distributional semantics is an area of corpus linguistics and computational linguistics that seeks to model the meanings of words by producing a semantic space that captures the distributional properties of those words within a corpus. In this paper, I provide an overview of distributional semantic models, including a broad sketch of how such models are constructed. I then outline the reasons for and against the claim that distributional semantic models can serve as a theory of meaning, paying special attention to those within the field who have defended this claim. Finally, I conclude by arguing that despite the fact that such models are holistic, they nevertheless avoid the objections raised against holistic theories of meaning, particularly those from Fodor and Lepore (1992, 1999).

See also Firth (1957, p. 11): “You shall know a word by the company it keeps!”

It seems likely that there is a difference in register here, and that this difference will affect the ways in which these expressions are distributed across corpora. See § 5.4 for discussion of this issue with regard to distributional semantic models of meaning.

Lenci (2008) distinguishes between a weak form of the distributional hypothesis – equivalent to DH 1 – and a strong form. However, Lenci’s strong form is a cognitive hypothesis that distributional structures serve as part of the explanation of how expressions within a language are cognized. My focus here is on whether distributional semantics can serve as a theory of meaning, and it is at least possible that a theory of meaning need not capture such cognitive facts. For further discussion of this issue, see § 5.3.

Rather than looking at the most frequent terms, collocation analyses will often use association scores to pick collocates. This reflects the fact that the most frequent collocates are often simply the most frequent terms in the language, so it is better to look at the collocates that bear the strongest statistical association, for example those with the highest Mutual Information score.
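
To make the contrast between raw frequency and association concrete, here is a minimal sketch of pointwise mutual information (PMI) scoring for the collocates of a node word. The function name `pmi_scores`, the window size, and the toy sentence are assumptions for illustration, and the probability estimates are deliberately simplified.

```python
import math
from collections import Counter

def pmi_scores(tokens, node, window=2):
    """Score each collocate of `node` by pointwise mutual information."""
    total = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        lo, hi = max(0, i - window), min(total, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[tokens[j]] += 1
    scores = {}
    for collocate, joint in pair_counts.items():
        p_joint = joint / total
        p_node = word_counts[node] / total
        p_coll = word_counts[collocate] / total
        # PMI compares the observed co-occurrence rate with chance.
        scores[collocate] = math.log2(p_joint / (p_node * p_coll))
    return scores, pair_counts

# Hypothetical toy corpus, invented for illustration.
tokens = ("the strong tea is the best but the cat and the dog "
          "like the strong tea").split()
scores, raw_counts = pmi_scores(tokens, "tea")
```

In this toy corpus "the" is the most frequent collocate of "tea" by raw count, yet "strong" receives the higher PMI score, because "the" is frequent everywhere.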

More complex models may capture the meanings of certain terms using some other mathematical object such as a tensor (see Baroni et al., 2014a). We will ignore this complication for now.

Readers seeking greater technical detail may want to turn to Erk (2012), Clark (2015), Kiela and Clark (2014), Lenci (2018), and Boleda (2020).

The cosine of an angle in a right-angled triangle is calculated by dividing the length of the adjacent side by the length of the hypotenuse. There are alternative measures of similarity, such as Euclidean distance or the dot product, which may be called for in particular investigations. One advantage of cosine similarity with regard to investigating meaning is that it does not take into account the magnitude of the vectors and so is not affected by the overall frequency of the terms in the way that Euclidean distance and the dot product are. This accords with the idea that two terms could be very similar in meaning even if one is used much more frequently than the other.
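
The frequency point can be checked with a small sketch. The two vectors below are invented for illustration: `frequent` has the same direction as `rare` but ten times the counts, as if one term simply occurred ten times as often in the same contexts.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (|u| |v|)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical co-occurrence counts for two terms.
rare = np.array([1.0, 2.0, 0.0])
frequent = 10 * rare  # same distribution, ten times the frequency

cos = cosine_similarity(rare, frequent)              # ignores magnitude
euclid = float(np.linalg.norm(rare - frequent))      # grows with magnitude
dot = float(np.dot(rare, frequent))                  # grows with magnitude
```

Here the cosine similarity is maximal even though the Euclidean distance between the two vectors is large, which is the sense in which cosine abstracts away from overall frequency.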

For an overview of various weighting functions, see Kiela and Clark (2014).

Thanks to an anonymous reviewer for emphasizing this point.

Landauer and Dumais (1997) and Lenci (2018) both emphasise the importance of dimensionality reduction in the process of producing a model that captures meaning. Both suggest that dimensionality reduction serves as an abstraction mechanism that picks up on latent patterns in the distributional data that would not be detected by a model operating on raw frequency statistics. In this respect, dimensionality reduction may be an important step that brings greater benefit than just computational efficiency.
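
As a rough illustration of dimensionality reduction, the sketch below applies a truncated singular value decomposition, the technique used in latent semantic analysis, to a small count matrix. The matrix values and the helper names `reduce_dimensions` and `cosine` are assumptions for illustration; real models work with far larger and usually weighted matrices.

```python
import numpy as np

def reduce_dimensions(matrix, k):
    """Project the rows of `matrix` onto its top-k singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical word-by-context counts: rows 0-1 share contexts,
# rows 2-3 share a different set of contexts.
counts = np.array([
    [4.0, 3.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

reduced = reduce_dimensions(counts, k=2)  # four dimensions down to two
sim_same_group = cosine(reduced[0], reduced[1])
sim_other_group = cosine(reduced[0], reduced[2])
```

Even with half the dimensions discarded, the two words that share contexts remain much closer to each other than to a word from the other group.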

There may still be indirect appeal to intuitions. For instance, if our model is constructed according to whether it can predict human judgments of semantic similarity, then clearly meaning intuitions are playing a role in the evaluation of the model. But even if we acknowledge this, there is still a sense in which the role of intuitions is being minimized. We may allow that meaning intuitions are playing a role at the point of model evaluation, but once we have a model that passes the evaluation, and so (ideally) works, this will be able to inform us about the meanings of terms not included in the evaluation task.

Thanks to Emma Borg for emphasising this point.

Note that the kind of polysemy considered here is what might be termed compositional polysemy, i.e. the variation in meaning that an expression displays when combined in various larger expressions. It could be argued that this phenomenon presents no particular problem for the formal semantic tradition, provided that it is acknowledged, first, that e.g. “cutting” may be associated with more than one sense and, second, that which sense it contributes depends partly on the expression it is combined with. There would be no principled barrier to a formal semantic model capturing these facts, and so there is no thorn in the side of the semantic tradition here (Fodor & Lepore, 1998, p. 284; Borg, 2012, pp. 188–189). Even if this is right, the benefit of DSMs should still be noted, i.e. that they seem to provide a way of modelling how the specific meanings of complex expressions can arise from their parts, rather than just providing a general model of how expressions of particular semantic types combine with one another.

Westera and Boleda’s account certainly warrants greater discussion than I will give it here. In particular, their proposal has clear points of similarity with other views (e.g., radical contextualist views, relevance theoretic views, Pietroski’s (2018) internalist account of meanings as procedures) that claim that a theory of meaning should not capture worldly phenomena such as truth and reference. Westera and Boleda arguably go a step further in claiming that a theory of meaning should not even capture entailment relations.

See also Lenci (2008).

McNally and Boleda (2017) propose a novel combination of discourse representation theory and distributional semantics in order to capture the conceptual composition of modified noun phrases (e.g. that “red” modifies “pen” in a different way to the manner in which it modifies “apple”), and particularly the way in which the conceptual composition is sometimes affected by features of the object referred to (what they call “referentially afforded interpretations”). In doing so, they develop an interesting proposal on how distributional representations can be viewed as encoding conceptual information for both simple and complex expressions.

See Bender and Koller (2020) for a complaint of this kind applied specifically to the idea that language models capture meaning or understanding.

An alternative approach to capturing compositionality in a DSM has been to use recursive neural networks, where the vectors for individual words are used as input to a neural network that then produces a vector for the combination of those words (Socher et al., 2012).
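
The general idea of neural composition can be sketched with a single composition function applied repeatedly up a parse tree. This is a simplified, vector-only unit and not the matrix-vector model of Socher et al. (2012); the weights are random and untrained, and the word vectors, dimension, and function name `compose` are assumptions for illustration.

```python
import numpy as np

def compose(left, right, weights, bias):
    """Combine two child vectors into one parent vector of the same size."""
    return np.tanh(weights @ np.concatenate([left, right]) + bias)

rng = np.random.default_rng(0)
dim = 4
weights = rng.normal(scale=0.1, size=(dim, 2 * dim))  # untrained, illustrative
bias = np.zeros(dim)

# Hypothetical word vectors standing in for learned embeddings.
very, good, movie = (rng.normal(size=dim) for _ in range(3))

# Apply the same function at each node of the tree ((very good) movie).
very_good = compose(very, good, weights, bias)
phrase = compose(very_good, movie, weights, bias)
```

Because the output has the same dimensionality as the inputs, the phrase vector can itself be fed back into `compose`, which is what allows the network to follow syntactic structure of arbitrary depth. In a trained model the weights would be learned from data rather than sampled.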

See, e.g., Firth (1957); Lenci (2008, 2018); Sahlgren (2008); Erk (2012); Westera and Boleda (2019).

It is tempting to think that DSMs are only holistic according to the above definition if the corpus dictionary contains all other expressions contained within the corpus – and so the meaning of any given expression would be represented by its co-occurrence with all other expressions within the corpus. This needn’t be the case, however. The corpus dictionary just plays the role of capturing the distribution of a given expression within a corpus. Even if one had a limited corpus dictionary, it would still be the case that the distribution of one expression would be dependent upon the distribution of all other expressions including those not included within the corpus dictionary.
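
The point about limited corpus dictionaries can be illustrated with a small count-based sketch. It assumes a window-based notion of context; the function name `cooccurrence_vector`, the window size, and the toy sentences are invented for illustration. Adding a word that is not in the corpus dictionary ("very") still changes the vector for "coffee", because it shifts which dictionary words fall within the context window.

```python
from collections import Counter

def cooccurrence_vector(tokens, target, dictionary, window=2):
    """Count how often each dictionary word appears near `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in dictionary:
                counts[tokens[j]] += 1
    return [counts[word] for word in dictionary]

dictionary = ["drink", "hot"]  # deliberately limited corpus dictionary

# Hypothetical corpora: the second differs only by the word "very",
# which is not in the dictionary.
before = "i drink hot tea and i drink cold coffee".split()
after = "i drink hot tea and i drink very cold coffee".split()

tea_before = cooccurrence_vector(before, "tea", dictionary)
coffee_before = cooccurrence_vector(before, "coffee", dictionary)
coffee_after = cooccurrence_vector(after, "coffee", dictionary)
```

The dictionary fixes which dimensions are recorded, but the values along those dimensions still depend on where every other expression in the corpus occurs.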

One objection against holistic theories of meaning is that holistic meanings are not compositional (Fodor & Lepore, 1992, p. 175 ff.). As we saw in Sect. 5.2, whether DSMs can capture the compositionality of meaning is currently treated as an open research question within distributional semantics, and so I will not consider that objection any further here.

Fodor and Lepore emphasise other problems that arise from such instability. For instance, it would seem that an individual would never be able to change their mind regarding the truth of a sentence, as any change in mind would be a change of beliefs, and so what was meant by the sentence would then change as well. So strictly, rather than going from believing p to believing ¬p, the individual would then be considering some proposition other than p. They also emphasise that inferentialism understood as a theory of mental content will not be able to provide intentional explanations that generalise over propositional attitudes, as the possession of propositional attitudes will be dependent upon the particular beliefs of an individual. However, these problems are quite particular to a form of meaning holism that depends upon the complete set of beliefs for an individual, and so they will not be of concern here.

Andrews, M., Vigliocco, G., & Vinson, D. (2009). Integrating experiential and distributional data to learn semantic representations. Psychological Review, 116 (3), 463–498.

Asher, N. (2011). Lexical meaning in context: A web of words. Cambridge University Press.

Baroni, M., & Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36 (4), 673–721.

Baroni, M., Bernardi, R., & Zamparelli, R. (2014a). Frege in space: A program for compositional distributional semantics. In Linguistic issues in language technology, volume 9, 2014 – Perspectives on semantic representations for textual inference. CSLI Publications.

Baroni, M., Dinu, G., & Kruszewski, G. (2014b). Don’t count, predict!: A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (Volume 1: Long papers) (pp. 238–247). Association for Computational Linguistics.

Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th annual meeting of the Association for Computational Linguistics (pp. 5185–5198).

Boleda, G. (2020). Distributional semantics and linguistic theory. Annual Review of Linguistics, 6 (1), 213–234.

Boleda, G., & Herbelot, A. (2016). Formal distributional semantics: Introduction to the special issue. Computational Linguistics, 42 (4), 619–635.

Borg, E. (2012). Pursuing meaning. Oxford University Press.

Brandom, R. (1994). Making it explicit. Harvard University Press.

Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39 (3), 510–526.

Clark, S. (2015). Vector space models of lexical meaning. In S. Lappin & C. Fox (Eds.), The handbook of contemporary semantic theory (pp. 439–522). Blackwell.

Davidson, D. (1967). Truth and meaning. Synthese, 17 (3), 304–323.

Davidson, D. (1973). Radical interpretation. Dialectica, 27 (3/4), 313–328.

Devitt, M. (2006). Intuitions in linguistics. British Journal for the Philosophy of Science, 57 (3), 481–513.

Dresner, E. (2012). Meaning holism. Philosophy Compass, 7 (9), 611–619.

Erk, K. (2012). Vector space models of word meaning and phrase meaning: A survey. Language and Linguistics Compass, 6 (10), 635–653.

Erk, K., & Padó, S. (2008). A structured vector space model for word meaning in context. In Proceedings of the 2008 conference on empirical methods in natural language processing (pp. 897–906). Association for Computational Linguistics.

Firth, J. R. (1957). A synopsis of linguistic theory. In Studies in linguistic analysis (pp. 1–32). Blackwell.

Fodor, J. A., & Lepore, E. (1992). Holism: A shopper’s guide. Blackwell.

Fodor, J. A., & Lepore, E. (1996). Reply to Churchland. In R. N. McCauley (Ed.), The Churchlands and their critics (pp. 159–162). Blackwell.

Fodor, J. A., & Lepore, E. (1998). The emptiness of the lexicon: Reflections on James Pustejovsky’s “The generative lexicon”. Linguistic Inquiry, 29 (2), 269–288.

Fodor, J. A., & Lepore, E. (1999). All at sea in semantic space: Churchland on meaning similarity. The Journal of Philosophy, 96 (8), 381–403.

Frege, G. (1997). Begriffsschrift: A formula language of pure thought modelled on that of arithmetic. In M. Beaney (Ed.), The Frege reader (pp. 47–78). Blackwell.

Gries, S. T. (2016). Quantitative corpus linguistics with R (2nd ed.). Routledge.

Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42 (1–3), 335–346.

Harris, Z. S. (1954). Distributional structure. Word, 10 (2–3), 146–162.

Heim, I., & Kratzer, A. (1998). Semantics in generative grammar. Blackwell.

Jackman, H. (1999). Moderate holism and the instability thesis. American Philosophical Quarterly, 36 (4), 361–369.

Jones, M. N., Kintsch, W., & Mewhort, D. J. (2006). High-dimensional semantic space accounts of priming. Journal of Memory and Language, 55 (4), 534–552.

Kiela, D., & Clark, S. (2014). A systematic study of semantic vector space model parameters. In Proceedings of the 2nd workshop on continuous vector space models and their compositionality (pp. 21–30).

Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104 (2), 211–240.

Larson, R., & Segal, G. (1995). Knowledge of meaning: Introduction to semantic theory. MIT Press.

Lasnik, H., & Lidz, J. L. (2016). The argument from the poverty of the stimulus. In I. Roberts (Ed.), The Oxford handbook of universal grammar (pp. 221–248). Oxford University Press.

Lenci, A. (2008). Distributional semantics in linguistic and cognitive research. Italian Journal of Linguistics, 20 (1), 1–31.

Lenci, A. (2018). Distributional models of word meaning. Annual Review of Linguistics, 4 (1), 151–171.

Lewis, D. (1970). General semantics. Synthese, 22 (1/2), 18–67.

Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28 (2), 203–208.

Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92 , 57–78.

Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122 (3), 485–515.

Marelli, M., Gagné, C. L., & Spalding, T. L. (2017). Compounding as abstract operation in semantic space: Investigating relational effects through a large-scale, data-driven computational model. Cognition, 166 , 207–224.

Maynes, J., & Gross, S. (2013). Linguistic intuitions. Philosophy Compass, 8 (8), 714–730.

McEnery, T., & Wilson, A. (1996). Corpus linguistics: An introduction. Edinburgh University Press.

McNally, L., & Boleda, G. (2017). Conceptual versus referential affordance in concept composition. In J. Hampton & Y. Winter (Eds.), Compositionality and concepts in linguistics and psychology (pp. 245–268). Springer.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. ICLR Workshop.

Mikolov, T., Yih, W.-T., & Zweig, G. (2013b). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 conference of the North American chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 746–751). Association for Computational Linguistics.

Mitchell, J., & Lapata, M. (2008). Vector-based models of semantic composition. In Proceedings of ACL-08: HLT (pp. 236–244). Association for Computational Linguistics.

O’Keeffe, A., & McCarthy, M. (2010). Routledge handbook of corpus linguistics (1st ed.). Routledge.

Padó, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33 (2), 161–199.

Padó, S., Padó, U., & Erk, K. (2007). Flexible, corpus-based modelling of human plausibility judgements. In Proceedings of the 2007 joint conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 400–409). Association for Computational Linguistics.

Pagin, P. (2008). Meaning holism. In E. Lepore & B. C. Smith (Eds.), The Oxford handbook of philosophy of language (pp. 213–232). Oxford University Press.

Pietroski, P. M. (2018). Conjoining meanings: Semantics without truth values. Oxford University Press.

Quine, W. V. O. (1960). Word and object. MIT Press.

Sahlgren, M. (2008). The distributional hypothesis. Italian Journal of Linguistics, 20 (1), 33–53.

Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24 (1), 97–123.

Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3 (3), 417–424.

Socher, R., Huval, B., Manning, C. D., & Ng, A. Y. (2012). Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (pp. 1201–1211). Association for Computational Linguistics.

Travis, C. (1997). Pragmatics. In B. Hale & C. Wright (Eds.), A companion to the philosophy of language (pp. 87–107). Blackwell.

Westera, M., & Boleda, G. (2019). Don’t blame distributional semantics if it can’t do entailment. In Proceedings of the 13th international conference on computational semantics – long papers (pp. 120–133).

Author information

Authors and affiliations.

Department of Philosophy, Edith Morley, Whiteknights, University of Reading, Reading, UK

Jumbly Grindrod

Corresponding author

Correspondence to Jumbly Grindrod.

Editor information

Editors and affiliations.

Departamento de Lógica y Filosofía Teórica, Universidad Complutense de Madrid, Madrid, Spain

David Bordonaba-Plou

About this chapter

Grindrod, J. (2023). Distributional Theories of Meaning: Experimental Philosophy of Language. In: Bordonaba-Plou, D. (eds) Experimental Philosophy of Language: Perspectives, Methods, and Prospects. Logic, Argumentation & Reasoning, vol 33. Springer, Cham. https://doi.org/10.1007/978-3-031-28908-8_5

DOI: https://doi.org/10.1007/978-3-031-28908-8_5

Published: 17 June 2023

Publisher Name: Springer, Cham

Print ISBN: 978-3-031-28907-1

Online ISBN: 978-3-031-28908-8

eBook Packages: Religion and Philosophy, Philosophy and Religion (R0)

Distribution is not enough: going Firther

Andy Lücking, Robin Cooper, Staffan Larsson, Jonathan Ginzburg

Andy Lücking, Robin Cooper, Staffan Larsson, and Jonathan Ginzburg. 2019. Distribution is not enough: going Firther. In Proceedings of the Sixth Workshop on Natural Language and Computer Science, pages 1–10, Gothenburg, Sweden. Association for Computational Linguistics. https://aclanthology.org/W19-1101

COMMENTS

PDF The distributional hypothesis
∗This paper is based on several chapters of my PhD dissertation (Sahlgren, 2006). 1. what they mean by meaning. For the non-distributionalist, on the other hand, ... The distributional hypothesis is often motivated by referring to the works of Zellig Harris, who advocated a distributional methodology for linguistics. In
Distributional Semantics and Linguistic Theory
When citing this paper, please use the following: Boleda, G. 2020. Distributional Semantics and Linguistic Theory. ... Lenci (2018). Distributional semantics is based on the Distributional Hypothesis, which ... 2016, p. 623). Distributional semantics reverse-engineers the process, and induces semantic representations from contexts of use.
[PDF] The Distributional Hypothesis
There is a correlation between distributional similarity and meaning similarity, which allows us to utilize the former in order to estimate the latter, and one can pose two very basic questions concerning the distributional hypothesis: what kind of distributional properties the authors should look for, and what — if any — the differences are between different kinds of Distributional ...
The distributional hypothesis
The paper argues that - under the assumptions made by the distributional paradigm - the distributional representations do constitute full-blown accounts of linguistic meaning. Discover the world's ...
PDF Chapter 5 Distributional Theories of Meaning: Experimental ...
In this paper, I will focus on . distributional semantics: a mathematical approach ... the distributional hypothesis as outlined in the quotations from Lenci and Harris is merely a claim about correlation between meaning and distribution. Indeed, it is worth distinguishing between two forms of the distributional hypothesis: ...
The Logic of Language: from the Distributional to the Structuralist
From the Distributional to the Structuralist Hypothesis through Types and Interaction. Juan Luis Gastaldi, Luc Pellissier. Abstract: The recent success of new AI techniques in natural language processing relies heavily on the so-called distributional hypothesis. We first show that the latter can be understood as a simplified version of the classic
Testing the Distributional Hypothesis: The Influence of Context on
The general principle under exploration, which combines the convergence of these recent studies into a cognitive role for distributional information in explaining language ability, is called the Distributional Hypothesis. Testing the Distributional Hypothesis: The Influence of Context on Judgements of Semantic Similarity. Scott McDonald (scottm@cogsci.ed.ac.uk), Michael Ramscar (michael@cogsci ...
Distribution is not enough: going Firther
In practice, use is identified with occurrence in text corpora, though there are some efforts to use corpora containing multi-modal information. In this paper we argue that the distributional hypothesis is intrinsically misguided as a self-supporting basis for semantics, as Firth was entirely aware.