Introduction
The use of artificial intelligence (AI) technology is growing every day in society. The reason is that over the last decades the development and improvement of these systems has accelerated, leading to applications across many domains. Today AI can be understood as an umbrella term covering many different types of technology, all of which share the capacity for information processing. Examples of AI systems today are machine learning, expert systems, robotics, deep learning networks, and language processing systems. Deep learning (DL) networks are one of the most promising applications of AI, and this essay focuses on the analysis and discussion of this type of application. DL networks are so promising because of their capacity to process large amounts of data and to recognize patterns within them.
The results that DL systems achieve are so promising that in certain domains they surpass human capacity. These results have prompted discussions and comparisons between DL systems and humans in terms of capacity and intelligence, leading some experts to claim that machines already exceed human capacities, or that they are approaching that threshold. Humans and DL systems do share similarities in the way information is processed. Both operate through a black-box architecture in which the transformation from input to output is not observable. The analogy extends to information processing itself: both DL systems and humans (through neurons) base information processing on binary computation (Rieke et al., 1999). It is clear from the above that some analogies hold up; however, it would be an ontological mistake to treat the epistemological results of the two entities as equal.
This essay aims to break down the analogy by focusing on the process of representation modeling. The argument is that even if the final representation model of an object is similar between DL systems and humans, the paths followed by the two during the phase of generalizing information are different. Therefore, the two representations have different epistemological meanings. This is an important aspect to acknowledge when evaluating the outcomes of DL systems and humans. From this, the research question follows: how do DL systems and humans epistemologically differ when it comes to the representation modeling of objects?
To answer the question, I first explain the choice of representation as the case of this study. I then turn to the core argument of the paper: DL systems follow a descriptive generalization path by focusing on physical features, whereas humans follow an abstract generalization that takes into consideration the use function of the object. The essay proceeds by argument and example.
Why representation modeling?
Representation implies generalization, because of the process of extracting information. I define representation modeling as the description of ontological entities based on generalization, where generalization stands for inductive processes that create inferences from a series of specific objects.
In DL systems, generalization is obtained by training on data in which categories are labeled (Buckner, 2019). Developers apply explicit regularization techniques to increase generalization. A common technique is to make small changes to the training data, forcing the DL system to reinforce the broader category (Buckner, 2019). By contrast, as described by Dehaene (2021), humans generalize thanks to the ability to extract abstract properties by using feedback and experience. Abstract properties are those that exist beyond the physical features of an object. For the scope of this essay, I argue in favor of one particular abstract property, namely the "use" function.
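To make the idea concrete, here is a minimal sketch of such a perturbation-based regularization step, assuming (hypothetically) grayscale images stored as NumPy arrays with values in [0, 1]; the flip and noise parameters are illustrative choices, not drawn from Buckner (2019):

```python
import numpy as np

def augment(images, rng, noise_scale=0.02):
    """Return lightly perturbed copies of the training images.

    Small random flips and pixel noise leave category membership
    intact, pushing the network to learn the broader category
    rather than memorizing individual examples.
    """
    out = []
    for img in images:
        x = img.copy()
        if rng.random() < 0.5:        # random horizontal flip
            x = x[:, ::-1]
        x = x + rng.normal(0.0, noise_scale, x.shape)  # small pixel noise
        out.append(np.clip(x, 0.0, 1.0))
    return np.stack(out)

rng = np.random.default_rng(0)
batch = rng.random((4, 28, 28))       # hypothetical 28x28 grayscale images
augmented = augment(batch, rng)
```

Because the label stays the same while the pixels vary slightly, the system is pushed toward features shared by the whole category rather than the quirks of single training images.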
Therefore, it is argued that even though the final representation model might be similar between humans and machines, the paths followed by the two to construct it are different; consequently, the epistemological meanings behind them also differ. Where examples are needed, I will refer to the representation model of a chair. Chairs are a popular example in the literature (Buckner, 2018) because the category of chair applies to several objects with very different physical features. To draw a representation of a chair that corresponds accurately to chairs as physical objects, not only descriptive features but also abstract features must be taken into account.
The process of representation might lead to an epistemological and conceptual mistake. It is plausible to say that it is easier to represent any property of an object after representing the object itself (Buckner, 2018). This view seems to claim universal validity without taking into account the ontological difference between the subjects making the representation. It implies a generalization of the object, from which it is then possible to represent its properties. In my understanding, this is possible on the basis of a descriptive generalization, in which representations are built from an average of shared patterns in the input data. This view is compatible with the empirical findings on DL systems: Dosovitskiy, Springenberg, and Brox (2015) show that DL systems based on this generalization process can generate representations of chairs in different and novel examples.
However, this does not prove that the generalization includes abstract properties, nor does it prove that humans operate in the same way. In my view, it is intuitive to say that when it comes to object representation modeling, humans operate in the opposite direction. The representation process starts with a property of an object and then proceeds to the representation of the object itself. For a chair, the starting point of its representation is the property (i.e., the use) of sitting, from which the object itself is then derived. It would not make sense to first represent the physical features of an object and then find its properties.
In the next two sections, I argue in favor of my view. First, I present arguments showing that DL systems' representation modeling is based on a descriptive process. Second, I present how humans base representation on abstraction, by arguing in favor of generalizing from the use function.
DL systems and representation: a descriptive view
This section illustrates my argument that DL systems build representation models of objects based on description. I show this with two phenomena observed in DL systems: adversarial examples and heatmaps.
Adversarial examples
It is a matter of discussion whether DL systems are capable of extracting abstract information from a data set and using that information to build a representation model of a given object. An important argument against the abstraction capacity of DL systems is their sensitivity to so-called adversarial examples (Jo and Bengio, 2017). Adversarial examples are specialized inputs created to test the artificial neural networks on which DL systems are based. The idea is to slightly change an input image and see whether the DL system still identifies what the image represents after the change. The argument is that if DL systems could learn and capture abstract properties, they would not fail to recognize them. If, on the other hand, they fail, it means that the DL system was not able to draw generalizations based on abstract properties and instead analyzed the image based on descriptive features. A good example is the one described by Campolo and Crawford (2020): the GoogLeNet system misclassified an image of a panda as a gibbon after an imperceptibly small vector was added to the original image. The system, therefore, had generalized descriptively rather than abstractly.
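To illustrate the mechanics, the following is a minimal sketch of a gradient-sign perturbation of the kind described above, written in PyTorch; `model`, `image`, and `label` are assumed placeholders, and the sketch is not the actual GoogLeNet experiment reported by Campolo and Crawford (2020):

```python
import torch
import torch.nn.functional as F

def adversarial_example(model, image, label, epsilon=0.007):
    """Add an imperceptibly small vector to the input image.

    The vector is the sign of the loss gradient with respect to the
    input, scaled by an illustrative epsilon: large enough to flip
    the classifier's prediction, small enough to be invisible to a
    human observer.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()                       # gradient of loss w.r.t. pixels
    perturbation = epsilon * image.grad.sign()
    return (image + perturbation).detach()
```

The point of the sketch is that the perturbation targets the network's descriptive, pixel-level sensitivities: a change that leaves the abstract category (a panda) untouched for a human observer is enough to move the system to a different label.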
Heatmaps
A heatmap is a two-dimensional visual representation of data that uses colors to represent different values. DL systems today use heatmaps to detect and represent objects (Amherd and Rodriguez, 2021). Heatmaps are based on surface statistical regularities, meaning that DL systems' representations share the same type of superficial properties (Jo and Bengio, 2017). To accept this view, there must be a strong statistical relationship between image statistics and visual understanding. In support of this, Torralba and Oliva (2003) have shown how simple image statistics can be used to predict the absence or presence of an object in a given scene before the system explores the image.
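As an illustration of the format itself (not of any particular detection pipeline), a two-dimensional array of values can be rendered as a heatmap in a few lines of Python; the values here are random placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
activations = rng.random((32, 32))   # placeholder per-cell values

plt.imshow(activations, cmap="hot")  # color encodes the value of each cell
plt.colorbar(label="value")
plt.title("Heatmap: colors represent data values")
plt.show()
```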
Moreover, Jo and Bengio (2017) have shown that, despite the superficial nature of the data, DL systems are still able to generalize representations based on the images' superficial statistical properties. The authors argue that such a system should possess three properties. First, the preservation of object recognizability from a human perspective: the system can recognize an object (e.g., a chair) based on the superficial statistical properties of the data set. Second, qualitatively different surface regularities: in combination with the first property, average image statistics are shared across different datasets, so even though some surface regularities change between datasets, the overall statistics remain valid for building representations. Third, the existence of a non-trivial generalization gap: generally speaking, the neural network can descriptively find an interesting point in the data set from which it can generalize and build the representation model. Heatmaps are therefore based on descriptive representation; a good example of this heatmap process is described by Bach et al. (2016).
Humans and representation: an abstract view
This section attempts to explain how the use function applies to representation models of objects. More generally, representation modeling is part of the categorization process through which humans categorize the world. In this essay, I argue that humans generalize and create representations of objects based on the "use" function of those objects. By use function, I mean the characteristic use to which an object is put. Moreover, use implies interaction between the subject and the world.
The use function is defined as a form of generalization that creates representation models of objects based on the assumption of interacting with the world through a physical body. The interaction leads to an interpretation by the subject, which is reinforced or weakened by experience. The interpretation is made through a semantic and statistical analysis learned through experience.
Therefore, the use function representation model of an object is made by a subject that interacts with the world through a body, via a semantic and statistical process learned through experience.
To give an example, sitting is the use function of a chair. This does not mean that other categorizations, such as physical regularities (e.g., having four legs) or shape (different from an armchair), are not applied to the representation, or that a chair as an object is used only for sitting. The argument is that the use function is an important component of the representation process, and it supports the abstract generalizations that humans apply when interacting with the external world.
This idea runs contrary to the process of seeing and recognizing an object, in which the brain focuses first on the physical features of the object and then interprets its abstract properties. When it comes to representation, the process is reversed: the abstract property, the use function, is projected onto the representation, and the physical aspect then finds realization in shape and form. To explain my perspective, I will use a semantic view of reality in relation to the functioning of the brain.
For this view, the assumption of the body as both a means of sensory data acquisition and an active means of interpretation is crucial. The body is understood as the point of reference from which the generalization process that leads to the representation starts.
First, I apply the concept of grammar to describe the external world as a text. Second, I use a Bayesian framework to explain the statistical property of the brain. Semantic and statistical modeling are essential to my view of representation modeling as a use function. The reason is that the brain, by reading the world, attributes a probability to the possibilities of what it sees, based on experience. Therefore, an object fits the representation model of a chair based on the statistical probability of the use function (i.e., sitting) according to the subject's semantic reading of the world.
Semantics
Brain-based semantic theories are becoming a popular view of the brain, revealing important insights on both the empirical and the theoretical side. In the most general sense, semantics is the study of reference and truth based on meaning. Semantics is thus interested in analyzing the conceptual and abstract level of information, making it suitable for the aim of this essay. Because semantics is interested in the analysis of meaning, it assumes that there are some basic components that create meaning. Translating this into the operational aspects of the brain, Binder et al. (2016) argue that the representations of concepts inside the brain are embodied in the perception, action, and other neural systems through which representations are created.
A semantic view of representation models, in opposition to more classical category theories, better fits current views of the brain and its functional models. This is because semantic theories require a lower degree of specialized neural areas and a higher degree of dynamism based on interpretability (Binder et al., 2016).
Grammar
The idea of grammar as the structure of reality can be traced back to the philosophical work of Wittgenstein. He expanded the idea of grammar, taking a concept that referred to syntactic rules and their application within a language system and extending it semantically into a concept for describing reality. Following Wittgenstein, grammar tells what kind of object anything is (Ring, 2018). The Wittgensteinian idea of grammar was thus to build a system of reference by which to interpret the world (Biletzki and Matar, 2021). Building a system of reference makes the world readable, and thus makes meaning attribution possible through interpretation.
The reference is context-dependent; in Wittgenstein's words, "being sensible to grammar means be sensitive to the world" (Wittgenstein, 2001), which shifts semantic understanding toward context-dependency. Applying the use function makes it possible to employ the chair in different contexts for different uses, but the main reference remains the function of sitting. Therefore, in the representation model of an object, the grammatical structure supplies the reference to which the use functions refer: the chair is the reference for sitting. This type of system supports a Bayesian framework.
Bayesian framework
Bayesian theory is a statistical framework that describes the probability of an event based on prior knowledge of conditions that might be related to that event (Joyce, 2003).
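In standard notation, for a hypothesis "h" and evidence "e", the theorem reads:

P(h/e) = [ P(e/h) × P(h) ] / P(e)

where P(h) is the prior probability of the hypothesis, P(e/h) is the probability of the evidence given the hypothesis, P(e) is the overall probability of the evidence, and P(h/e) is the updated probability of the hypothesis in light of the evidence.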
The application of a Bayesian framework to the brain is an idea that is finding empirical confirmation (Seth, 2013; Gazzaniga, 2016; Dehaene, 2021). Moreover, applying a Bayesian framework reveals that the brain has an endogenous activity that evolves under the influence of the sensory organs (Dehaene, 2021).
Therefore, to function properly, an organism should develop predictive models linking brain activity and its own body (Seth, 2015). These predictive models arguably also apply to the world under a semantic interpretation and a grammatical structure. It is thus possible to apply this view to the representation model based on the probability of the use function, generalizing from abstract properties.
Chalmers (1999) develops a descriptive account of the Bayesian framework, according to which the conditional probabilities of propositions depend on the evidence for those propositions. This means that probabilities can change in the light of new evidence.
In what follows, a descriptive theoretical formula is built to illustrate the application of the Bayesian framework to the use function representation model in humans. The formula is developed on the basis of Chalmers's (1999) model:

P(r) = P(u/e)t' = [ P(e/u) × P(u/e)t ] / P(e)

P(r) stands for the probability of the validity of the representation. P(u/e)t' stands for the probability of use "u" in light of the experience "e" of the subject at time t'. P(u/e)t stands for the probability of use "u" in light of the experience "e" of the subject at time t. P(e/u) stands for the probability that experience "e" occurs on the assumption of use "u", and P(e) for the overall probability of the experience. Translated, this means that the representation model of an object is given by the probability of understanding the use of the object, where that understanding is learned by experience.
The assumption of the use function must be confirmed at every time "t"; the validity of the representation is therefore verified through a feedback process from t to t'. If the feedback process confirms the representation, the probability of the model's accuracy increases, reinforced by experience, and the model consolidates itself as a representation based on the use function. Vice versa, if the model is not confirmed through feedback based on the use function, the experience is incompatible with the real-world scenario, and the representation is not valid.
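An illustrative numerical sketch of this feedback loop, in Python and with made-up likelihood values, shows how repeated confirmation raises the credence in the use function while a disconfirmation lowers it:

```python
def update(prior, confirmed, p_e_given_u=0.9, p_e_given_not_u=0.3):
    """One Bayesian feedback step from time t to t'.

    prior: P(u/e) at time t, the current credence in the use function.
    confirmed: whether the interaction at t' confirmed the use.
    The two likelihoods are illustrative placeholder values.
    """
    if confirmed:
        num = p_e_given_u * prior
        evidence = num + p_e_given_not_u * (1 - prior)
    else:
        num = (1 - p_e_given_u) * prior
        evidence = num + (1 - p_e_given_not_u) * (1 - prior)
    return num / evidence

p = 0.5   # initial credence that the object affords sitting
for outcome in [True, True, True, False, True]:
    p = update(p, outcome)
    print(f"P(u/e) = {p:.3f}")
```

After the three confirmations the credence climbs well above the 0.5 starting point; the single disconfirmation pulls it back down, mirroring the reinforcement-and-weakening dynamic described above.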
This view is compatible with the semantic view of reality, in which the meaning of a reference is embedded inside the grammatical structure of the world, and it is the probability of confirmation of the use of that meaning that confirms the validity of a representation. Therefore, a chair is a chair if it has the use function of supporting the action of sitting, and that use function is confirmed probabilistically. The use function is embedded inside the grammatical structure, which supports the probability given by use and experience. The result is a semantic view of the world as constructed from meaning given by the use of objects.
Discussion and conclusion
In this essay, I argued that important differences are present when DL systems and humans create representations of objects. DL systems create representations based on description, whereas humans create representations based on abstraction. The suggestion is that the biggest difference lies in how generalization occurs. DL systems, through their descriptive approach, are more capable of pattern recognition, and thus generalize by drawing on physical patterns.
This type of process can find hidden patterns inside data sets that humans are less capable of detecting; on the other side, it makes DL systems more exposed to errors, as the adversarial examples and heatmaps show. By contrast, humans are less sensitive to the physical features of an object and instead draw representations from experience and abstract properties. In this sense, the Bayesian theory describes a feedback-based reinforcement process that helps to inductively build the representation starting from the use function. To do so, the first thing needed seems to be the semantic capability of attributing meaning. However, the capacity of relating meaning from a use-function perspective seems to be directly related to the possession of a body. Humans, by using the body, attribute the use function "sitting" to the chair; therefore, it can be generalized that a chair is an object with the function of sitting.
DL systems, on the other side, cannot do this, because they do not possess a body. They cannot generalize from the use function of sitting, because it lies outside their possibilities. The only way they can represent is by building on what is available to them: patterns and physical analysis. These differences suggest that DL systems are more capable of reproducing representations, which makes them dependent on physical elements; humans are less capable of reproducing, but, being less dependent on physical features, they can apply abstract properties to new representations.
References
- Amherd, F. and Rodriguez, E. (2021) Heatmap-based object detection and tracking with a fully convolutional neural network, arXiv.org. Available at: https://arxiv.org/abs/2101.03541 (Accessed: November 10, 2022).
- Bach, S. et al. (2016) “Controlling explanatory heatmap resolution and semantics via decomposition depth,” 2016 IEEE International Conference on Image Processing (ICIP) [Preprint]. Available at: https://doi.org/10.1109/icip.2016.7532763.
- Biletzki, A. and Matar, A. (2021) Ludwig Wittgenstein, Stanford Encyclopedia of Philosophy. Stanford University. Available at: https://plato.stanford.edu/entries/wittgenstein/#GramFormLife (Accessed: November 10, 2022).
- Botvinick, M. and Cohen, J. (1998) “Rubber hands ‘feel’ touch that eyes see,” Nature, 391(6669), pp. 756–756. Available at: https://doi.org/10.1038/35784.
- Buckner, C. (2018) “Empiricism without magic: Transformational abstraction in deep convolutional neural networks,” Synthese, 195(12), pp. 5339–5372. Available at: https://doi.org/10.1007/s11229-018-01949-1.
- Buckner, C. (2019) “Deep learning: A philosophical introduction,” Philosophy Compass, 14(10). Available at: https://doi.org/10.1111/phc3.12625.
- Campolo, A. and Crawford, K. (2020) “Enchanted determinism: Power without responsibility in artificial intelligence,” Engaging Science, Technology, and Society, 6, pp. 1–19. Available at: https://doi.org/10.17351/ests2020.277.
- Chalmers, A. (1999) "The Bayesian approach," in Curd, M., Cover, J.A. and Pincock, C. (eds) (2013) Philosophy of science: The central issues. 2nd edn. New York, NY: W.W. Norton & Company.
- Dosovitskiy, A., Springenberg, J.T. and Brox, T. (2015) “Learning to generate chairs with Convolutional Neural Networks,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [Preprint]. Available at: https://doi.org/10.1109/cvpr.2015.7298761.
- Gauker, C. (2011) Words and images: An essay on the origin of ideas. Oxford: Oxford University Press.
- Gazzaniga, M.S. (2016) Who’s in charge?: Free will and the science of the brain. London: Robinson.
- Izhikevich, E.M. (2007) Dynamical systems in neuroscience: The geometry of excitability and bursting. Cambridge: MIT Press.
- Binder, J.R. et al. (2016) "Toward a brain-based componential semantic representation," Cognitive Neuropsychology, 33(3-4), pp. 130–174. Available at: https://doi.org/10.1080/02643294.2016.1147426.
- Jo, J. and Bengio, Y. (2017) Measuring the tendency of CNNs to learn surface statistical regularities, arXiv. Available at: https://doi.org/10.48550/arXiv.1711.11561.
- Joyce, J. (2003) Bayes’ theorem, Stanford Encyclopedia of Philosophy. Stanford University. Available at: https://plato.stanford.edu/archives/spr2019/entries/bayes-theorem/ (Accessed: November 11, 2022).
- Karaca, K. (2021) “Values and inductive risk in machine learning modelling: The case of binary classification models,” European Journal for Philosophy of Science, 11(4). Available at: https://doi.org/10.1007/s13194-021-00405-1.
- Rescorla, M. (2020) The computational theory of mind, Stanford Encyclopedia of Philosophy. Stanford University. Available at: https://plato.stanford.edu/entries/computational-mind/#GodIncThe (Accessed: November 10, 2022).
- Rieke, F. et al. (1999) Spikes: Exploring the neural code. London: MIT Press.
- Ring, M. (2018) “Wittgenstein on essence,” Philosophical Investigations, 42(1), pp. 3–14. Available at: https://doi.org/10.1111/phin.12217.
- Sapolsky, R.M. (2018) Behave: The biology of humans at our best and worst. London: Vintage.
- Seth, A.K. (2015) "The cybernetic Bayesian brain: From interoceptive inference to sensorimotor contingencies," in Metzinger, T. and Windt, J.M. (eds) Open MIND: 35(T). Frankfurt am Main: MIND Group. Available at: https://doi.org/10.15502/9783958570108.
- Torralba, A. and Oliva, A. (2003) “Statistics of natural image categories,” Network: Computation in Neural Systems, 14(3), pp. 391–412. Available at: https://doi.org/10.1088/0954-898x_14_3_302.
- Wittgenstein, L. (2001) Philosophical investigations. Oxford: Blackwell.
- Zednik, C. (2019) “Solving the black box problem: A normative framework for explainable artificial intelligence,” Philosophy & Technology, 34(2), pp. 265–288. Available at: https://doi.org/10.1007/s13347-019-00382-7.