Digital Contextualization

Sociologists like Pierre Bourdieu advocated a relational type of analysis of data: statistics, scores and answers to a questionnaire should be used to relate individuals and to highlight hidden, complex factors. Data collected about individuals need to be contextualized relative to other individuals, to their implicit social environment and to culture at large. Correspondence Analysis, based on Singular Value Decomposition (SVD), has been successful in contextualizing individual data with respect to other individuals. Latent Semantic Analysis (LSA), also based on SVD, has been successful in contextualizing words according to their usage context. These methods have been extended in several directions. Partial Least Squares path modelling (PLS-PM) allows analysts to test hypotheses against data by integrating a simulation process. Latent Dirichlet Allocation (LDA) has provided a probabilistic alternative to vector-space LSA. Along with these efficient numerical approaches, discrete approaches relying on increasing computer power have explored non-frequentist alternatives: Formal Concept Analysis, based on Galois lattices, allows highlighting and relating complex underlying concepts. More specifically, Information Visualisation based on graph algorithms such as Pathfinder, together with Social Network Analysis indicators, has enabled domain and topic mapping from raw text. Further on, automatic summarization combined with Information Retrieval has led to methods that can highlight the implicit context of a short message, given a large and reliable encyclopedic resource such as Wikipedia. Recently, Deep Learning approaches based on word embeddings have handled contextualization from very large data sources.
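The SVD-based contextualization of words by their usage, as in LSA, can be illustrated with a minimal sketch. The toy term-document counts, vocabulary and similarity pairs below are invented for illustration only; they are not drawn from any of the studies discussed.

```python
# A minimal sketch of Latent Semantic Analysis (LSA) via truncated SVD,
# using NumPy only. The term-document matrix below is a made-up example.
import numpy as np

# Rows = terms, columns = documents (raw co-occurrence counts, hypothetical).
terms = ["ballot", "election", "vote", "guitar", "concert"]
X = np.array([
    [2, 3, 0, 0],   # ballot
    [3, 4, 1, 0],   # election
    [2, 3, 0, 1],   # vote
    [0, 0, 3, 4],   # guitar
    [0, 1, 4, 3],   # concert
], dtype=float)

# Truncated SVD: keep k latent dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # terms embedded in the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Terms used in similar contexts end up close in the latent space.
sim_related = cosine(term_vecs[0], term_vecs[2])    # ballot vs vote
sim_unrelated = cosine(term_vecs[0], term_vecs[3])  # ballot vs guitar
print(sim_related > sim_unrelated)
```

The same decomposition underlies Correspondence Analysis, which applies it to suitably normalized contingency tables rather than raw counts.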
Pierre Bourdieu was limited by two obstacles: the computing power of his time, which did not allow him to explore all correlations, and the cost of data digitization. As a consequence, contextualization by correlation analysis could only be carried out at the initiative of the analyst and according to the choices the analyst made. The digital world of the 21st century, however, has reversed this paradigm: automatic contextualization of our every action is now sustained. The removal of these two technical bottlenecks raises questions of data rights and other ethical issues. This special issue includes state-of-the-art automatic contextualization methods and puts into perspective representative case studies of these approaches. Each article was reviewed by a multidisciplinary committee and includes thorough reviews by both sociologists and computer scientists or statisticians.

1. Contextualizing Geometric Data Analysis and Related Data Analytics: A Virtual Microscope for Big Data Analytics

Fionn Murtagh; Mohsen Farid.
The relevance and importance of contextualizing data analytics are described. Qualitative characteristics might form the context of quantitative analysis. Topics at issue include: contrast, baselining, secondary data sources, supplementary data sources, and dynamic and heterogeneous data. In geometric data analysis, especially on the Correspondence Analysis platform, various case studies are both experimented with and reviewed. In such aspects as the paradigms followed and the technical implementation, implicitly and explicitly, an important point made is the major relevance of such work both for burgeoning analytical needs and for new analytical areas, including Big Data analytics. For the general reader, the aim is first of all to display and describe the analytical outcomes that are subject to analysis here, and then to detail the more quantitative outcomes that fully support the analytics carried out.
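As a rough illustration of the Correspondence Analysis platform referred to in this abstract, the sketch below factorizes a small contingency table via SVD of its standardized residuals. The table is hypothetical, not the authors' case-study data.

```python
# Minimal Correspondence Analysis sketch: SVD of the standardized residuals
# of a contingency table (toy data for illustration).
import numpy as np

N = np.array([[20, 10, 5],
              [5, 15, 10],
              [2, 8, 25]], dtype=float)  # toy contingency table
P = N / N.sum()            # correspondence matrix
r = P.sum(axis=1)          # row masses
c = P.sum(axis=0)          # column masses

# Standardized residuals: (P_ij - r_i c_j) / sqrt(r_i c_j)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates: rows and columns share a common factorial space.
row_coords = (U * s) / np.sqrt(r)[:, None]
col_coords = (Vt.T * s) / np.sqrt(c)[:, None]
total_inertia = (s ** 2).sum()   # equals chi-squared statistic / grand total
print(row_coords.shape, round(float(total_inertia), 3))
```

Because the independence model is subtracted out, the trivial dimension vanishes (the smallest singular value is numerically zero), and the leading coordinates position rows and columns so that associated categories fall near one another.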

2. Active learning in annotating micro-blogs dealing with e-reputation

Jean-Valère Cossu; Alejandro Molina-Villegas; Mariana Tello-Signoret.
Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields, including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of available data, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper develops a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is the methodology followed to build an original annotated dataset expressing opinion about two French politicians over time. We therefore review state-of-the-art NLP-based ML algorithms to automatically annotate tweets, using a manual initiation step as bootstrap. The paper focuses on key issues of active learning while building a large annotated dataset in the presence of noise introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can serve as a bearing point not only to improve automatic systems for Opinion Mining (OM) and Topic Classification, but also to reduce noise in human annotations. However, a later thorough analysis […]
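A pool-based active learning loop of the kind this abstract describes can be sketched as follows. The 2-D features, nearest-centroid classifier and simulated oracle below are stand-ins chosen for brevity; the authors' actual pipeline operates on NLP features of French tweets with human annotators in the loop.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling,
# in the spirit of the bootstrapped annotation process described above.
# All data and the nearest-centroid "classifier" are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D "tweet features" for two opinion classes.
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([4, 4], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

labeled = list(range(3)) + list(range(50, 53))   # manual bootstrap seed
pool = [i for i in range(100) if i not in labeled]

def centroids(idx):
    return {cl: X[[i for i in idx if y[i] == cl]].mean(axis=0) for cl in (0, 1)}

def uncertainty(x, cents):
    d = sorted(np.linalg.norm(x - c) for c in cents.values())
    return d[1] - d[0]   # small margin between classes = uncertain

for _ in range(10):                      # 10 annotation rounds
    cents = centroids(labeled)
    # Query the pool point the current model is least sure about...
    q = min(pool, key=lambda i: uncertainty(X[i], cents))
    pool.remove(q)
    labeled.append(q)                    # ...and ask the "expert" for its label

cents = centroids(labeled)
pred = [min((0, 1), key=lambda cl: np.linalg.norm(X[i] - cents[cl]))
        for i in range(100)]
accuracy = float(np.mean([p == t for p, t in zip(pred, y)]))
print(round(accuracy, 2))
```

Querying the most uncertain examples concentrates scarce annotation effort near the decision boundary, which is the rationale for propagating a small set of expert labels across a large noisy pool.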