Search Results

  • Item
    Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript
    (San Francisco, CA : Public Library of Science (PLoS), 2013) Amancio, D.R.; Altmann, E.G.; Rybski, D.; Oliveira Jr., O.N.; da Costa, L.F.
    While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.
  • Item
    Global gridded crop model evaluation: Benchmarking, skills, deficiencies and implications
    (München : European Geophysical Union, 2017) Müller, Christoph; Elliott, Joshua; Chryssanthacopoulos, James; Arneth, Almut; Balkovic, Juraj; Ciais, Philippe; Deryng, Delphine; Folberth, Christian; Glotter, Michael; Hoek, Steven; Iizumi, Toshichika; Izaurralde, Roberto C.; Jones, Curtis; Khabarov, Nikolay; Lawrence, Peter; Liu, Wenfeng; Olin, Stefan; Pugh, Thomas A.M.; Ray, Deepak K.; Reddy, Ashwan; Rosenzweig, Cynthia; Ruane, Alex C.; Sakurai, Gen; Schmid, Erwin; Skalsky, Rastislav; Song, Carol X.; Wang, Xuhui; de Wit, Allard; Yang, Hong
    Crop models are increasingly used to simulate crop yields at the global scale, but so far there is no general framework on how to assess model performance. Here we evaluate the simulation results of 14 global gridded crop modeling groups that have contributed historic crop yield simulations for maize, wheat, rice and soybean to the Global Gridded Crop Model Intercomparison (GGCMI) of the Agricultural Model Intercomparison and Improvement Project (AgMIP). Simulation results are compared to reference data at global, national and grid cell scales and we evaluate model performance with respect to time series correlation, spatial correlation and mean bias. We find that global gridded crop models (GGCMs) show mixed skill in reproducing time series correlations or spatial patterns at the different spatial scales. Generally, maize, wheat and soybean simulations of many GGCMs are capable of reproducing larger parts of observed temporal variability (time series correlation coefficients (r) of up to 0.888 for maize, 0.673 for wheat and 0.643 for soybean at the global scale) but rice yield variability cannot be well reproduced by most models. Yield variability can be well reproduced for most major producing countries by many GGCMs and for all countries by at least some. A comparison with gridded yield data and a statistical analysis of the effects of weather variability on yield variability shows that the ensemble of GGCMs can explain more of the yield variability than an ensemble of regression models for maize and soybean, but not for wheat and rice. We identify future research needs in global gridded crop modeling and for all individual crop modeling groups. In the absence of a purely observation-based benchmark for model evaluation, we propose that the best performing crop model per crop and region establishes the benchmark for all others, and modelers are encouraged to investigate how crop model performance can be increased. 
    We make our evaluation system accessible to all crop modelers so that other modeling groups can also test their model performance against the reference data and the GGCMI benchmark.
  • Item
    Emotional tendencies in online social networking: a statistical analysis
    (London : Taylor & Francis Open, 2016) Zhang, Xianhan; Zhang, Nan; Zhao, Letong; Zhang, Ruihan; Cao, Jinde; Lu, Jianquan; Kurths, Jürgen; Qian, Cheng
    Numerous previous studies have suggested that people's emotional tendency (ET) towards an issue can often be affected by others. But in some cases, people are unwilling to believe opposite points. This paper aims to study whether people's emotional tendencies (ET) are susceptible to exposure to others' ET concerning a special topic. ET contained in 798,057 pieces of private-information-deleted Chinese Weibo posts are carefully investigated via a revised genetic algorithm, a nonlinear method. Note that nearly all of the posts are closely related to a special topic, the terrible earthquake that happened in Japan on 11 March 2011. By conducting statistical analysis including coefficient calculations and hypothesis testing, this study shows that concerning this particular topic, Chinese citizens' first impressions about Japan are solid enough to form their ET and would not be easily altered. Moreover, according to analysis and discussion, we discover that node-to-node impact is exaggerated in some theoretical information diffusion models. Instead, it is actually the interaction between nodes' properties and the spread information that matters in the process of information diffusion.
  • Item
    Communication activity in a social network: Relation between long-term correlations and inter-event clustering
    (London : Nature Publishing Group, 2012) Rybski, D.; Buldyrev, S.V.; Havlin, S.; Liljeros, F.; Makse, H.A.
    Human communication in social networks is dominated by emergent statistical laws such as non-trivial correlations and temporal clustering. Recently, we found long-term correlations in the user's activity in social communities. Here, we extend this work to study the collective behavior of the whole community with the goal of understanding the origin of clustering and long-term persistence. At the individual level, we find that the correlations in activity are a byproduct of the clustering expressed in the power-law distribution of inter-event times of single users, i.e. short periods of many events are separated by long periods of no events. On the contrary, the activity of the whole community presents long-term correlations that are a true emergent property of the system, i.e. they are not related to the distribution of inter-event times. This result suggests the existence of collective behavior, possibly arising from nontrivial communication patterns through the embedding social network.