## Methodology

[6 abstracts]

1. **Conditional significance in the test of the difference between two proportions in small samples** [presentation, pdf, 1375 kB]

Gori F.,
Grassi M.

First author's affiliation: Dipartimento di Statistica, Probabilità e Statistiche Applicate, Università "La Sapienza", Roma, Italy

When hypotheses concerning the difference between two proportions are at stake, the significance test is performed on the variable p₁-p₂, which approaches the normal distribution for sufficiently large samples. Reliance on this test seems problematic when the sample sizes n₁ and n₂ are unequal, particularly as n becomes smaller and the values of the parameter P are very different from 0.5. In addition, although the study of sampling with replacement is useful for theoretical purposes, in practice researchers rarely sample with replacement when small samples are used (as is often the case in psychological research). In this study, a (programmed) solution based upon the exact distribution of the two independent samples in their whole universe is illustrated. Final remarks on conditional significance and the experimental model are offered.
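
The abstract does not describe the authors' program, so the following is only an illustrative sketch of one well-known exact conditional approach, Fisher's exact test, which conditions on the total number of successes and uses the hypergeometric distribution instead of the normal approximation. The function name and argument layout are assumptions made for this example, not the authors' method.

```python
from math import comb


def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value for x1/n1 versus x2/n2.

    Conditional on the total number of successes m = x1 + x2, the count
    in sample 1 follows a hypergeometric distribution. This is a sketch
    of a standard exact test, not the procedure from the abstract.
    """
    m = x1 + x2
    total = comb(n1 + n2, m)

    def prob(k):
        # P(X1 = k | m successes overall)
        return comb(n1, k) * comb(n2, m - k) / total

    p_obs = prob(x1)
    lo, hi = max(0, m - n2), min(n1, m)
    # Sum the probabilities of all tables at least as extreme as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Example: 3 successes out of 4 versus 1 success out of 4.
p_value = fisher_exact_two_sided(3, 4, 1, 4)
```

With samples this small the exact p-value (34/70 in the example) can be computed by full enumeration, which is what makes an exact approach practical in the small-sample setting the abstract addresses.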

2. **As time goes by…** [presentation, pdf, 584 kB]

Wiedermann W.,
Gula B.

First author's affiliation: Department of Psychology, Klagenfurt, Austria

Observations obtained from reaction time (RT) experiments often violate the normality assumption underlying parametric significance tests. The present study compares several strategies for dealing with non-normally distributed RT observations. In 2000 the Journal of Experimental Psychology: Human Perception and Performance published 385 experiments. In about 60% of these studies RTs were analyzed. The most common procedure was trimming (47%), with criteria ranging from 2 to 4 SDs. In 41% of the RT studies neither outlier analyses nor any distributional considerations were mentioned. This state of affairs is quite alarming because, with non-normal data, such analyses mostly fail to detect true differences. Other approaches such as transformation or nonparametric tests were rather uncommon. Based on our review, common criteria for trimming were chosen to compare the efficiency of this method with less common procedures. A Monte Carlo approach was used to simulate plausible RT observations based on twelve ex-Gaussian distributions (Miller, 1988). In order to investigate the two-sample problem, the t-test on raw RT scores was compared to (a) constant and adaptive trimming, (b) logarithmic and adaptive transformation, and (c) the non-parametric Wilcoxon-Mann-Whitney test (WMW). Results reveal that all procedures are robust for the twelve distributions. However, the t-test on raw scores entails a great power loss, especially if distributions are extremely skewed (skewness: 2.09, kurtosis: 9.11). For these distributions the log-transformation and the WMW test are most powerful, and an increasing amount of trimming enhances the power of the t-test for trimmed means. For less skewed distributions (skewness: 0.71, kurtosis: 4.54), choosing a transformation adaptively is slightly more powerful than the log-transformation, and an increasing amount of trimming reduces the power of detecting true differences in trimmed means.
Based on the characteristics of sample distributions, a detailed decision tree will be suggested to aid the choice of the most accurate procedure.
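
As a rough illustration of the ingredients mentioned above, the sketch below draws ex-Gaussian RTs (a normal component plus an independent exponential component) and applies two of the compared strategies: constant trimming at k SDs and the log-transformation. The parameter values, sample size, and function names are assumptions chosen for the example. They are not the twelve distributions or the simulation design used in the study.

```python
import math
import random
import statistics


def ex_gaussian_sample(n, mu, sigma, tau, rng):
    """Draw n ex-Gaussian values: Normal(mu, sigma) + Exponential(mean tau)."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


def skewness(xs):
    """Sample skewness (third standardized moment, population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def trim_sd(xs, k):
    """Constant trimming: keep observations within k SDs of the sample mean."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical parameters in milliseconds, chosen only for illustration.
rts = ex_gaussian_sample(20000, mu=400.0, sigma=50.0, tau=150.0, rng=rng)

log_rts = [math.log(x) for x in rts]  # logarithmic transformation
trimmed = trim_sd(rts, k=2.0)         # 2-SD trimming criterion

print(f"skewness raw: {skewness(rts):.2f}, log: {skewness(log_rts):.2f}")
print(f"removed by 2-SD trimming: {len(rts) - len(trimmed)} of {len(rts)}")
```

A full power comparison would repeat such draws for two groups with a true mean difference and count how often each test rejects. That loop is omitted here to keep the sketch short.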

3. **Generation of Fat Tail Distributions** [presentation, ppt, 1347 kB]

Luccio R.

Dipartimento di Psicologia "G. Kanizsa", University of Trieste, Trieste, Italy

Many quite different phenomena are distributed according to a few functions that broadly share a particular shape, which has led to their being called “fat (heavy, long) tail distributions”. A well-known example is the so-called Benford’s law (originally stated by Newcomb, 1881), according to which the probability that the first digit in a series of statistical data is d is given by a logarithmic function of the digit. Other well-known examples are Bradford’s law (on the distribution of scientific journals), Heaps’ law (vocabulary growth and text size), Lotka’s law (number of authors and number of contributions), and so on. In economics, Lorentz’ law and Pareto’s law (on income inequality) are well known. In psycholinguistics the most celebrated is undoubtedly Zipf’s law, on the relation between the number of words and their frequency rank, originally stated in 1925, a power (quasi-hyperbolic) law. According to Zipf, it can be explained by a principle of psychological economy: the more frequent a word is, the more easily it comes to consciousness. In this study I investigated alliterations, that is, the relationship between the number of words interposed between two words sharing the same first letter in the first syllable (x) and the number of occurrences of each given x, that is, n(x). Analysing excerpts from texts by different authors (Italian, French and American novelists such as D’Annunzio, Invernizio, France and James, or essayists such as Leopardi), I invariably found an excellent fit to Lorentz’ law, with an R-squared always above .93, and a remarkable stability of parameters within each author, rather than between them. This suggests using this regularity in studies of authorship attribution. Some hypotheses about the mechanisms generating such distributions are advanced.
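
The counting step described above can be sketched as follows. This is one possible reading of the procedure, not the author's code: each word is paired with the next word that starts with the same letter, x is the number of words between them, and n(x) counts how often each x occurs. The tokenization rule and the function name are assumptions made for the example.

```python
import re
from collections import Counter


def alliteration_gaps(text):
    """Return a Counter mapping gap size x to its number of occurrences n(x).

    The gap x is the number of words between a word and the next word
    that shares its first letter (assumed operationalization).
    """
    # Assumed tokenization: lower-case runs of letters, punctuation ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    last_seen = {}      # first letter -> index of its most recent word
    counts = Counter()  # x -> n(x)
    for i, word in enumerate(words):
        initial = word[0]
        if initial in last_seen:
            counts[i - last_seen[initial] - 1] += 1
        last_seen[initial] = i
    return counts


n_of_x = alliteration_gaps("The tall tree by the sea")
```

For the example sentence, the pairs the/tall and tall/tree have no words between them and tree/the has one, so n(0) = 2 and n(1) = 1. Fitting a distribution to n(x) would be a separate step that is not shown here.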

4. **How early can we begin assessing g with Coloured progressive matrices?** [presentation, ppt, 612 kB]

Sočan G.,
Kavčič T., Zupančič M.

First author's affiliation: Department of Psychology, Faculty of Arts, University of Ljubljana, Ljubljana, Slovenia

Raven’s progressive matrices (RPM) are among the most frequently used tests of general intelligence. Coloured progressive matrices (CPM) were designed particularly for assessing the cognitive abilities of preschool children. The specific feature of CPM, compared to other RPM tests, is that solving the items requires processes such as global pattern apprehension, mirroring, repetition, etc., rather than analogical thinking and the discovery of abstract rules in the proper sense. We shall present the results of a psychometric analysis of the responses of 905 2-, 3- and 4-year-old children. The results show an increase in both reliability and unifactorial homogeneity with age. Nevertheless, the items could be scaled according to the Rasch model reasonably well. The relation between item parameters and item content proved to be quite consistent. Finally, the implications of the results for the early assessment of intelligence shall be discussed. Possibilities for the further development of CPM or similar tests shall also be outlined.
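
For readers unfamiliar with the Rasch model mentioned above, a minimal sketch of its item response function follows: the probability of a correct response depends only on the difference between a person's ability and an item's difficulty. The function and parameter names are illustrative assumptions. Nothing here reproduces the estimation carried out in the study.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta: person ability, b: item difficulty (both on the logit scale).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A child whose ability equals the item difficulty has a 50% chance of success.
p_equal = rasch_probability(0.0, 0.0)
# A child one logit above the item difficulty is more likely to succeed.
p_above = rasch_probability(1.0, 0.0)
```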

5. **An Italian validation study of the Buss-Perry Aggression Questionnaire (AQ)** [presentation, ppt, 227 kB] [short paper, doc, 189 kB]

Sommantico M.,
Osorio Guzmàn M., Parrello S., De Rosa B., Donizzetti A.

First author's affiliation: Dip. Scienze Relazionali "G. Iacono", Università degli Studi di Napoli "Federico II", Napoli, Italy

The main aim of this study was to analyse the psychometric properties of the pre-publication Italian version of the Aggression Questionnaire (AQ) (Fossati, Maffei, Acquarini, Di Ceglie, 2003), considered one of the most useful self-report instruments for analysing aggressive behaviour in youths and adolescents, using a non-clinical sample of students from different educational settings in the community of Naples (N = 860, 41% males and 59% females; 445 subjects attending secondary high schools and 415 subjects attending universities; average age = 20.10, SD = 3.70). This was necessary because the psychometric properties of this Italian version of the questionnaire had been evaluated using a large clinical and non-clinical sample of subjects drawn only from the population of northern and central Italy. The results of the exploratory factor analysis, carried out using principal component analysis and omitting some items of the original scale, confirm the four-factor structure identified by different authors in different countries (Buss and Perry, 1992; Nakano, 2001; Morren, Meesters, 2002; von Collani, Werner, 2005; Gallardo-Pujol et al., 2006; Bouchard, 2007), with some specificities: while physical and verbal aggression represent the instrumental or motor component of aggressive behaviour, and hostility its cognitive component, the anger factor, as discussed, leads to a more complex interpretation. By contrast, the results of a subsequent confirmatory factor analysis, based on an exploratory factor analysis carried out with principal axis factoring and omitting some items, show a three-factor structure.

6. **Development and bi-cultural validation of the new sexual satisfaction scale (NSSS)** [presentation, ppt, 218 kB] [short paper, pdf, 198 kB]

Stulhofer A.,
Busko V., Brouillard P., Kuljanic K.

First author's affiliation: Department of Sociology, University of Zagreb, Zagreb, Croatia

In this paper we present the development and bi-cultural validation of a new measure of sexual satisfaction. A review of the existing measures of sexual satisfaction demonstrated the need for a measure that would be theoretically grounded, focused on sexual satisfaction rather than the absence of sexual health difficulties, gender non-specific, equally useful in assessing sexual satisfaction among heterosexual and homosexual persons, among individuals in committed relationships and those who are not in a relationship, and valid across contemporary cultures. The New Sexual Satisfaction Scale (NSSS) is based on a 5-dimensional conceptual model, which emphasizes the importance of sexual activities, sexual exchange, sexual sensations, sexual presence/focusing, and emotional closeness for one's sexual satisfaction. Scale construction and validation were carried out using seven different samples, three student and four community samples (including one clinical sample), with over 2000 participants aged 18-60, surveyed in Croatia and the US. Factor analysis pointed to two underlying dimensions of sexual satisfaction: the ego-centered factor and the partner & sexual activity-centered factor. Reliability of the two subscales and the full NSSS (k = 20) was high in all independent samples (alpha coefficients ranged from .90 to .95). Further analysis confirmed construct validity of the scale in both cultures. The NSSS was also found to have satisfactory one-month temporal stability. A short version of the NSSS (SNSSS; k = 12), which included items from all five theoretically implied dimensions, demonstrated reliability and validity comparable to the full scale.
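
The reliability figures above are reported as alpha coefficients. Assuming these are Cronbach's alpha (the abstract does not say so explicitly), the coefficient can be computed from a respondents-by-items score matrix as in the sketch below. The function name and the toy data are illustrative assumptions, not data from the study.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: three respondents answering two items.
alpha_consistent = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
alpha_mixed = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

In the first toy matrix the two items agree perfectly, which gives alpha = 1. In the second they disagree for two respondents, and alpha drops to 2/3.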