• Survey on statistical methods and assumptions

  • Dear researcher,

    Thank you in advance for taking the time to complete this survey. Your answers will help us better understand which statistical methods researchers in your field usually use. In the following survey, you will be presented with a list of methods and their assumptions. For each method, we will ask you how common it is in your field and which assumptions you would expect authors to test.

    We also ask you to write down any considerations you have regarding these assumptions. For example:

    - "I usually do not expect authors to check for normality when performing a t-test, as this test is known to be robust to violations of the normality assumption."

    - "I know that homogeneity of variance is a prerequisite for ANOVA, but we never report on it in our field."

    - "I do not expect authors to report on normality when performing linear regression, as it is implied that they checked it before applying the method. Because of page limits, it is uncommon to report on this."

    Your responses, along with those from other participants, will be used only for scholarly purposes and will be kept completely confidential.

    Thank you again for your time and participation.


  • Next, we will ask you a few questions about the assumptions of several statistical methods. Each page is dedicated to one method, whose name appears at the top of the screen, and includes short descriptions of that method's assumptions.

  • METHOD: Unpaired (Two sample) t-test

  • Normality - an assumption that the underlying random variable of interest is distributed normally, within each group.

  • ____________________________________________________________________________

  • Equal variances - Student's (pooled) t-test assumes that the variance of the variable of interest (e.g., salary) is similar in each of the compared groups (e.g., different countries). Welch's t-test does not require this assumption.
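As an illustration of why the equal-variances assumption matters, the sketch below contrasts Student's pooled-variance t statistic, which relies on it, with Welch's t statistic, which does not. It uses only Python's standard library; the function names `pooled_t` and `welch_t` are hypothetical, and it returns the statistic and degrees of freedom only, not a p-value.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 (sample) denominator


def pooled_t(x, y):
    """Student's two-sample t statistic; assumes equal population variances."""
    nx, ny = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2  # statistic and degrees of freedom


def welch_t(x, y):
    """Welch's t statistic; does not assume equal variances."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

When both groups have the same size, the two statistics coincide and only the degrees of freedom differ; with unequal group sizes and unequal variances, the two tests can lead to different conclusions.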

  • ____________________________________________________________________________

  • Independence - an assumption that the value of one observation does not affect the value of any other observation. In other words, your data are not connected, nested, or clustered in any way (at least, not in ways that you have not accounted for in your model).

  • ____________________________________________________________________________

  • METHOD: ANOVA (Analysis of Variance)

  • ____________________________________________________________________________

  • Normality - an assumption that the underlying random variable of interest is distributed normally, within each group.

  • ____________________________________________________________________________

  • Equal variances - an assumption that the variance of the variable of interest (e.g., salary) is similar in each of the compared groups (e.g., different countries).
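Levene's test is one common way to check the equal-variances assumption before an ANOVA. The sketch below computes its W statistic in plain Python, assuming the usual definition (a one-way ANOVA on each observation's absolute deviation from its group center). The function name `levene_w` is hypothetical, and converting W to a p-value via the F distribution is left out; where SciPy is available, `scipy.stats.levene` is the usual route.

```python
from statistics import mean, median


def levene_w(*groups, center=median):
    """Levene's W statistic for equality of variances across k groups.

    center=median gives the Brown-Forsythe variant; center=mean gives
    Levene's original version. Under equal variances, W approximately
    follows an F(k - 1, N - k) distribution; larger W means stronger
    evidence against equal variances.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group's center
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(v - c) for v in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

Two groups with the same spread, such as `[1, 2, 3]` and `[11, 12, 13]`, give W = 0 even though their means differ, because the test looks only at spread.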

  • ____________________________________________________________________________

  • Independence - an assumption that the value of one observation does not affect the value of any other observation. In other words, your data are not connected, nested, or clustered in any way (at least, not in ways that you have not accounted for in your model).

  • ____________________________________________________________________________

  • METHOD: LINEAR REGRESSION

  • ____________________________________________________________________________

  • Normality - an assumption that the residuals (the differences between the observed and the predicted values of the dependent variable) are normally distributed.
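Because the normality assumption concerns the residuals rather than the raw dependent variable, the sketch below fits a simple (one-predictor) ordinary least squares line and returns its residuals. It assumes a single predictor and uses only the standard library; the function name `ols_residuals` is hypothetical.

```python
from statistics import mean


def ols_residuals(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares and return the residuals."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept
    # Residual = observed value minus fitted value
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

These residuals, not the raw y values, are what a Q-Q plot or a normality test (e.g., `scipy.stats.shapiro`) would be applied to. Plotting them against the fitted values is also a common informal check of the equal-variance and linearity assumptions.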

  • ____________________________________________________________________________

  • Equality of Variance (homoscedasticity) - an assumption that the variance of the residuals around the regression line is the same for all values of the predictor variables.

  • ____________________________________________________________________________

  • Linearity - an assumption that the mean of the response variable is a linear combination of the parameters (regression coefficients) and the predictor variables.

  • ____________________________________________________________________________

  • Multicollinearity - an assumption that the predictor variables are not highly linearly dependent on each other.
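Multicollinearity is often screened with the variance inflation factor (VIF). For the special case of exactly two predictors, the VIF reduces to 1 / (1 - r²), where r is the Pearson correlation between them; the sketch below assumes that two-predictor case (models with more predictors need the R² from regressing each predictor on all the others). The function names are hypothetical.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors.

    Regressing one predictor on the other gives R^2 = r^2, so both
    predictors share the same VIF. It grows without bound as |r| -> 1.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)
```

Uncorrelated predictors give a VIF of 1; commonly cited rules of thumb flag values above 5 or 10, though thresholds vary between fields.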

  • ____________________________________________________________________________

  • Independence - an assumption that the value of one observation does not affect the value of any other observation. In other words, your data are not connected, nested, or clustered in any way (at least, not in ways that you have not accounted for in your model).

  • ____________________________________________________________________________

  • METHOD: Logistic Regression

  • Independence - an assumption that the value of one observation does not affect the value of any other observation. In other words, your data are not connected, nested, or clustered in any way (at least, not in ways that you have not accounted for in your model).

  • ____________________________________________________________________________

  • METHOD: Chi-Square

  • ____________________________________________________________________________

  • Independence - an assumption that the value of one observation does not affect the value of any other observation. In other words, your data are not connected, nested, or clustered in any way (at least, not in ways that you have not accounted for in your model).

  • ____________________________________________________________________________

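  • Expected Cell Count - expected counts are the frequencies projected for each cell if the null hypothesis is true (i.e., there is no association between the variables). The chi-square test assumes that these expected counts are not too small; a common rule of thumb is that each expected count should be at least 5. For instance: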

    Given the following 2×2 table of outcome (O) and exposure (E) as an example, a, b, c, and d are all observed counts:

    Outcome/Exposure Table

                 E+      E-      Total
        O+       a       b       a+b
        O-       c       d       c+d
        Total    a+c     b+d     a+b+c+d

    The expected count for each cell is the product of the corresponding row and column totals divided by the total sample size. For example, the expected count for O+E+ (cell a) would be:

    [(a+b)×(a+c)] / [a+b+c+d]
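The same row-total × column-total / grand-total rule applies to every cell and to tables of any size. The sketch below computes all expected counts from a table of observed counts and lists the cells below a threshold. It assumes the table is given as a list of rows (for the 2×2 example, `[[a, b], [c, d]]` with outcome in rows and exposure in columns); the function names and the default threshold of 5 are illustrative choices, not a universal requirement.

```python
def expected_counts(table):
    """Expected cell counts under independence for a contingency table.

    table is a list of rows of observed counts; each expected count is
    (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_cells(expected, threshold=5):
    """Return (row, column) positions whose expected count is below threshold."""
    return [(i, j) for i, row in enumerate(expected)
            for j, e in enumerate(row) if e < threshold]
```

For hypothetical counts a = 20, b = 30, c = 10, d = 40, the expected count for O+E+ is (50 × 30) / 100 = 15, matching the formula above, and no cell falls below 5.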


  • ____________________________________________________________________________

  • Please share any thoughts you have on the importance of checking the assumptions of statistical methods such as those presented above. Your response may address (but is not limited to) the following questions:

  • When do you think statistical assumptions should be checked?

  • If you know of any, please give reasons why requiring researchers to report assumption checks in published work would be problematic.

  • How does your community usually deal with statistical validity threats?

  • ____________________________________________________________________________

  • ____________________________________________________________________________
