Monthly Seminars | UNSW Mark Wainwright Analytical Centre

Informal, general interest seminars on topics in statistical data analysis

2025 - April

Don’t Ignore It! A Pattern Mixture Modelling Approach to Analysing Longitudinal Data with Non-Ignorable Missingness

Longitudinal studies often have missing data due to participants being lost to follow up. We often consider these missing data ignorable (Missing Completely At Random or Missing At Random) and analyse them using methods that produce unbiased estimates of the effect we are assessing. However, there are often good reasons why the missing data are not ignorable. This seminar will demonstrate one approach to estimating a treatment effect in longitudinal data with data that are Missing Not At Random.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 17 April 2025

Time: 2.30pm - 3.30pm

Speaker: A/Professor Nancy Briggs, UNSW Stats Central

Delivery Mode: Hybrid

Location: K-G27-G06 – AGSM Colonial Theatre, room G06

Slides - available soon!

2025 - March

Trust or Confidence: Using Confidence Intervals

Almost 10 years ago, the American Statistical Association released a statement saying that "Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold". In practice, some conclusions and decisions are still based on a single value (usually a significance level of 0.05). The talk will cover the importance of confidence intervals in making conclusions, rather than relying only on p-values. It will include examples of confidence interval performance.

The talk will be about 40 minutes long, followed by discussion.

Date: 27 March 2025

Time: 2.30pm - 3.30pm

Delivery Mode: Hybrid

Location: AGSM Pioneer International Theatre G04

Speaker: Luz Palacios-Derflingher, Biostatistician, UNSW Stats Central

Slides

2025 - February

But What If? A Brief Foray Into Causal Inference

In experiments that aim to determine whether a treatment has an effect or not, randomised controlled trials (RCTs) are considered a gold standard. When RCTs are not possible due to practical or ethical constraints, observational data is often relied on instead. However, observational data is often fraught with problems such as confounding, which motivates the use of various causal inference methods.

In this seminar, we begin by introducing and defining causal effects using Rubin’s potential outcomes framework. We then compare RCTs and observational data, and explore three causal inference methods: G-computation, Inverse Probability of Treatment Weighting, and Targeted Maximum Likelihood Estimation. We demonstrate how these methods address confounding, and also discuss their limitations.

The talk will be about 40 minutes long, followed by discussion.

Date: 27 February 2025

Time: 2.30pm - 3.30pm

Delivery Mode: Hybrid

Location: K-G27-G06 - AGSM Colonial Theatre G06

Speaker: Nickson Ning, Statistical Consultant, UNSW Stats Central

Slides

2024 - November

Risky Business: Obtaining Estimates of Relative Risk

When analysing prospective studies with a binary outcome the two most commonly used effect measures are odds ratios and relative risk. Although estimates of odds ratios are easy to obtain via logistic regression, they are often difficult to interpret because they relate only indirectly to the quantity of interest, the change in risk due to exposure to an intervention or risk factor. Relative risk, or risk ratios, provide a more natural measure of this change.

Unfortunately, the direct estimation of relative risk, via log binomial regression, faces technical difficulties. As a result, many alternative methods have been suggested to obtain relative risk estimates. This can lead to confusion about which method should be used and whether other choices are appropriate. In this seminar, we will discuss several of the available options, demonstrate their use, and determine how they perform in practice in an attempt to alleviate the confusion.
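To illustrate the difference between the two effect measures, here is a minimal sketch that computes both from a 2x2 table. It is not taken from the seminar; the function name and the counts are hypothetical, and it does no regression modelling or interval estimation.

```python
def risk_measures(a, b, c, d):
    """Return (relative risk, odds ratio) for a 2x2 table.

    a, b: events and non-events in the exposed group
    c, d: events and non-events in the unexposed group
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio


# Hypothetical counts: 40 of 100 exposed and 20 of 100 unexposed have the outcome.
rr, odds_ratio = risk_measures(40, 60, 20, 80)
print(f"relative risk = {rr:.2f}, odds ratio = {odds_ratio:.2f}")
```

With these made-up counts the risk doubles under exposure, while the odds ratio is noticeably larger than 2. This gap is why an odds ratio should not be read as a change in risk when the outcome is common.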

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 21 November 2024

Time: 2.30pm - 3.30pm

Speaker: Peter Humburg, Senior Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Location: K-G27-LG07 - AGSM LG07 John B Reid Theatre

Slides

2024 - October

Space (doesn't have to be) the Final Frontier: Some Tips in Accounting for Spatial Dependence within Your Regression-Type Models in R

If your data has some reference to where it was collected, then you can probably consider it to be “spatial” in some way. Why does that matter? Observations/measurements collected closer to one another are often more closely related. When modelling your variable of interest, this spatial dependence usually needs to be accounted for to make the model useful for things like prediction or inference. This can even be true for models that boast a healthy suite of predictor variables.

In this seminar I will use examples to demonstrate how we can account for spatial dependence (and when we might want to) within regression-style models. I do this by including the dataset’s spatial information within (generalised) additive/mixed models. If this sounds daunting, fear not. In practice, these models are just a straightforward extension of the humble linear regression model – representing one small step for applied researchers (even if a giant leap for math-kind)!

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 24 October 2024

Time: 2.30pm - 3.30pm

Guest Speaker: Elliot Dovers, Research Associate, School of Mathematics and Statistics

Delivery Mode: Online

Location: E19 Patricia O'Shane G04 (previously CLB 3) - You are welcome to attend in person to watch the seminar online and meet the Stats Central team

Slides

Video Recording: unavailable this time, our apologies!

2024 - September

Mastering Meta-Analysis: Key Assumptions and Techniques for High-Quality Outcomes

To deliver synthesized evidence for clinical practice, meta-analysis is a valuable tool that has become more accessible thanks to the development of various software packages. However, it is crucial to understand the assumptions underlying meta-analysis and how to evaluate them to achieve high-quality results. This seminar will explore these assumptions and offer recommended methods for assessing their impact on your meta-analysis outcomes.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 26 September 2024

Time: 2.30pm - 3.30pm

Speaker: Nancy Briggs, Senior Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Location: K-G27-LG07 - AGSM LG07

Slides

2024 - August

The Use of Statistical Software Packages for Research and Teaching

The presentation will compare popular packages including SPSS, STATA, R (RStudio), Python, and SAS. Additionally, we will explore the use of open-source GUIs for teaching and research as a free, easy to use alternative to the commercial or code-based packages. In particular, the talk will introduce jamovi, an R-based GUI which has been integrated into statistical teaching at The University of Wollongong (UOW).

The seminar will discuss the pros and cons of each package, considering factors such as ease of use, versatility, and available support. Attendees will also receive guidance on how to access these tools and evaluate which options are best suited for their research and teaching needs. This session aims to help researchers and educators make informed decisions about the most appropriate software for their work. This talk was developed in conjunction with Prof Marijka Batterham, and Prof Alberto Nettel Aguirre from UOW.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 29 August 2024

Time: 2.30pm - 3.30pm

Guest Speaker: Dr. Brad Wakefield, Statistical Consultant, UOW Statistical Consulting

Delivery Mode: Online

Slides

2024 - July

Variable Selection: Making the Right Choice

Have you ever wondered, "Which variables should I really include in my model?" Have you, then, ever been tempted to exclude the non-significant predictors from a model fit, and refit the model using only the significant predictors? Unfortunately, that is naïve.

There are various proposed methods for tackling the problem of variable selection, but unfortunately not all of them have solid theoretical justification. In this talk, we introduce the problem of variable selection from a statistician’s perspective. We discuss various methods for performing variable selection, and provide guidelines and recommendations on numerous dos and don’ts when discerning predictors of interest for your research question.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 25 July 2024

Time: 2.30pm - 3.30pm

Speaker: Nickson Ning, Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Location: Matthew Theatre D

Slides

2024 - June

Break Free from Independence: Mixed Models for Dependent Data

Simple statistical methods (t-tests, chi-square tests, linear models/regression) assume independence of observations and cannot be used when dependence is present in the sample. Mixed models are extensions of linear models to dependent data. Common reasons for dependence are:

Clustering e.g. multiple patients per hospital, multiple plants per site, multiple measurements on the same individual
Time and Space - measurements taken close in time and space are more similar

In this seminar we will introduce mixed models as a straightforward extension of (generalised) linear models, and briefly cover how to fit, check, interpret and communicate results from mixed models.

If you would like to develop your skills in mixed models further, you can register for the Mixed Models course on 1-2 July.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 20 June 2024

Time: 2.30pm - 3.30pm

Speaker: Gordana Popovic, Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Location: Wallace Wurth G16 Hybrid Lab

Slides

2024 - May

Adaptive study designs in clinical (and other) research

Adaptive study designs allow for changes to certain elements of the methodology in response to accumulating data, and may increase the efficiency of your research. In this seminar, we will introduce adaptive study designs, discuss possible adaptations as well as aspects that must be considered when employing such a design. We'll explore some examples of adaptive designs being used in clinical research, although the statistical principles are applicable in any area.

The talk will be about 40 minutes long including discussion.

Date: Thursday 30 May 2024

Time: 2.30-3.30pm

Guest Speaker: Mark Donoghoe, Biostatistician, Medicine & Health Clinical Research Unit

Delivery Mode: Hybrid

Location: Wallace Wurth G16 Hybrid Lab

Slides

2024 - April

Top stats errors to look out for when reading, writing and reviewing papers

Poor statistical practice can distort scientific findings and mislead readers. This presentation reviews common statistical problems that readers, authors and reviewers of journal articles should be aware of when assessing research. These include p-hacking, multiple comparisons issues, underpowered studies, correlation/causation confusion, and inappropriate statistical procedures. I discuss the methodological flaws and reasoning behind these errors, as well as ways researchers fall prey to invalid practices. Recommendations are made for identifying questionable research practices in publications. Learn to recognize what makes for credible versus misleading uses of statistics in scientific literature.

Some useful references:

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European journal of epidemiology, 31(4), 337-350.

‘Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations.’

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values. The American Statistician, 70(2), 129-133.

‘Let us be clear. Nothing in the ASA statement is new. Statisticians and others have been sounding the alarm about these matters for decades, to little avail. We hoped that a statement from the world's largest professional association of statisticians would open a fresh discussion and draw renewed and vigorous attention to changing the practice of science with regards to the use of statistical inference.’

Andrade C. HARKing, cherry-picking, P-hacking, fishing expeditions, and data dredging and mining as questionable research practices. J Clin Psychiatry. 2021;82(1):20f13804.

Sun, Guo-Wen, Thomas L. Shook, and Gregory L. Kay. "Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis." Journal of clinical epidemiology 49.8 (1996): 907-916.

‘The use of bivariable selection [a.k.a. univariate analysis] for selecting variables to be used in multivariable analysis is inappropriate despite its common usage in medical sciences’. – 951 citations

Kent, P., Cancelliere, C., Boyle, E. et al. A conceptual framework for prognostic research. BMC Med Res Methodol 20, 172 (2020).

This paper has a great overview of variable selection methods to use instead of univariate/multivariable, including how to deal with variable selection uncertainty and a discussion on post selection inference.

Gary S. Collins, Johannes B. Reitsma, Douglas G. Altman, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Ann Intern Med. 2015;162:55-63.

The talk will be about 40 minutes long including discussion.

Date: Thursday 18 April 2024

Time: 2.30-3.30pm

Speaker: Gordana Popovic, Statistical Consultant, UNSW Stats Central

Delivery Mode: Online

Slides

2024 - March

Transforming Data: Principles for Effective Data Visualisation

We often ask clients and collaborators to visualise their data at Stats Central. But how do we do this in a way that leaves everyone involved satisfied with the visualisations? This talk will describe general principles for data visualisation from a statistician’s viewpoint, before exploring a variety of plots from the wild and evaluating them with these guidelines. I will finish with how these principles can be “bent” to better tell the story your data has to share.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 21 March 2024

Time: 2.30-3.30pm

Speaker: David Chan, Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Location: Wallace Wurth G16 Hybrid Lab

Slides

2024 - February

50 Shades of NA: Making the Most of Your Missing Data

Missing data is a pervasive issue in empirical research across disciplines. Values can be missing for a variety of reasons, including nonresponse, loss to follow-up, combining data from different sources and skipped questions on surveys. The pattern and mechanisms behind missing data have important implications for statistical analysis and interpretation. This talk will provide an accessible introduction to missing data terminology and theory for those with a limited statistical background. I will explain techniques to account for missing data in common regression contexts, including confidence interval estimation, hypothesis testing and model selection.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 29 February 2024

Time: 2.30-3.30pm

Speaker: Eve Slavich, Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Location: K-J17 Ainsworth Building, Room 102

Slides: To be available soon!

2023 - November

Regression Assumptions 2: Multicollinearity and you

It is important to check the assumptions underlying your statistical data analysis. For regression models, most assumptions can be assessed by inspecting residual plots. In this seminar, we will focus on one assumption of multiple regression that requires a different approach: no multicollinearity between predictors. In the presence of multicollinearity, parameter estimates from your regression model become unreliable, which may lead to erroneous conclusions. We will discuss what multicollinearity is, how to detect it, and strategies to deal with the problem.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 23 November 2023

Time: 2.30-3.30pm

Speaker: Peter Humburg, Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Note: this seminar has been moved from October.

Slides

2023 - September

A Practical Tutorial on Checking Regression Assumptions

Statisticians advise that regression assumptions should be checked by plotting residuals to look for violations, rather than using tests of assumptions. This seminar will teach you how to interpret residual plots, and what to do when assumptions are not met. We will look at what good and bad residual plots look like, talk about what to look for, and test ourselves on examples to help you build an intuition.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 28 September 2023

Time: 2.30-3.30pm

Speaker: Gordana Popovic, Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Slides

2023 - August

How to Predict an Outcome: a Brief Introduction to Prediction

The desire for predictive ability arises in many areas, from who will win the FIFA Women’s World Cup to predicting heart disease. However, how we approach modelling for prediction versus inference is very different. In this seminar we will discuss this difference, give a broad overview of methods for prediction, and discuss how to measure predictive accuracy.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 17 August 2023

Time: 2.30-3.30pm

Speaker: Maeve McGillycuddy, Statistical Consultant, UNSW Stats Central

Delivery Mode: Hybrid

Slides

2023 - July

That’s a Strange One! How to Identify and Deal with Outliers and Influential Data Points

Everyone has found an “extreme” value in their data at one time. Assuming the data point is valid, how should you address it? This seminar will discuss identification of “outliers” and their influence on results, discuss remedies (some good and not so good) and provide resources for dealing with those extreme values.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 20 July 2023

Time: 2.30-3.30pm

Speaker: Nancy Briggs, Senior Statistical Consultant, UNSW Stats Central

Hybrid: Location - K-J17-G02 Ainsworth Building G02 MAP (online option)

Slides

2023 - June

It All Depends… Analysing Dependent Observations

In several disciplines, we want to evaluate the association between an outcome variable and independent variable(s). In some designs we have nested data, that is, several observations/measurements within groups (clusters), and analyses need to take this into account. I will give an overview of the effects of not taking the nature of this type of data into account via examples.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 29 June 2023

Time: 2.30-3.30pm

Speaker: Luz Palacios-Derflingher, Biostatistician, UNSW Stats Central

Hybrid: Location - K-J17-G02 Ainsworth Building G02 MAP (online option)

Slides

2023 - May

Initial Data Analysis: The Missing Link in Statistical Analysis Plans

Typically, the majority of data analysis time is spent in the initial data analysis phase and this phase is critical to the validity of the statistical analysis. Initial data analysis is broadly all data analysis prior to that which answers the research question. Examples include outlier and influential point detection, missing data treatment, decisions about transformations, correlation structures, variable inclusion and assessing model assumptions. It affects the final statistical model chosen in many ways. It’s a hidden reality that we often find ourselves cycling iteratively between initial data analysis and statistical analysis until we ‘get the model right’. Recently (1, 2) there have been calls and discussions on how to make initial data analysis a more rigorous part of a statistical analysis plan. I will demonstrate practical ways we can go about doing this.

The talk will be about 40 minutes long, followed by discussion.

Date: Thursday 18 May 2023

Time: 2.30-3.30pm

Speaker: Eve Slavich, Statistical Consultant, UNSW Stats Central

Location: Hybrid

Room: AGSM building, K-G27-G04 Pioneer International Theatre MAP

Slides

2023 - April

A Few Dos and Don’ts of Multivariable Regression Modelling

Multivariable regression modelling is a vast field with a huge number of approaches that can help to address various research questions. In this seminar we will discuss some broad strategies to use and to avoid when planning, undertaking, and interpreting the results of your modelling endeavour.

The talk will be about 40 minutes long including discussion.

Date: Thursday 27 April 2023

Time: 2.30-3.30pm

Speaker: Mark Donoghoe, Statistical Consultant, UNSW Stats Central

Location: K-E19-G03 - Central Lecture Block 2 (online option)

Slides for the presentation are HERE

2023 - March

Practical Study Design

Good study design is crucial for answering your research questions. No amount of post processing or statistical expertise can compensate for poor or inadequate study design. I will review basic concepts like confounding, controls and randomization; show you how to estimate an appropriate sample size for your question; talk about how to conduct good observational studies; and describe how to use blocking in your study design, so that you can get more power with a smaller sample size.

The talk will be about 40 minutes long including discussion.

Date: Thursday 30 March 2023

Time: 11am-12pm

Speaker: Gordana Popovic, Statistical Consultant, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2023 - February

But It Isn't Significant! What To Do with Large P-Values

Have you run a study and are now faced with the fact that the p-value for the effect of interest isn't significant? We will take a look at what you can do in a situation like this to still get the most out of your data and maybe even publish your results. Quantifying the amount of evidence supporting the conclusion that the effect of interest does not exist will help you understand what your data is trying to tell you. We will discuss strategies to achieve this and look at what you can do to get such results published.

The talk will be about 30 minutes long including discussion.

Date: Thursday 23 February 2023

Time: 2.30-3.30pm

Speaker: Peter Humburg, Biostatistician, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2022 - November

Informing Statistical Practice and Some Diversions

In academia we are striving for quality research. In this talk I will point out the importance of: i) having aims, research questions and statistical planning connected, ii) involving a (bio)statistician from the beginning of research to better understand the research question and what needs to be done.

I will also point out: iii) that statistical research does not involve answering "a quick question" most of the time, iv) that the correct use of statistical analyses depends on understanding the research question, v) some of the pitfalls that I have encountered.

The talk will be about 30 minutes long including discussion.

Date: Thursday 24 November 2022

Time: 2.30-3.30pm

Speaker: Luz Palacios-Derflingher, Biostatistician, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2022 - October

Five Steps Towards Making Your Analyses Transparent and Reproducible (in R)

Ideally, analyses from our scientific papers would be both transparent (readers know what you did) and computationally reproducible (anyone can reproduce key results from the data). Recognising the benefits of reproducible science to themselves and society, many individuals now make code underpinning their results available with a paper. But for others, making code available seems like a lofty distant goal. In this talk, we outline five easy steps you can take towards making your work more reproducible. Importantly, you can apply and benefit from these, whether you plan to make code publicly available or not. We use them in every project and teach them to students, as they make our analyses more robust and reliable and collaboration easier. We'll also discuss some common barriers to making analyses reproducible and our solutions to these, where we have them!

The talk will be about 30 minutes long including discussion.

Date: Thursday 27 October 2022

Time: 2.30-3.30pm

Guest Speaker: Associate Professor Daniel Falster, Evolution & Ecology Research Centre, School of Biological, Earth and Environmental Sciences

Location: Online

Slides for the presentation are HERE

2022 - September

What Can a Caterpillar Teach Us About Good Data Visualisation?

As researchers, we try to communicate clearly and effectively our thoughts and ideas through presentations and publications. We are storytellers and our research produces stories. This seminar will borrow from children's literature to illustrate how we can be more effective in presenting our stories visually.

The talk will be about 30 minutes long including discussion.

Date: Wednesday 21 September 2022

Time: 2.30-3.30pm

Speaker: Nancy Briggs, Senior Statistical Consultant, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2022 - August

Ordinal Variables and What to Do with Them

Likert scales, disease severity and body condition or condition of an ecosystem or organism are all examples of variables that may consist of a series of categories that have an order, e.g. Low/Medium/High, but are difficult to quantify further. We can use the cumulative link model, an extension of logistic regression, to analyse these variables and their relationships with other variables. Using examples, this seminar will explore the motivation for and interpretation of this and other ordinal models and explore data visualization for ordinal variables.

The talk will be about 30 minutes long including discussion.

Date: Thursday 25 August 2022

Time: 2.30-3.30pm

Speaker: Eve Slavich, Statistical Consultant, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2022 - July

How Can You Get Closer to an RCT When You Can't Do an RCT?

There are many situations where it is unethical or impractical to set up a randomised controlled trial (RCT). In such situations, researchers typically rely on observational studies. However, these are traditionally considered to be quite limited in causal inference, as they are notorious for the treatment or exposure being dependent on other variables, leading to confounding.

This talk will focus on inverse probability of treatment weighting (IPTW) in longitudinal observational studies. This technique can be used to improve the estimates of treatment effects, by balancing baseline covariates between groups, and minimising confounding by treatment indication. These methods are relatively simple to deploy. Finally, they can also be used to improve precision in RCTs.

The talk will be about 30 minutes long including discussion.

Date: Thursday 28 July 2022

Time: 2.30-3.30pm

Speaker: Nicholas Olsen, Statistical Consultant, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2022 - June

Making Heads and Tails of Randomisation

Randomisation is a valuable tool for examining the causal effect of an intervention. In this seminar we will discuss the practical aspects of randomisation: why we do it, how to do it (and how not to!) and how to analyse data from a randomised study.

The talk will be about 30 minutes long including discussion.

Date: Thursday 30 June 2022

Time: 2.30-3.30pm

Speaker: Mark Donoghoe, Statistical Consultant, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

Sorry NO video recording available!

2022 - May

Where does machine learning lie within the field of health data science?

I think you will agree that machine learning and artificial intelligence have become ubiquitous terms for concepts that promise to change our lives. Many believe that we can apply machine learning to solve any problem related to data; and what is more, that machines can learn without any human intervention. Could such things be true? What exactly is machine learning? Are there any practical machine learning applications in the health sector?

This presentation takes participants along a journey to understand where machine learning techniques and other techniques lie within health data science frameworks.

The talk will be about 30 minutes long including discussion.

Date: Thursday 26 May 2022

Time: 2.30-3.30pm

Guest Speaker: Dr Oscar Perez-Concha, Centre for Big Data Research in Health, UNSW Sydney

Location: Online

Slides for the presentation are HERE

2022 - April

Practical Study Design

The talk will be about 40 minutes long including discussion.

Date: Thursday 28 April 2022

Time: 2.30-3.30pm

Speaker: Gordana Popovic, Statistical Consultant, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2022 - March

A Practical Guide on the Handling and Reporting of Missing Data

Missing data are commonly encountered in medical research and other research fields. After a brief introduction to the causes and mechanisms of missingness, the talk will focus on the selection of strategies for handling missing data, and how we report an analysis when missing data are present.

The talk will be about 30 minutes long, followed by discussion.

Speaker: Zhixin Liu, Statistical Consultant, UNSW Stats Central

Date: Thursday 24 March 2022

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2022 - February

When Your Research Meets a Global Pandemic

Just about everything we do as researchers has been affected by COVID-19, and still is! Distancing measures have made it difficult for some people to pursue their research. This talk will highlight some of the problems the consultants at Stats Central have encountered when helping people since the pandemic began. I will discuss some of the issues that may – or may not! – lead to changes in your study, and provide updated resources for researchers.

The talk will be about 30 minutes long, followed by discussion.

Speaker: Nancy Briggs, Senior Statistical Consultant, UNSW Stats Central

Date: Thursday 17 February 2022

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2021 - December

2021 in Graphs

This has been a year unlike any other, well, except last year. It has been a year of hardship but also a year of opportunity and unexpected trends. In this talk the Stats Central team will channel Alan Kohler to collate a set of funky graphs that take a light-hearted, almost inappropriately upbeat, look at The Year That Was. This will also be an opportunity to explore different techniques for visualising data and presenting ideas in creative ways.

The talk will be about 30 minutes long, followed by discussion.

Speaker: Professor David Warton, Director of UNSW Stats Central

Date: Thursday 9 December 2021

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2021 - November

How to avoid p-hacking, HARKing and other statistical sins

Most published research findings are not reproducible ('the reproducibility crisis'). This is a polite way to say these research findings are wrong. A set of related practices called p-hacking or HARKing as well as cherry-picking, fishing expeditions, data dredging or data mining have been named as a major cause of the reproducibility crisis in research.

These practices are incredibly common, and usually carried out by well-meaning researchers wanting to extract as much information from their data as possible, and not realising they are doing the wrong thing. Most worryingly they are often taught in statistics courses. In this seminar I will describe what these practices are, why they lead to bad research, and how to (very easily) avoid them.

The talk will be about 30 minutes long and follow by discussion.

Speaker: Gordana Popovic, Statistical Consultant, UNSW Stats Central

Date: Thursday 18 November 2021

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2021 - October

Bootstrapping: How to use data to get out of a tight spot

Statistical analyses often rely on assumptions about the underlying data. In many situations, it is questionable whether these are justified. Instead of blindly trusting the assumptions, bootstrapping makes the data work harder to help us obtain reliable results. This seminar will discuss how the bootstrap procedure works and how you can use it for your own data analyses.
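
As a rough illustration of the idea (not taken from the seminar slides), the sketch below computes a percentile bootstrap confidence interval for a mean using only the Python standard library. The function name bootstrap_ci, the sample values, the number of resamples and the seed are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for a statistic (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Read the interval off the empirical quantiles of the resampled statistics.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, made up for this example.
sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 3.3, 4.0]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI approx. ({low:.2f}, {high:.2f})")
```

The percentile interval is only one of several bootstrap intervals; it is used here because it is the simplest to write down.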

The talk will be about 30 minutes long and followed by discussion.

Speaker: Peter Humburg, Statistical Consultant, UNSW Stats Central

Date: Thursday 21 October 2021

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2021 - September

Four strategies for dealing with multiple comparisons

Multiple hypotheses may be generated by multiple treatment arms, heterogeneous treatment effects, or measuring multiple outcome variables. In a hypothesis testing framework, using p < 0.05 as a criterion for declaring significance, it can be easy to get spurious results when many hypotheses are tested. This talk will discuss four things you can do when faced with multiple comparisons, covering the difference between controlling the family-wise error rate and the false discovery rate; the Bonferroni-Holm adjustment; the Benjamini-Hochberg adjustment; strategies for multiple outcome variables; and strategies for correlated multiple comparisons.
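
For readers who have not seen the two adjustments named above, here is a minimal sketch of each, written from their standard textbook definitions rather than from the seminar material. The function names and the p-values are hypothetical. Holm controls the family-wise error rate; Benjamini-Hochberg controls the false discovery rate.

```python
def holm(pvalues):
    """Bonferroni-Holm adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1); the running
        # maximum keeps the adjusted values in the same order as the raw ones.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for step, i in enumerate(order):
        rank = m - step  # rank of this p-value when sorted in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
print("Holm:", holm(raw))
print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

Each adjusted p-value is compared with the usual threshold (for example 0.05). The Benjamini-Hochberg values are never larger than the Holm values, which reflects the weaker error criterion being controlled.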

The talk will be about 30 minutes long and followed by discussion.

Speaker: Eve Slavich, Statistical Consultant, UNSW Stats Central

Date: Thursday 23 September 2021

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2021 - August

A Gentle Introduction to Survival Analysis

When we are interested in the length of time until an event first occurs, we might not be able to observe the individuals for long enough to see them all have the event. This phenomenon, known as censoring, can be handled using survival analysis methods. In this seminar, I will introduce the basic concepts and methods used in survival analysis, and give examples of some of its many extensions for dealing with more complicated scenarios.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Mark Donoghoe, Statistical Consultant, UNSW Stats Central

Date: Thursday 19 August 2021

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2021 - July

Survey and Questionnaire Design: Practical advice for researchers

Data collection by surveys and questionnaires is an efficient and popular method. Researchers can quickly collect many observations, especially with online survey administration. But if items on the instrument aren’t crafted with care, data quality can be poor. This seminar will provide recommendations for survey and question design that will help ensure that your survey and questionnaire data help you answer your research questions.

The talk will be about 30 minutes long and followed by discussion.

Date: Thursday 15 July 2021

Time: 2.30-3.30pm

Speaker: Nancy Briggs, Senior Statistical Consultant, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2021 - June

T-test a special case of regression? Strange but true …

There are so many statistical tests - t-test, ANOVA, ANCOVA, regression ... ! Which one should I use? Many tests are really just different versions of one method, a "linear model". We'll talk about why it makes life a little easier to think about them in this way.
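
The claim in the title can be checked numerically. The sketch below, which is an illustration and not the speaker's material, computes the equal-variance two-sample t statistic and the t statistic for the slope of a simple linear regression on a 0/1 group indicator, and the two agree. The function names and the data are made up for this example.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )


def slope_t(x, y):
    """t statistic for the slope in the least-squares fit y = intercept + slope * x."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope / standard_error


# Hypothetical outcomes for two groups.
control = [4.1, 5.0, 3.8, 4.6, 5.2]
treated = [5.9, 6.3, 5.1, 6.8, 6.0]

# Code group membership as a 0/1 dummy variable and stack the outcomes.
x = [0] * len(control) + [1] * len(treated)
y = control + treated

print("t-test statistic:     ", pooled_t(control, treated))
print("regression slope t:   ", slope_t(x, y))
```

With a 0/1 indicator the fitted slope is the difference in group means and the residual variance is the pooled variance, which is why the two statistics coincide.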

The talk will be about 30 minutes long and followed by discussion.

Date: Thursday 17 June 2021

Time: 2.30-3.30pm

Speaker: Peter Geelan-Small, Statistical Consultant, UNSW Stats Central

Location: Online

Slides for the presentation are HERE

2021 - April

It all depends: Interaction terms in regression

If the effect of an independent variable on the response variable depends on some other variable, then you are in interaction land. When does a study question call for interaction terms? What does a categorical variable interacting with a categorical variable mean? And how about two continuous variables that interact, and continuous with categorical variable interactions? How can I interpret my results if my interaction was significant or not significant? We’ll look at lots of ways to plot out the results of a regression that has included an interaction between two (or more) variables.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Eve Slavich, Statistical Consultant, UNSW Stats Central

Date: Thursday 29 April 2021

Time: 2.30 - 3.30pm

Location: Online

Slides for the presentation are HERE

2021 - March

A practical guide to meta-analysis

Systematic reviews and meta-analyses are appearing with increasing frequency in the literature. This seminar will offer a guideline to the steps needed to perform a meta-analysis. The type of data, effect measures, heterogeneity, and publication bias in meta-analysis will be reviewed through examples, and illustrated with forest plots and funnel plots.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Zhixin Liu, Statistical Consultant, UNSW Stats Central

Date: Thursday 18 March 2021

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2021 - February

Like Cats and Dogs – Why model selection and inference just can’t get along

A common problem facing researchers following data collection is that it is unclear which of the (possibly many) variables should be included in the analysis. While this process can be challenging in its own right, it raises another, often more problematic, issue. The resulting model can no longer be used for statistical inference to answer research questions. In this seminar, we will take a closer look at why that is the case and discuss possible ways out of the dilemma.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Peter Humburg, Statistical Consultant, UNSW Stats Central

Date: Thursday 18 February 2021

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - December

2020 in Graphs

This has been a year unlike any other, well, unless you view it in terms of its sporting outcomes (Melbourne and Richmond won… again!). It has been a year with plenty of hardship and tragedy, but also a year with plenty of opportunity and unexpected trends. In this talk the Stats Central team will channel Alan Kohler to collate a set of funky graphs that take a light-hearted, almost inappropriately upbeat, look at The Year That Was. This will also be an opportunity to explore different techniques for visualising data and presenting ideas in creative ways.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Prof David Warton, Director of UNSW Stats Central

Date: Thursday 10 December 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - November

When Your Research Meets a Global Pandemic

This year, just about everything we do as researchers has been affected by COVID-19. Distancing measures have made it difficult for some people to pursue their research. This talk will highlight some of the problems the consultants at Stats Central have encountered when helping people throughout 2020. I will discuss some of the issues that may – or may not! – lead to changes in your study.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Nancy Briggs, Senior Statistical Consultant, UNSW Stats Central

Date: Thursday 19 November 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - October

Methods for ranking and quantifying the importance of predictor variables

We all love to rank and order things. A common question after regression is “how do I know which variable is affecting my response variable more?”. For example, do extremes of temperature matter more than the mean (for plant growth)? Does sociodemographic index matter more than high school grade (for graduate outcomes)?

We discuss the options for answering this question, which depend on the model type and include methods that partition R², regression coefficients and model averaging.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Eve Slavich, Statistical Consultant, UNSW Stats Central

Date: Thursday 22 October 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - September

Visualise high dimensional data with the tourr R package

When we have many variables, we are told the only way to visualize them is two or maybe three at a time with scatterplots and boxplots. Turns out that is baloney. The tourr package in R uses a technique called projection pursuit which allows you to visualise datasets containing 5, 10, even 20 dimensions. It feels a bit like walking around your data, hence the name tour. Touring your data lets you explore clusters in high dimensions, look at variable importance, dependence between variables, and see outliers. If you know how to use R then touring your data is very simple.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Gordana Popovic, Statistical Consultant, UNSW Stats Central

Date: Thursday 17 September 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - August

Variable selection: too many variables? what next?

There are various techniques for finding the most parsimonious statistical regression model. Not all methods can be recommended without any qualification. This talk will look at a number of variable selection methods and suggest some general guidelines.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Peter Geelan-Small, Statistical Consultant, UNSW Stats Central

Date: Thursday 13 August 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - July

Residuals in linear models: more than just what’s left over

Researchers commonly use linear models in analysing their data. How do we know that a model we use is adequate and appropriate? Residuals, the difference between what we see in our data and what a model predicts, can be used to diagnose problems with our model and lead us to improving our analysis. This talk will provide a general overview of the types of residuals in a general context, including general and generalized linear models.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Nancy Briggs, Senior Statistical Consultant, UNSW Stats Central

Date: Thursday 16 July 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - June

Effect size: p value is not enough, measures of magnitude matter!

With the recognition that a p-value is not enough for a research inquiry, measures of magnitude in terms of effect size are important to report in the results. Besides, effect size is often needed in sample size calculation and meta-analysis. In this talk, we will explain what effect size is, the types of effect size, and how we define and calculate it under different scenarios, with examples.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Zhixin Liu, Statistical Consultant, UNSW Stats Central

Date: Thursday 11 June 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - May

Mind your Ps: How to use and interpret p-values

Various branches of science are experiencing a "reproducibility crisis", with the (mis)use of p-values being widely identified as a major factor. As a result, the American Statistical Association released an official statement containing six principles underlying the proper use and interpretation of p-values. We will discuss each of the six principles and provide some advice on applying them in your work.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Mark Donoghoe, Statistical Consultant, UNSW Stats Central

Date: Thursday 14 May 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - April

Now there are two of them! Why you shouldn't dichotomise your variables

In many fields, it is common practice to dichotomise continuous variables prior to analysis. It may seem like a good idea that makes your life easier, but did you know that your results may suffer? We will take a closer look at the effect that dichotomising variables may have on results and discuss alternatives.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Peter Humburg, Statistical Consultant, UNSW Stats Central

Date: Thursday 9 April 2020

Time: 2.30-3.30pm

Location: Online

Slides for the presentation are HERE

2020 - February

Paired Data

Paired data crops up in many contexts. When two measurements are made on the same experimental unit, the data values are paired - for example, pre- and post-activity with the same subjects; measurements from two litter mates; data from each hand of a person. Paired data is clearly not independent and this dependence must be accounted for in any statistical analysis. This talk will look at how to analyse some examples of paired data, including continuous and discrete outcome measures.

The talk will be about 30 minutes long and will be followed by time for discussion. We'll move on after our talk to Hacky Hour (Penny Lane Café, 3.00 pm to 4.00 pm) where people can get help on statistics and bioinformatics, high performance computing and other data-related things.

Speaker: Peter Geelan-Small, Statistical Consultant

Date and Time: Thursday, 13 February 2020, 2.30 pm to 3.00 pm

Location: Mathews Theatre C (K-D23-303) | Kensington Campus

Slides for the presentation are HERE.

2019 - November

2019 in Graphs

As we come towards the end of 2019, we will look back on the year that was – politics, trade wars, climate change and S25 – through the lens of data visualisation!

Speaker: Prof. David Warton, Director of Stats Central. David is a professor of statistics specialising in ecological and environmental statistics.

Date and Time: Thursday, 14th November 2019, 2.30 pm to 3.00 pm

Location: Central Lecture Block, Theatre 2, Kensington Campus, UNSW Sydney

Slides for the presentation are HERE.

2019 - October

The Odd Thing About Odds Ratios

The odds ratio is a commonly used effect size measure for binary responses, but it has attracted some criticism. This seminar will attempt to demystify odds ratios, discuss their advantages and disadvantages, and present possible alternatives.

Speaker: Mark Donoghoe, Statistical Consultant, Stats Central

Date and Time: Thursday, 10th October 2019, 2.30 pm to 3.00 pm

Location: Central Lecture Block, Theatre 2, Kensington Campus, UNSW Sydney

Slides for the presentation are HERE.

2019 - September

The wonderful geometry of regression models, conditional relationships and "controlling for" variables

Have you ever wondered how regression models can “control for” variables when assessing effects? This seminar will attempt to demystify this concept by explaining the underlying geometry with pictures and demonstrations.

Speaker: Gordana Popovic, Statistical Consultant, Stats Central.

Date and Time: Thursday, 19th September 2019, 2.30 pm to 3.00 pm

Location: Mathews Theatre C, Kensington Campus (D23), UNSW Sydney

Slides for the presentation are HERE.

2019 - August

No difference does not imply equivalence: misuse of P values in equivalence/non-inferiority testing

There has been growing interest in studies to determine if new therapies have equivalent or non-inferior efficacies to standard therapy. These studies are called equivalence/non-inferiority studies. This talk will describe the concepts and statistical methods involved in testing equivalence/non-inferiority, and how it differs from superiority testing. We will demonstrate with examples the setup of specific margins and null hypotheses, the use and interpretation of confidence intervals, as well as how to avoid the misuse of P values in equivalence/non-inferiority testing.

Speaker: Zhixin Liu, Statistical Consultant, Stats Central.

Date and Time: Thursday, 8th August 2019, 2.30 pm to 3.00 pm. Please note new starting time for our seminars!

Location: Central Lecture Block, Theatre 5, Kensington Campus, UNSW Sydney

Slides for the presentation are HERE.

2019 - July

Dealing with missing data in your research

Missing data occurs in almost all research, even well-designed and controlled studies. Missing data can reduce power and result in a biased estimate of your effect of interest. In this talk, I will review the mechanisms that give rise to missing data. I will also discuss some of the strategies available to address missingness, such as value substitution and deletion and more advanced methods such as imputation and maximum likelihood.

The talk will be about 30 minutes long.

Speaker: Nancy Briggs, Senior Statistical Consultant and Manager, Stats Central.

Date and Time: Thursday, 11th July 2019, 2.30 pm to 3.00 pm. Please note new starting time for our seminars!

Location: Central Lecture Block, Theatre 1, Kensington Campus, UNSW Sydney

Slides for the presentation are HERE.

2019 - June

Visualising data - making sure your graph is worth 1,000 words

How can you make a picture of your data to make its message clear? You need good graphs to analyse data well and communicate your results effectively. This talk focusses on principles of effective data visualisation. Graphing principles and different types of graphs will be demonstrated using the R statistical package, but the basic principles apply to making graphs in any software package.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Peter Geelan-Small, Statistical Consultant, Stats Central.

Date and Time: Thursday, 13 June 2019, 3-4 pm

Location: Central Lecture Block (E19), Theatre 6, Kensington Campus UNSW Sydney

Slides for the presentation are HERE.

2019 - May

Three C's of causal consideration: confounding, collinearity and colliders

A useful talk about deciding on the inclusion/exclusion of variables based on certain causal relationships or large levels of correlation.

The talk will be about 30 minutes long and followed by discussion.

Speaker: Ben Maslen, Statistical Consultant. Ben works at the interface of statistics and ecology using a wide variety of statistical techniques and has particular expertise in models for multivariate abundance data.

Date and Time: Thursday, 23 May 2019, 3-4 pm

Location: Central Lecture Block (E19), Theatre 3, Kensington Campus UNSW Sydney

Slides for the presentation are HERE.

2019 - April

How do you deal with count data?

We will talk about techniques for dealing with data obtained when we count things and various properties that these types of data have. Some topics we will discuss are mean-variance relationships, overdispersion and underdispersion, as well as a variety of models for data obtained from counts (such as Poisson and negative binomial models and models for binomial successes).

The talk will be about 30 minutes long and followed by discussion.

Date and Time: Thursday, 11 April 2019, 3-4 pm

Location: Central Lecture Block (E19), Theatre 1, Kensington Campus UNSW Sydney

Slides for the presentation are HERE.

2019 - March

What can Data Science do for you?

Ever wonder how investment companies improve investment returns? What tools manufacturers use to improve their productivity? How e-commerce companies can increase their revenue? (Spoiler alert: they use Data Science!)

This is an introductory seminar on Data Science from a Computer Science perspective. Data science is a multi-disciplinary field that requires skills from mathematics, computer science and business. I will cover topics including:

The basics of Data Science
How to apply Data Science to your projects and
Requirements and challenges when using Data Science

The talk will be about 30 minutes long.

Speaker: Raymond Wong, Stats Central and Associate Professor at the School of Computer Science and Engineering. Raymond's research interests include: big data management, XML and semi-structured data, data mining and analytics, mobile technologies and service computing

Date and Time: Tuesday, 12th March 2019, 3.00 to 4.00 pm

Location: NewSouth Global Theatre, Webster Building (G14), Room 127 | UNSW Kensington Campus

Slides for the presentation are HERE

2019 - February

Analysis of Pretest-Posttest Data: It’s not as straightforward as you might think!

Experimental designs comparing group differences in change over two time points are common in many areas of research. A pre-post design is a simple way to test the effect of an intervention on mean outcomes, but the statistical analysis does pose some questions for the researcher. This talk will outline some of the analysis options available to analyse two-group, pre-post data, including repeated measures analysis of variance, analysis of covariance, change scores and mixed models.

The talk will be about 30 minutes long and will be followed by (free!) afternoon tea and time for discussion.

Speaker: Nancy Briggs, Senior Statistical Consultant and Manager, Stats Central. Nancy's research interests include: the application of latent variable models, multilevel models and related models to problems in public health, medical research and behavioural sciences; Bayesian statistics; clinical trials.

Date and Time: Thursday, 14th February 2019, 3.00 to 4.00 pm

Location: Ainsworth Building (J17), Room G03 | UNSW Kensington Campus

Important! Please register by Thursday, 14th February, 10.00 am, so we can cater properly for afternoon tea!

Slides for the presentation are HERE