Valid Inference with Double Dipping – December 2024
Event Phone: 1-610-715-0115
Upcoming Dates
- Dec 16: Valid Inference with Double Dipping, 12:00 PM-3:00 PM
Cancellation Policy: If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50).
In the unlikely event that Statistical Horizons LLC must cancel a seminar, we will do our best to inform you as soon as possible of the cancellation. You would then have the option of receiving a full refund of the seminar fee or a credit towards another seminar. In no event shall Statistical Horizons LLC be liable for any incidental or consequential damages that you may incur because of the cancellation.
A Distinguished Speaker Series Seminar by Daniela Witten, Ph.D.
Textbooks on statistical inference typically assume that the data analyst has chosen a hypothesis to test or a confidence interval to estimate before looking at the data—or, better yet, before they have even collected it! However, in reality, statistical practice often proceeds quite differently: an analyst may first explore the data in order to come up with a statistical question that seems “interesting” and then use the same data to answer that question.
We call this practice “double dipping.” Unfortunately, classical statistical machinery does not apply when we have double dipped: for instance, hypothesis tests will reject the null hypothesis far more often than they should, and confidence intervals will not cover the parameter of interest. This leads to spurious findings that will not hold up in future studies. In this course, we will talk about recent developments that enable valid inference with double dipping.
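To see how severe the problem can be, here is a small illustrative simulation (the setup is hypothetical, not taken from the course materials): we generate data in which every feature truly has mean zero, select the feature with the largest sample mean, and then naively run a z-test on that same feature. The nominal 5% test rejects far more often than 5%.

```python
import numpy as np

# Illustrative simulation of double dipping. All p features have true mean 0,
# so every null hypothesis is true; a valid 5% test should reject ~5% of the
# time. But because we select the "winning" feature from the data and then
# test it on the same data, the naive test rejects far more often.
rng = np.random.default_rng(0)
n, p, reps = 50, 100, 2000
rejections = 0
for _ in range(reps):
    X = rng.normal(size=(n, p))            # every feature truly has mean 0
    j = np.argmax(X.mean(axis=0))          # double dip: pick the largest mean
    z = X[:, j].mean() * np.sqrt(n)        # naive z-statistic (known sd = 1)
    if abs(z) > 1.96:                      # nominal 5% two-sided test
        rejections += 1
rate = rejections / reps
print(rate)  # far above the nominal 0.05, even though all nulls are true
```

The selected feature's mean is the maximum of 100 noisy means, so it is biased upward, and the naive test mistakes that selection bias for signal.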
During the first hour, we will consider double dipping through the lens of multiple testing. We will show that in very simple settings, multiple testing corrections—many of which have been around for decades—may be suitable solutions to the double dipping problem. However, when the settings get more complicated (and more realistic), multiple testing corrections don’t cut it.
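In the simple all-null setup above, a classical Bonferroni correction does repair the test. The following sketch (again with illustrative settings, not the course's own code) still selects the feature with the largest z-statistic, but compares it to a cutoff at level alpha/p; because Bonferroni controls the chance of any false rejection among all p tests, the selected test is valid too.

```python
import numpy as np
from statistics import NormalDist

# Bonferroni fix for the simple double-dipping example: compare the selected
# feature's z-statistic to the alpha/p cutoff rather than the alpha cutoff.
# P(reject the selected feature) <= P(any of the p tests rejects) <= alpha.
rng = np.random.default_rng(1)
n, p, reps, alpha = 50, 100, 2000, 0.05
crit = NormalDist().inv_cdf(1 - alpha / (2 * p))  # two-sided cutoff, ~3.48
rejections = 0
for _ in range(reps):
    X = rng.normal(size=(n, p))          # every feature truly has mean 0
    z = X.mean(axis=0) * np.sqrt(n)      # z-statistic for each feature
    j = np.argmax(z)                     # still select the "winner"
    if abs(z[j]) > crit:                 # Bonferroni-corrected threshold
        rejections += 1
rate = rejections / reps
print(rate)  # at or below the nominal 0.05
```

The price, as the course notes, is that this only works in simple settings: in more complicated selection procedures, the corrected test can become hopelessly conservative or inapplicable.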
During the second hour, we will present the conditional selective inference framework, a relatively new approach to address double dipping, which circumvents the need for multiple testing corrections. In this framework, we use all of our data to identify an interesting question, and then we answer that question, again using all of our data, but without re-using any of the (statistical) information that led us to identify it.
Finally, during the third hour, we will consider approaches that involve splitting the data into a training set and a test set: the training set can be used to come up with an interesting question, and the test set can be used to answer it. The simplest such approach is sample splitting, which is a key tool in any data analyst’s toolbox.
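Sample splitting can be sketched in a few lines (a hypothetical example, not the course's own code): we select the feature with the largest mean on the first half of the observations and run the z-test on the held-out second half. Since the test half played no role in the selection, the naive test is valid again.

```python
import numpy as np

# Sample splitting: select on one half of the rows, test on the other half.
# The test half is independent of the selection, so the nominal 5% z-test
# rejects at roughly its advertised rate when every null is true.
rng = np.random.default_rng(2)
n, p, reps = 100, 100, 2000
rejections = 0
for _ in range(reps):
    X = rng.normal(size=(n, p))              # all features truly have mean 0
    train, test = X[: n // 2], X[n // 2 :]
    j = np.argmax(train.mean(axis=0))        # select on the training half
    z = test[:, j].mean() * np.sqrt(n // 2)  # test on the independent half
    if abs(z) > 1.96:                        # nominal 5% two-sided test
        rejections += 1
rate = rejections / reps
print(rate)  # close to the nominal 0.05
```

The trade-off is that both selection and inference are carried out on only half of the data, which motivates the alternatives discussed next.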
But there are many situations in which sample splitting is either unappealing or inapplicable: for instance, if the sample size is very small, or if the observations are not independent and identically distributed. In such settings, data thinning provides an attractive alternative. Data thinning enables us to “split” even a single datapoint into two independent pieces, so that we can identify an interesting question on one piece and answer it on the other.
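For Poisson data, thinning has a particularly clean form, sketched below with illustrative parameters (the construction is standard, but the settings here are my own): given X ~ Poisson(lam), draw X1 ~ Binomial(X, eps) and set X2 = X - X1. Then X1 ~ Poisson(eps * lam) and X2 ~ Poisson((1 - eps) * lam), and the two pieces are independent, so one piece can drive selection and the other inference.

```python
import numpy as np

# Poisson data thinning: split a single count X ~ Poisson(lam) into two
# independent counts. X1 | X ~ Binomial(X, eps) gives X1 ~ Poisson(eps*lam)
# and X2 = X - X1 ~ Poisson((1-eps)*lam), with X1 independent of X2.
rng = np.random.default_rng(3)
lam, eps, reps = 10.0, 0.5, 200_000
X = rng.poisson(lam, size=reps)     # the original observations
X1 = rng.binomial(X, eps)           # thin each count binomially
X2 = X - X1                         # the complementary piece
print(np.corrcoef(X1, X2)[0, 1])    # near 0: the pieces are independent
print(X1.mean(), X2.mean())         # near eps*lam and (1-eps)*lam
```

Unlike sample splitting, this works observation by observation, so it applies even when n is tiny or the observations are not i.i.d.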
Who should attend: This course is intended for data scientists and statisticians who conduct statistical inference (e.g., hypothesis tests and confidence intervals) in the “real world” and want to update their statistical toolset to enable valid analysis when the target of inference is selected from the data.
Venue: Livestream Seminar