This interactive visualization is a companion to the Decisions Matter - Controlling Outliers blog post (link tbd). It illustrates the tradeoffs among commonly used outlier control methods as applied to agricultural household survey data. We provide three options for identifying outliers and three options for modifying or removing them, along with box plots and other visualizations for comparing the results. The first tab shows side-by-side comparisons of each identification technique under the same replacement strategy. The second tab shows the impact of a selected technique and replacement strategy on subgroups (defined by the gender of either the household head or the plot manager).
The three outlier detection techniques are percentile (any observation above or below the selected cutoff percentiles is treated as an outlier), MAD, or median absolute deviation (any observation whose distance from the median is greater than the MAD multiplied by a given factor is treated as an outlier), and transformation (we first apply a log-like transformation, then classify any value above a cutoff z-score as an outlier). For the transformation we use the Yeo-Johnson technique, which behaves like the log for large values and is approximately linear for small values.
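The three detection rules can be sketched in a few lines of Python. This is a minimal illustration, not the app's implementation: the function names, default cutoffs, and interpolation rule are all assumptions. The Yeo-Johnson parameter is fixed at 0 for simplicity (the app may estimate it from the data), and the z-score check here is two-sided, which is also an assumption.

```python
import math
import statistics


def flag_percentile(values, lower_pct=1.0, upper_pct=99.0):
    """Flag values below the lower or above the upper percentile cutoff."""
    ordered = sorted(values)

    def percentile(p):
        # Linear interpolation between the two closest ranks.
        pos = (len(ordered) - 1) * p / 100.0
        lo, hi = math.floor(pos), math.ceil(pos)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [v < low or v > high for v in values]


def flag_mad(values, factor=3.0):
    """Flag values farther than factor * MAD from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [abs(v - med) > factor * mad for v in values]


def yeo_johnson_lambda0(v):
    """Yeo-Johnson transform with lambda fixed at 0 (an assumption)."""
    if v >= 0:
        return math.log1p(v)  # log(v + 1): log-like when large, near-linear when small
    return -((1.0 - v) ** 2 - 1.0) / 2.0


def flag_transformed_zscore(values, cutoff=3.0):
    """Flag values whose transformed z-score exceeds the cutoff in absolute value."""
    transformed = [yeo_johnson_lambda0(v) for v in values]
    mean = statistics.fmean(transformed)
    sd = statistics.stdev(transformed)
    if sd == 0:
        return [False] * len(values)  # no spread, so nothing can be flagged
    return [abs((t - mean) / sd) > cutoff for t in transformed]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
percentile_flags = flag_percentile(sample, 10, 90)
mad_flags = flag_mad(sample, factor=3.0)
zscore_flags = flag_transformed_zscore(sample, cutoff=2.0)
```

On this small made-up sample, the percentile rule flags both the smallest and the largest value, while the MAD and transformation rules flag only the extreme value of 1000. The percentile method always flags a fixed share of the data whether or not those points are unusual.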
The replacement options are to replace at the threshold (“Tails”), replace at the median (“Median”), or remove the observation from the sample (“Trim”).
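As a rough sketch of how the three options differ, the hypothetical function below applies each one given lower and upper thresholds that a detection step has already produced. The function name and signature are illustrative. Using the median of the full sample (outliers included) for the “Median” option is an assumption.

```python
import statistics


def control_outliers(values, low, high, method="Tails"):
    """Apply one replacement option to values outside [low, high]."""
    if method == "Tails":
        # Pull each outlier back to the nearest threshold.
        return [min(max(v, low), high) for v in values]
    if method == "Median":
        # Assumption: the median is taken over the full sample.
        med = statistics.median(values)
        return [v if low <= v <= high else med for v in values]
    if method == "Trim":
        # Drop outliers, so the sample gets smaller.
        return [v for v in values if low <= v <= high]
    raise ValueError(f"unknown method: {method}")


yields = [1, 2, 3, 4, 100]
tails = control_outliers(yields, 0, 10, "Tails")
median = control_outliers(yields, 0, 10, "Median")
trim = control_outliers(yields, 0, 10, "Trim")
```

With these made-up values, “Tails” gives [1, 2, 3, 4, 10], “Median” gives [1, 2, 3, 4, 3], and “Trim” gives [1, 2, 3, 4]. Only “Trim” changes the sample size, which matters when results are compared across subgroups.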
This app is also designed to illustrate the effect of outlier control on the components of a calculated value. Each available indicator is calculated as a ratio of two values in the surveys. Outlier control can be applied to any combination of the numerator, the denominator, and the final value. Experiment with different methods to see how the final estimates change.
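The sketch below shows why the choice of component matters. It uses made-up harvest and area values and fixed, hypothetical thresholds; in practice the thresholds would come from one of the detection methods. It applies the “Tails” replacement to the numerator, the denominator, or the final ratio. All names are illustrative.

```python
def clamp(values, low, high):
    """Replace values outside [low, high] with the nearest threshold."""
    return [min(max(v, low), high) for v in values]


def controlled_ratio(numerators, denominators,
                     num_bounds=None, den_bounds=None, ratio_bounds=None):
    """Compute a ratio, optionally controlling each component first."""
    if num_bounds is not None:
        numerators = clamp(numerators, *num_bounds)
    if den_bounds is not None:
        denominators = clamp(denominators, *den_bounds)
    ratios = [n / d for n, d in zip(numerators, denominators)]
    if ratio_bounds is not None:
        ratios = clamp(ratios, *ratio_bounds)
    return ratios


harvest_kg = [100, 200, 5000]  # hypothetical numerator values
area_ha = [1.0, 2.0, 0.5]      # hypothetical denominator values

raw = controlled_ratio(harvest_kg, area_ha)
numerator_only = controlled_ratio(harvest_kg, area_ha, num_bounds=(0, 1000))
denominator_only = controlled_ratio(harvest_kg, area_ha, den_bounds=(1.0, 10.0))
ratio_only = controlled_ratio(harvest_kg, area_ha, ratio_bounds=(0, 500))
```

For the third household, the uncontrolled ratio is 10000. Controlling only the numerator gives 2000, controlling only the denominator gives 5000, and controlling only the final ratio gives 500, so the same raw data can produce quite different estimates.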
University of Washington, Evans Policy Research and Analysis (2025). . URL: URL
We welcome feedback and questions. Please email us at uw.eparx@gmail.com.