Does Spanking Hurt — Or Only Angry Spanking?

The oldest fight in parenting research: spanked kids do worse as adults, but spanking parents differ in a hundred other ways. This survey split the question in two — angry/uncontrolled/unpredictable spankings vs calm/controlled/predictable ones, each for ages 0–12 and 13–18 — and measured the loud, hostile household around them separately.

From Aella's childhood survey "Was Your Childhood Heaven or Hell?" (n=43,872; spanking battery n≈26,400) · analysis June 2026

TL;DR: Four headlines

40% were angry-spanked at 0–12 (46% of women, 35% of men); 33% calm-spanked
−0.45 → −0.05 SD: angry spanking's wellbeing penalty, raw → compared with an equally loud/cold household
+0.13 SD: calm-spanked adults are better off than unspanked, raw, because calm spankers are warmer than average
0.7 pp of wellbeing variance is unique to the 8 spanking items combined, vs 26% explained by the family-environment items (16.8 pp of it unique)

THE QUESTIONS: What was actually asked

Every childhood item is rated twice (ages 0–12 and 13–18) on a 7-point agree scale (−3 = strongly disagree … +3 = strongly agree). Verbatim, from the survey source:

| Variable | Question ("during ages 0–12 / 13–18:") | n (0–12 / 13–18) |
|---|---|---|
| angryspank/b | "Your parents used angry/uncontrolled/unpredictable spankings as a form of discipline" (tip: if they were calm/controlled/predictable, this does NOT count) | 26,398 / 26,378 |
| calmspank/b | "Your parents used calm/controlled/predictable spankings as a form of discipline" (tip: if they were angry/uncontrolled/unpredictable, this does NOT count) | 26,428 / 26,409 |
| loud7/b | "your parents physically disciplined you in an irregular, emotionally expressive way" | 39,387 / 39,366 |
| etc6/b | "your parents physically disciplined you" (general) | 37,566 / 37,552 |

Definitions used throughout.

"Endorsed" = any agreement (>0).

Adult wellbeing = mean of five z-scored adult items (depression, anxiety, suicide ideation and "I am not happy", all reversed, plus the quality-of-life composite), re-standardized so that higher = better (n=42,096; mean inter-item r=.51). The survey's "at war with yourself" item exists only as a childhood question, so it is excluded from the adult composite. There is no relationship-with-parents-today question in this survey, so that outcome can't be tested.

Family environment controls = (a) the aggro-household battery: loud/confrontational culture, adults yelled at you, verbal/emotional abuse of you, father→mother and mother→father verbal abuse, parents had a bad relationship, parents unpredictable. The physical-discipline item is deliberately excluded so the control doesn't swallow the exposure. (b) Parental warmth: the 14-item goodparent battery, i.e. seven items each rated for both age bands (guidance, respect, unconditional love, honesty, humor, physical affection, apologizing).

The angry/calm spanking battery was added late (survey version 315+), so those analyses run on ~25–26k respondents; per-figure n is printed on each chart.
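The wellbeing composite defined above can be sketched in a few lines. This is a minimal illustration, not the survey's actual pipeline: the column names and toy ratings are hypothetical, it assumes complete cases (no missing answers), and it uses the population SD for standardizing, which the source does not specify.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a list to mean 0, SD 1 (population SD; an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def wellbeing_composite(items, reversed_items):
    """items: dict of column name -> list of ratings, one per respondent.
    reversed_items: names whose z-scores are sign-flipped so higher = better."""
    z_cols = []
    for name, values in items.items():
        z = zscore(values)
        if name in reversed_items:
            z = [-v for v in z]
        z_cols.append(z)
    # Average the z-scored items per respondent, then re-standardize.
    row_means = [mean(row) for row in zip(*z_cols)]
    return zscore(row_means)

def endorsed(rating):
    """'Endorsed' = any agreement on the -3..+3 scale."""
    return rating > 0

# Hypothetical toy data: 4 respondents, made-up column names and values.
toy = {
    "depression": [3, -1, 0, -3],
    "anxiety": [2, 0, -1, -2],
    "suicide_ideation": [1, -2, -3, -3],
    "not_happy": [3, 0, -2, -3],
    "quality_of_life": [-2, 0, 1, 3],
}
scores = wellbeing_composite(
    toy, {"depression", "anxiety", "suicide_ideation", "not_happy"}
)
```

Because the composite is re-standardized, every effect quoted in SD units elsewhere in the report is on this same scale.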

WHO GOT SPANKED: Prevalence and the co-occurrence problem

[Figure: Prevalence of each discipline type by sex]

Physical discipline was the norm: 61% report at least some at 0–12. Angry spanking at 0–12 hits 40% (women 46%, men 35%) and calm spanking 33%, with the sex pattern reversed (men 37%, women 29%). Either parents really did spank daughters more angrily and sons more calmly, or the same childhood is remembered differently by sex; this data can't separate those, and the same caution applies to the "irregular, emotionally expressive" item (women 52% vs men 39%). Everything drops sharply for ages 13–18 (angry 22%, calm 10%): spanking teens is much rarer, and when it happens it's disproportionately the angry kind.

[Figure: Co-occurrence matrix of the four spanking types]

The matrix shows why "I was spanked calmly and I'm fine" and "spanking ruined people" talk past each other. Of those calm-spanked at 0–12, 41% were also angry-spanked at 0–12 — so a naive "ever calm-spanked" group is heavily contaminated with angry spanking. Teen spanking is almost never new: 94% of teen-angry-spanked were already angry-spanked as kids, 93% of teen-calm-spanked already calm-spanked. Still, the two styles are nearly orthogonal as traits (item correlation r=.09): plenty of homes had only one kind.
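A co-occurrence matrix of this kind is row-conditional: each cell answers "of those who endorsed A, what share also endorsed B?", so it is not symmetric. A minimal sketch, with hypothetical field names and invented toy respondents:

```python
def cooccurrence(flags, names):
    """flags: list of dicts (name -> bool endorsed), one per respondent.
    Returns table[a][b] = share of respondents endorsing a who also endorse b."""
    table = {}
    for a in names:
        base = [r for r in flags if r[a]]
        table[a] = {
            b: (sum(r[b] for r in base) / len(base) if base else float("nan"))
            for b in names
        }
    return table

# Hypothetical toy respondents (not survey data).
names = ["angry_0_12", "calm_0_12"]
toy = [
    {"angry_0_12": True, "calm_0_12": True},
    {"angry_0_12": False, "calm_0_12": True},
    {"angry_0_12": True, "calm_0_12": False},
    {"angry_0_12": False, "calm_0_12": False},
    {"angry_0_12": False, "calm_0_12": True},
]
table = cooccurrence(toy, names)
# In this toy set, 1 of 3 calm-spanked were also angry-spanked,
# while 1 of 2 angry-spanked were also calm-spanked.
```

The asymmetry is the point: a high "calm, given angry" share and a low item correlation can coexist, because the correlation also counts the many respondents who endorsed neither.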

RAW OUTCOMES: Angry spanking devastates; calm spanking… helps?

[Figure: Raw adult wellbeing by mutually exclusive spanking group, by sex]

Unadjusted, in mutually exclusive 0–12 groups: anyone with angry spanking in the mix lands far below the never-spanked. Angry-only sits 0.35 SD below never-spanked among women and 0.43 SD below among men. But the calm-only group sits above the never-spanked in both sexes (+0.09 women, +0.12 men). Raw spanking-vs-not comparisons that don't split by style average these opposites together, which is how "any physical discipline" shows a moderate raw penalty of −0.24 SD.
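The mutually exclusive grouping can be sketched as follows. The group labels are hypothetical, and "never" here means neither spanking item was endorsed, which is an assumption about how the report defines that cell. In the report the comparison is run within each sex; this sketch omits that split for brevity.

```python
from statistics import mean

def spank_group(angry, calm):
    """Assign a 0-12 group from two -3..+3 ratings; endorsed means > 0."""
    if angry > 0 and calm > 0:
        return "both"
    if angry > 0:
        return "angry_only"
    if calm > 0:
        return "calm_only"
    return "never"

def gaps_vs_never(rows):
    """rows: list of (angry, calm, wellbeing) tuples.
    Returns each group's mean wellbeing minus the never-spanked mean."""
    by_group = {}
    for angry, calm, wb in rows:
        by_group.setdefault(spank_group(angry, calm), []).append(wb)
    ref = mean(by_group["never"])
    return {g: mean(v) - ref for g, v in by_group.items() if g != "never"}

# Invented toy rows: (angry rating, calm rating, wellbeing in SD units).
toy = [
    (2, -3, -0.5), (3, -1, -0.3),   # angry only
    (-3, 2, 0.2),                   # calm only
    (-3, -3, 0.0), (-2, 0, 0.1),    # never (0 = neutral, not endorsed)
    (1, 1, -0.6),                   # both
]
gaps = gaps_vs_never(toy)
```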

Why would calm spanking look protective raw? Because of who does it: calm spanking is essentially uncorrelated with the aggro-household battery (r=−.05) and slightly positively correlated with parental warmth (r=+.10). Angry spanking is the opposite — r=.52 with household aggression, r=−.43 with warmth. "Was your spanking angry or calm" is mostly a question about what kind of family you had.

The large male–female gap in every bar (men report higher wellbeing throughout) is why every analysis below controls or splits by sex; pooled bars would mostly measure group sex-composition.

THE CENTERPIECE: Spanked, vs equally-chaotic-but-not-spanked

The trick: instead of comparing spanked kids to everyone else, compare them to people from equally loud, hostile, unpredictable, equally warm-or-cold households (plus sex and age) who weren't spanked. Whatever survives that comparison is the part you can still try to pin on the spanking itself.
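To make the logic of that comparison concrete, here is a small simulation. Everything in it is invented: the data are random draws, the single confounder stands in for the full control set (the report also adjusts for warmth, sex and age), and the direct spanking effect is set to exactly zero by construction. The least-squares solver is a bare-bones stand-in for a statistics package.

```python
import random

def ols(X, y):
    """Least-squares coefficients via normal equations and Gauss-Jordan
    elimination. X: list of rows, each starting with 1.0 for the intercept."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

rng = random.Random(0)
rows = []
for _ in range(5000):
    h = rng.gauss(0, 1)                # household aggression (the confounder)
    s = 0.6 * h + rng.gauss(0, 0.8)    # angry spanking tracks the household
    w = -0.4 * h + rng.gauss(0, 1)     # wellbeing: NO direct spanking effect
    rows.append((h, s, w))

y = [w for _, _, w in rows]
raw = ols([[1.0, s] for _, s, _ in rows], y)[1]          # spanking alone
adjusted = ols([[1.0, s, h] for h, s, _ in rows], y)[1]  # plus household
```

In this setup the raw coefficient comes out clearly negative while the adjusted one hovers near zero, which is the pattern the adjusted comparison is designed to detect. It shows how such a pattern can arise; it says nothing about whether that is what happened in the survey.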

[Figure: Forest plot of raw vs family-environment-adjusted effects of each discipline type]

What survives: almost nothing.

Entering all four spanking types simultaneously with the environment controls (n=24,807) tells the same story with one wrinkle: angry 0–12 keeps −0.07 ± 0.03 and calm 0–12 −0.03 ± 0.03, while both teen terms flip trivially positive (+0.05 ± 0.03 and +0.04 ± 0.04). That is classic suppression residue once childhood spanking and household are held constant, not evidence that teen spanking helps. For scale: in that same model, household aggression carries −0.20 and warmth +0.25.

What "adjusted" cannot do. The environment battery is itself retrospective self-report, so part of the adjustment may be over-control (spanking is one way a household is hostile) and part under-control (the battery measures the household imperfectly, leaving residual confounding inside the −0.05). These two biases run in opposite directions; the honest claim is "the spanking-specific signal is somewhere near zero, bounded by a small negative," not an exact number.

SAME HOME: Warm homes that spank calmly, chaotic homes that spank angrily

A more concrete version of the same question, in plain group means. Take only warm homes (top tercile of the warmth battery): compare calm-only-spanked kids (calm yes, angry no) to kids never physically disciplined at all (no angry, no calm, no general physical discipline). Then take chaotic homes (top tercile of the aggro battery): compare angry-spanked to never-physically-disciplined.

[Figure: Wellbeing in warm homes with calm-only spanking vs none, and chaotic homes with angry spanking vs none, by sex]

Warm homes: calm-only-spanked kids are statistically indistinguishable from unspanked kids — Δ = +0.02 ± 0.08 SD (women), +0.07 ± 0.06 (men). If calm spanking by otherwise-warm parents leaves a scar on adult wellbeing, it is smaller than this design can see at n≈6,200. Chaotic homes: angry spanking does add measurable harm beyond the chaos — Δ = −0.19 ± 0.06 (women), −0.12 ± 0.09 (men). Consistent with the forest plot: the only place a spanking-specific penalty shows up is angry spanking, and it's a tenth-or-two of an SD, not the half-SD the raw numbers suggest.
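The stratified comparison amounts to a filter followed by a difference in group means. A minimal sketch under stated assumptions: the field names and toy rows are hypothetical, the tercile cutoff is taken as the 2/3 quantile of the battery score, and the ± is computed as a normal-approximation 95% half-width (the report does not say which interval its ± denotes). The report runs this separately by sex; the sketch does not.

```python
from math import sqrt
from statistics import mean, variance

def top_tercile(rows, key):
    """Keep rows whose `key` score is at or above the 2/3 quantile."""
    scores = sorted(r[key] for r in rows)
    cutoff = scores[(2 * len(scores)) // 3]
    return [r for r in rows if r[key] >= cutoff]

def mean_diff(exposed, control):
    """Difference in means and an assumed 95% half-width (1.96 * SE)."""
    d = mean(exposed) - mean(control)
    se = sqrt(variance(exposed) / len(exposed)
              + variance(control) / len(control))
    return d, 1.96 * se

# Invented toy rows: warmth score, group label, wellbeing.
toy = [
    {"warmth": w,
     "group": "calm_only" if w % 2 == 0 else "never",
     "wb": 0.1 * (w - 7)}
    for w in range(12)
]
warm = top_tercile(toy, "warmth")
calm = [r["wb"] for r in warm if r["group"] == "calm_only"]
never = [r["wb"] for r in warm if r["group"] == "never"]
delta, half_width = mean_diff(calm, never)
```

A result is "statistically indistinguishable" in this sense when the half-width is larger than the absolute difference, as it is for both warm-home cells reported above.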

Caveat for the warm-home cell: "warm home + angry spanking" exists too (warmth and angry spanking are negatively but not perfectly correlated), and within warm homes the angry-spanked also sit slightly lower. The clean reading: style matters more than act — and even style is mostly a proxy for everything else the parents did.

DOSE-RESPONSE: Gradients, and whether teen spanking is different

The items are agreement scales, not frequency counts — but agreement intensity behaves like a dose here (a "strongly agree" on angry spanking is a stronger exposure claim than "slightly agree").

[Figure: Wellbeing by agreement level with angry and calm spanking, ages 0–12 and 13–18]

Angry spanking shows a clean monotone slide on the endorsement side: each step from "slightly" to "strongly agree" costs more wellbeing, ending 0.55 SD below the sample mean at strong agreement (0–12), with the teen curve nearly as steep. Calm spanking's line is comparatively flat on the endorsement side: strong-agree calm spanking sits about where slight-agree does. Standardized: a 1-SD increase in angry-spanking agreement predicts −0.25 SD wellbeing raw but only −0.03 adjusted; calm +0.04 raw, −0.02 adjusted; teen angry −0.19 raw, −0.00 adjusted. So the teen "signature" is: identical confounded signal, even less surviving adjustment than the 0–12 version. The dip at the disagree end (people who answer −1/0 do worse than firm −3 responders) appears for both items and likely mixes mild exposure with unsure/ambivalent responding.
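Both views of the gradient, the per-level means and the per-SD slope, are simple to compute. A minimal sketch with invented toy values: with a single predictor and no covariates the standardized slope equals the Pearson correlation, so this corresponds to a fully unadjusted figure. Whether the report's "raw" numbers also include sex or age controls is not stated, and this sketch includes none.

```python
from statistics import mean, pstdev

def mean_by_level(ratings, wellbeing):
    """Mean wellbeing at each agreement level (-3..+3)."""
    levels = {}
    for r, w in zip(ratings, wellbeing):
        levels.setdefault(r, []).append(w)
    return {lvl: mean(ws) for lvl, ws in sorted(levels.items())}

def standardized_slope(x, y):
    """SD change in y per 1-SD change in x, one predictor, no covariates.
    Algebraically this is the Pearson correlation."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

# Invented toy data: agreement ratings and wellbeing in SD units.
ratings = [-3, -3, 0, 0, 3, 3]
wellbeing = [0.2, 0.4, 0.0, -0.2, -0.5, -0.7]
curve = mean_by_level(ratings, wellbeing)
slope = standardized_slope(ratings, wellbeing)
```

The per-level curve is what reveals non-monotone features such as the dip near the middle of the scale; a single slope would average straight through them.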

R² HONESTY: How much information is actually in the spanking items?

[Figure: R-squared of spanking block vs environment block vs both]

On the same complete-case sample (n=24,807): all 8 physical-discipline items together explain 9.9% of adult-wellbeing variance. The 28 environment items (aggro battery + warmth, both age bands) explain 26.0%. Both together: 26.7%. So of spanking's 9.9 points, 9.3 are shared with the environment measures and 0.7 are unique; environment keeps 16.8 unique points. Stated plainly: once you know how loud, hostile, unpredictable, and warm the household was, asking about spanking improves your prediction of adult wellbeing by less than one percentage point of variance — and the reverse is not true.
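The split into unique and shared variance is plain arithmetic on the three R² values. A sketch using the rounded percentages quoted above (the function name is just for illustration):

```python
def commonality(r2_a, r2_b, r2_both):
    """Two-block variance partition.
    unique_a = what block A adds on top of block B, and vice versa;
    shared   = variance either block could account for alone."""
    unique_a = r2_both - r2_b
    unique_b = r2_both - r2_a
    shared = r2_a + r2_b - r2_both
    return unique_a, unique_b, shared

# Percentages as reported: spanking 9.9, environment 26.0, both 26.7.
u_spank, u_env, shared = commonality(9.9, 26.0, 26.7)
u_spank, u_env, shared = round(u_spank, 1), round(u_env, 1), round(shared, 1)
```

From the rounded inputs the shared part comes out at 9.2 rather than the 9.3 quoted; the small gap is presumably rounding in the published figures, since the unrounded R² values are not given here.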

INSIDE THE HOUSEHOLD BATTERY: Which items do the absorbing?

Everything above controlled for the household using two composite scores — the aggro battery and the warmth battery, which were built as public-facing constructs for the survey's results page. Composites can hide their own action: "the household absorbs the spanking effect" doesn't say which facts about the household. So here the composites are unbundled: all four spanking types plus every individual battery item (each item = its 0–12 and 13–18 ratings averaged, per SD) enter one model together, with sex and age. The duplicate item is dropped (the survey engine copies "parents had a bad relationship" into the aggro battery; it's used once), and "parents were predictable" is reversed to read as unpredictability.

[Figure: Forest plot of four spanking types and all 14 individual household items in one model]

The damage is carried by a specific, short list:

Joint betas split shared variance: items within a battery correlate strongly, so "loud culture carries nothing" means no independent signal beyond its siblings, not that loud homes are fine. Two warmth items (joking/goofing, physical affection) even flip trivially negative in the joint model (−0.03, −0.02) despite solid positive solo effects (+0.14, +0.19): classic collinearity residue, same flavor as the teen-spanking flips. The solo view below is the fair per-item comparison.

[Figure: Sorted bar chart of each battery item's solo effect on adult wellbeing, sex and age controlled]

Item by item (each alone, sex+age controlled), the aggro battery spans a 2× internal range: from −0.19 SD per SD (mother→father verbal abuse) up to −0.37 (verbal/emotional abuse of you), with yelling at −0.29, a bad parental relationship −0.27, unpredictability −0.23. For comparison, the "irregular, emotionally expressive physical discipline" exposure item sits mid-pack at −0.26. The warmth items run +0.14 to +0.37, topped by unconditional love — solo, the single most predictive item in either battery. The batteries are not seven copies of one number, and the items that matter most are the relational ones (abuse aimed at you; love you could count on), not the structural ones (volume, predictability).

Does the residual −0.05 hide in unloved homes?

There's no "parents were proud of me" item in this survey; the two strongest warmth items are "you felt unconditionally loved" and "at least one parent respected you", so those carry the interaction test (angry spanking 0–12 × item, on top of the full control set). The naive linear interaction comes out significantly negative for both (−0.05 ± 0.02) — which would read "angry spanking hurts more in loving homes" — but it's an artifact: the item→wellbeing curve is convex, and allowing a quadratic item term nulls the interaction completely (unconditional love: +0.00, p=.88; respect: −0.01, p=.25). The honest readout is the stratified one: the adjusted angry-spanking penalty is −0.09 among people who strongly deny the item (bottom group on either love or respect) versus −0.03 to −0.05 everywhere else. Suggestive of concentration at the very bottom — angry spanking with no felt love behind it — but with overlapping CIs (±0.04–0.06), it's a hint, not a finding.
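The artifact described here, a spurious linear interaction that disappears once the item is allowed a curved effect, can be reproduced in simulation. All numbers below are invented: the true model has a convex love effect, a small flat angry-spanking penalty, and no interaction at all. The solver is a bare-bones least-squares routine standing in for a statistics package.

```python
import random

def ols(X, y):
    """Least-squares coefficients via normal equations and Gauss-Jordan
    elimination. X: list of rows, each starting with 1.0 for the intercept."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

rng = random.Random(1)
rows = []
for _ in range(20000):
    love = rng.gauss(0, 1)
    angry = -0.5 * love + rng.gauss(0, 0.87)   # angry spanking rarer in loving homes
    # True model: convex love effect, flat angry penalty, NO interaction.
    w = 0.3 * love + 0.1 * love ** 2 - 0.05 * angry + rng.gauss(0, 1)
    rows.append((love, angry, w))

y = [w for _, _, w in rows]
# Model 1: linear terms plus angry x love (interaction is the last column).
inter_linear = ols([[1.0, l, a, a * l] for l, a, _ in rows], y)[3]
# Model 2: same, but love is also allowed a quadratic term.
inter_quad = ols([[1.0, l, l * l, a, a * l] for l, a, _ in rows], y)[4]
```

Because angry spanking is correlated with the love item, the product term partly stands in for the missing squared term, so the first model reports a negative interaction that the true model does not contain. Adding the quadratic term removes it.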

Do items beat the composites overall?

Same complete-case sample (n=24,807): the report's composite control set (aggro + warmth + sex + age) explains 29.3% of wellbeing variance; the 14 items + sex + age explain 31.1% (+1.8pp; 31.8% if every age band enters separately). Adding the four spanking terms on top of the items moves it by +0.06pp. So unbundling buys slightly sharper measurement — enough to halve the already-tiny angry residual — and spanking remains a rounding error either way.

CAVEATS: How to read this

Retrospective recall. Adults rated their childhood and their current state in the same sitting. Currently-depressed people may remember harsher discipline and a colder home (and "angry vs calm" is itself a judgment call made in retrospect). This inflates raw associations and can distort adjusted ones in either direction.

Self-selected, very-online sample — young (mean age 27), left-leaning, heavily LGBT-enriched. Absolute prevalences (the 40%, the 61%) describe this sample, not the population; the comparisons and orderings are the robust part. The spanking battery exists only for respondents from survey version 315 on (n≈26.4k of 43.9k).

Cross-sectional, no causal claims. "Survives adjustment" means "predicts beyond measured household variables," not "causes."

Genetic and unmeasured confounding. Parents who spank in rage differ from other parents in many ways — including temperament their children partly inherit. Twin and adoption designs typically shrink discipline effects further; nothing here controls genes, neighborhood, school, or anything outside the home batteries. The honest bound on the spanking-specific effect is therefore "−0.05 SD or smaller."

One outcome family. Adult wellbeing (mood, suicidality, QoL) is the target here. Spanking could matter for outcomes not modeled (e.g., the parent-child relationship today — not asked in this survey).