当前位置: 首页 > news >正文

Least squares prediction and Indicator Variables

Least squares prediction

image.png

CI and PI

  • CI: “Where is the average wage for people with 16 years of education likely to be?”

  • PI: “If I pick one random person with 16 years of education, what wage will they likely earn?”

Confidence Interval (CI)

  • Target: The conditional mean μY∣X0=E[Y∣X0]\mu_{Y|X_0} = \mathbb{E}[Y|X_0]μYX0=E[YX0]

  • This is a fixed but unknown number (not random once the data and X0X_0X0​ are fixed).

  • The randomness comes from the estimation process (sampling variation in β^\hat{\beta}β^​ ).

  • So the CI is telling you: “With 95% confidence, the true mean lies in this range.”

  • CI → fixed but unknown parameter (deterministic).

Prediction Interval (PI)

  • Target: A new observation Y0​∣X0​Y0​∣X0​Y0​X0​

  • Even if you knew the regression line exactly, this Y0Y_0Y0would still be random because of the error term ε\varepsilonε.

  • PI accounts for both the estimation uncertainty and the irreducible randomness.

  • So PI is telling you: “With 95% probability, the next individual value will fall in this range.”

  • PI → random variable (future realization of Y0Y_0Y0).

Summary

  • CI → fixed but unknown mean, interval comes from the randomness of the estimator.

  • PI → inherently random individual outcome, interval accounts for both estimation error and noise

Indicator Variables

dummy variables

They are also called dummy, binary, or dichotomous variables, because they take just two values, usually one or zero, to indicate the presence or absence of a characteristic or to indicate whether a condition is true or false. For variables that take on only two values, the mapping to {0, 1} is very common. However other numerical values can also be adopted.
For example, we can define an indicator variable

	D    • D = 1 if characteristic is present; • D = 0 if characteristic is not present.

Intercept indicator variables

image.png

The reference group and the dummy variable trap

image.png
image.png

Slope indicator variables

image.png

Interactions between qualitative factors

image.png
image.png

Why do we need an extra coefficient (γ\gammaγ) for the interaction between being Black and Female, instead of just letting the two dummy variables (Black, Female) both equal 1?

Two dummy variables only (no interaction term)

Suppose the regression is:

E(WAGE)=β1+β2 EDUC+δ1 BLACK+δ2 FEMALEE(WAGE) = \beta_1 + \beta_2 \, EDUC + \delta_1 \, BLACK + \delta_2 \, FEMALEE(WAGE)=β1+β2EDUC+δ1BLACK+δ2FEMALE

  • If someone is Black Female, then BLACK=1,FEMALE=1

  • Their expected wage would be:

    β1+β2EDUC+δ1+δ2\beta_1 + \beta_2 EDUC + \delta_1 + \delta_2β1+β2EDUC+δ1+δ2

So in this model, the effect of being Black and Female is just the sum of the two separate penalties.

With an interaction term

Now suppose the model is:

E(WAGE)=β1+β2 EDUC+δ1 BLACK+δ2 FEMALE+γ(BLACK×FEMALE)E(WAGE) = \beta_1 + \beta_2 \, EDUC + \delta_1 \, BLACK + \delta_2 \, FEMALE + \gamma (BLACK \times FEMALE)E(WAGE)=β1+β2EDUC+δ1BLACK+δ2FEMALE+γ(BLACK×FEMALE)

  • If someone is Black Female, then BLACK=1, FEMALE=1, so the interaction term = 1.

  • Their expected wage is:

    β1+β2EDUC+δ1+δ2+γ\beta_1 + \beta_2 EDUC + \delta_1 + \delta_2 + \gammaβ1+β2EDUC+δ1+δ2+γ

Here, γγγ is an extra adjustment. It allows the Black Female group’s outcome to be different from just the sum of “being Black” + “being Female.”

Here, γ\gammaγ is exactly the “extra piece” — the part of the joint Black–Female effect that cannot be explained by just adding the two main effects.

Why no perfect collinearity?

  • The interaction column is not equal to either BLACK or FEMALE.

  • And it’s also not equal to their sum or difference.

  • It is only “1” in one cell (Black Female), “0” elsewhere.

So mathematically:

BLACK×FEMALE≠a⋅BLACK+b⋅FEMALE+cBLACKBLACK×FEMALE≠a⋅BLACK+b⋅FEMALE+cBLACK BLACK×FEMALE=aBLACK+bFEMALE+cBLACK

for any constants a,b,c

Thus, the interaction term provides new independent information and does not cause perfect collinearity.

http://www.dtcms.com/a/399747.html

相关文章:

  • wordpress站群是什么网站官网建设企业
  • Qt(常用的对话框)
  • 网站被墙怎么做跳转360浏览器打开是2345网址导航
  • Qt QPainter 绘图系统精通指南
  • 宣城网站开发专业制西安巨久科技网站建设
  • LVGL详解
  • 饰品销售网站功能建设seo思维
  • 什么是UT测试
  • 制作网站需要的技术wordpress的xmlrpc
  • Playwright 高级用法全解析:从自动化到测试工程化的进阶指南
  • 视觉SLAM第14讲:现在与未来
  • 系统基模的思想
  • 专业的网站建设企业网站专做脚本的网站
  • 郑州市建设信息网站wordpress整合ucenter
  • 安徽网站开发项目wordpress 后台 重定向循环
  • XSD 文件(XML Schema Definition)简介
  • 什么网站可以做美食怎么做学校网站和微信公众号
  • 寒武纪MLU环境搭建并部署DeepSeek【MLU370-S4】
  • 永康物流网站泉州网站制作推广
  • Hackademic: RTB2靶场渗透
  • 第九届电气、机械与计算机工程国际学术会议(ICEMCE 2025)
  • SimForge™ 功能介绍|「组织管理」赋能仿真研发场景——权限可控、资源可调、成本可溯
  • 【读书笔记】《创始人》
  • 组件化思维(上):视图与基础内容组件的深度探索
  • 深入了解鸿蒙的Ark编译器:起源、历史、特点与学习指南
  • React Native:为什么带上version就会报错呢?
  • [RK3288][Android6.0] 调试笔记 --- 系统自带预置第三方APK方法
  • wordpress升级php7北京网站优化步
  • Multipath
  • Optuna v4.5新特性深度解析:GPSampler实现约束多目标优化