
[Python Data Science] Good Python Coding Conventions and Using logging

news 2025/10/15 13:41:04

Python Data Science

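Contents

Good Python Coding Conventions
- Code layout
  - Spaces and blank lines
  - Line length, line breaks, and module imports
- Naming conventions
- Comments and docstrings
- Other recommendations
  - Use f-strings for string formatting
  - Use the with statement to manage resources
  - Use list comprehensions for more efficient code
  - Use `enumerate()` to iterate over a list
  - Use `zip()` to iterate over several lists
  - Use `__name__ == "__main__"` to guard code
- Use `logging` instead of print() when developing large programs
  - Different log levels
  - Logs can be saved to a file
  - The log format is customizable
  - `logging` works across modules
  - `logging` works with multiple threads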

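Good Python Coding Conventions

In Python, good coding habits make code more readable, easier to maintain, and often more efficient. PEP 8 is the official Python style guide; this chapter summarizes the points below from that guide and from practical experience.

Code layout

Spaces and blank lines

Use 4 spaces per indentation level (avoid tabs)

Different editors and environments may render a tab at different widths (for example 2, 4, or 8 columns). Indenting with 4 spaces keeps the code looking the same in every editor, terminal, and operating system.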
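To make the tab problem concrete, here is a minimal sketch that uses `str.expandtabs()` to imitate how an editor would render the same tab-indented line at different tab widths. The sample line is made up for illustration.

```python
# A line indented with one tab character, as it might be stored on disk.
tab_indented = "\tprint('hello')"

# str.expandtabs() mimics how an editor renders the tab at a given width.
for tab_width in (2, 4, 8):
    rendered = tab_indented.expandtabs(tab_width)
    indent = len(rendered) - len(rendered.lstrip())
    print(f"tab width {tab_width}: indent of {indent} columns")

# A line indented with spaces has the same width everywhere.
space_indented = "    print('hello')"
space_indent = len(space_indented) - len(space_indented.lstrip())
print(f"spaces: indent of {space_indent} columns")
```

The tab-indented line shows up with 2, 4, or 8 columns of indentation depending on the setting, while the space-indented line always has 4.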

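Add blank lines where appropriate
- Separate classes and top-level functions with two blank lines
- Separate methods inside a class with one blank line
- Leave one or two blank lines between the import block and the main body of the code
- Extra blank lines may be used (sparingly) to separate groups of related functions and logical sections inside a function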

import os
import sys


# Two blank lines above.
class MyClass:

    def foo1(self):
        pass

    # Methods inside a class are separated by one blank line.
    def foo2(self):
        pass


# A class and a top-level function are separated by two blank lines.
def main():
    my_class = MyClass()
    my_class.foo1()


if __name__ == "__main__":
    main()

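Use spaces appropriately
- Avoid spaces immediately inside parentheses, brackets, or braces
- Do not put a space between a trailing comma and a following closing parenthesis
- Avoid a space immediately before a comma, semicolon, or colon; put one space after a comma
- In a slice, however, the colon acts like a binary operator and should have the same amount of space on either side (treat it as the operator with the lowest priority). When two colons are adjacent because a slice parameter is omitted, leave out the space between them.
- Surround these binary operators with a single space on either side: =, +=, -=, <, ==, >, >=, <=, !=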

# Correct:
spam(ham[1], {eggs: 2})

# Wrong:
spam( ham[ 1 ], { eggs: 2 } )

# Correct:
foo = (0,)

# Wrong:
bar = (0, )

# Correct:
if x == 4:
    print(x, y)

# Wrong:
if x == 4 :
    print(x , y)

# Correct:
ham[1:9], ham[1:9:3], ham[:9:3], ham[1::3], ham[1:9:]
ham[lower:upper], ham[lower:upper:], ham[lower::step]
ham[lower+offset : upper+offset]
ham[: upper_fn(x) : step_fn(x)], ham[:: step_fn(x)]
ham[lower + offset : upper + offset]

# Wrong:
ham[lower + offset:upper + offset]
ham[1: 9], ham[1 :9], ham[1:9 :3]
ham[lower : : step]
ham[ : upper]

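Avoid trailing whitespace at the end of a line.

If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority. Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator: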

# Correct:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)

# Wrong:
i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)

Line length, line breaks, and module imports

Limit each line to at most 79 characters
Break lines before binary operators, not after them

# Wrong:
# operators sit far away from their operands
income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)

# Correct:
# easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)
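PEP 8's preferred way to stay within the line limit is implied line continuation inside parentheses, brackets, and braces, rather than backslashes. The sketch below applies that to a long string and a long function call; `describe_order` and its parameters are hypothetical names chosen for the example.

```python
def describe_order(customer_name, product_name, quantity, unit_price):
    """Return a one-line order summary (hypothetical helper)."""
    total = quantity * unit_price
    # Adjacent string literals inside parentheses are joined at compile time,
    # so a long message can be wrapped without backslashes.
    return (
        f"{customer_name} ordered {quantity} x {product_name} "
        f"for a total of {total:.2f}"
    )


# A long call wrapped inside its own parentheses, one argument per line.
summary = describe_order(
    customer_name="Alice",
    product_name="notebook",
    quantity=3,
    unit_price=2.5,
)
print(summary)  # Alice ordered 3 x notebook for a total of 7.50
```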

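Imports of different modules should usually be on separate lines; importing several names from one module with a single from ... import statement is fine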

# Correct:
import os
import sys

# Wrong:
import sys, os

# Correct:
from subprocess import Popen, PIPE

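Imports always go at the top of the file, just after any module comments and docstrings, and before module-level globals and constants.
Imports should be grouped in the following order:
- Standard library imports.
- Related third-party imports.
- Local application/library-specific imports.
Put a blank line between each group of imports.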

# Standard library imports
import os
import sys
import datetime

# Related third-party imports
import numpy as np
import pandas as pd
import requests

# Local application/library-specific imports
from my_project.utils import helper_function
from my_project.models import UserModel
from my_project.config import settings

Naming conventions

Avoid using Python's reserved keywords as names
Use lowercase words separated by underscores (snake_case) for variables

user_age = 25
max_length = 100

Constants are usually defined at module level and written in all capital letters, with underscores separating words

MAX_LIMIT = 500

Class names use the CapWords convention (PascalCase)

class UserProfile:
    pass

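Module names should be short and all lowercase; underscores may be used if they improve readability. Python package names should also be short and all lowercase, but underscores are discouraged.
If a function argument's name clashes with a reserved keyword or with a variable outside the function, it is generally better to append a single trailing underscore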

name = "Global Variable"

def greet(name_):
    """Function to greet a user, avoiding causing confusion with the outer variable."""
    print(f"Hello, {name_}!")

greet("Alice")
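The example above covers a clash with an outer variable. For the keyword case, here is a small sketch: the standard `keyword` module can tell whether a candidate name is reserved, and a trailing underscore sidesteps the clash. `make_tag` is a hypothetical function used only for illustration.

```python
import keyword


def make_tag(name, class_):
    """Build a tag string; `class` is reserved, hence `class_`."""
    return f'<{name} class="{class_}">'


# keyword.iskeyword() reports whether a string is a reserved keyword.
for candidate in ("class", "lambda", "user_age"):
    print(candidate, keyword.iskeyword(candidate))

print(make_tag("div", class_="header"))  # <div class="header">
```

The loop prints `True` for `class` and `lambda` and `False` for `user_age`.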

Use meaningful variable names to improve readability

# Not recommended
x = 10
y = 20

# Recommended
width = 10
height = 20

Use _ as a throwaway variable (one whose value is never used)

for _ in range(5):
    print("Hello")

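Avoid abbreviations in long variable names unless the abbreviation is widely used

# Not recommended
usr_nm = "Alice"

# Recommended
user_name = "Alice"

Always use self as the first argument of instance methods, and cls as the first argument of class methods.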
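A short sketch of both conventions side by side. The `Temperature` class is a hypothetical example, not something from the original text.

```python
class Temperature:
    """A temperature stored in degrees Celsius (hypothetical example)."""

    def __init__(self, celsius):
        self.celsius = celsius

    def to_fahrenheit(self):
        # Instance method: `self` is the object the method was called on.
        return self.celsius * 9 / 5 + 32

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # Class method: `cls` is the class itself, so subclasses get
        # instances of their own type back.
        return cls((fahrenheit - 32) * 5 / 9)


boiling = Temperature.from_fahrenheit(212)
print(boiling.celsius)          # 100.0
print(boiling.to_fahrenheit())  # 212.0
```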

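Comments and docstrings

Comments should be complete sentences. The first word should be capitalized, unless it is an identifier (a variable, class, keyword, etc.) that begins with a lowercase letter.
Inline comments should be separated from the statement by at least two spaces and should start with a # and a single space; the final period may be omitted
Block comments generally consist of one or more paragraphs built out of complete sentences, with each sentence ending in a period.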

# This function calculates the area of a circle.
# It takes the radius as input and returns the computed area.
def calculate_area(radius):
    """Compute the area of a circle using the formula πr²."""
    PI = 3.14159  # Constant value of π, approximated to five decimal places
    
    area = PI * radius * radius  # Calculate the area using the formula
    return area  # Return the computed area.


# The main execution block starts here.
if __name__ == "__main__":
    radius = 5  # Define the radius of the circle
    
    area = calculate_area(radius)  # Call the function to compute the area
    
    print(f"The area of the circle with radius {radius} is {area:.2f}")  # Display the result



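- Write docstrings for all public modules, functions, classes, and methods. Docstrings are not required for non-public methods, but such a method should have a comment describing what it does
- In a multi-line docstring, the closing """ usually sits on a line by itself


```python
def add(a: int, b: int) -> int:
    """
    Return the sum of two numbers.

    Args:
        a (int): The first number.
        b (int): The second number.

    Returns:
        int: The sum of the two numbers.
    """
    return a + b
```

For a one-line docstring, keep the opening and closing """ on the same line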

"""Return an ex-parrot."""

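Docstrings are more than comments: they are stored on the object and can be read at runtime, which is what `help()` and documentation tools rely on. A minimal sketch, reusing the two docstring styles shown above (`parrot_status` is a hypothetical function name):

```python
import inspect


def parrot_status():
    """Return an ex-parrot."""
    return "ex-parrot"


def add(a: int, b: int) -> int:
    """
    Return the sum of two numbers.

    Args:
        a (int): The first number.
        b (int): The second number.
    """
    return a + b


# The raw docstring is stored on the function object.
print(parrot_status.__doc__)  # Return an ex-parrot.

# inspect.getdoc() strips the common indentation of a multi-line docstring.
print(inspect.getdoc(add).splitlines()[0])  # Return the sum of two numbers.
```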
Other recommendations

Use f-strings for string formatting

Compared with % formatting or .format(), f-strings are easier to read and usually faster:

name = "Alice"
age = 25
print(f"My name is {name} and I am {age} years old.")

My name is Alice and I am 25 years old.

a = 1
b = 4.56
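# First number: float with 3 decimal places. Second number: right-aligned,
# width 20, 2 decimal places, shown as a percentage (the = form needs 3.8+).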
f'the numbers are {a:.3f} and {b = :>20.2%}'

'the numbers are 1.000 and b =              456.00%'
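To check the readability and speed claim on your own machine, here is a rough sketch that builds the same string in all three styles and times each with `timeit`. The timings are not shown because they depend on the machine and the Python version.

```python
import timeit

name = "Alice"
age = 25

# Three ways to build the same string.
percent_style = "My name is %s and I am %d years old." % (name, age)
format_style = "My name is {} and I am {} years old.".format(name, age)
fstring_style = f"My name is {name} and I am {age} years old."
assert percent_style == format_style == fstring_style

# Rough timing; absolute numbers depend on the machine and Python version.
setup = "name = 'Alice'; age = 25"
statements = {
    "%": "'My name is %s and I am %d years old.' % (name, age)",
    "format": "'My name is {} and I am {} years old.'.format(name, age)",
    "f-string": "f'My name is {name} and I am {age} years old.'",
}
for label, statement in statements.items():
    seconds = timeit.timeit(statement, setup=setup, number=100_000)
    print(f"{label:>8}: {seconds:.4f} s")
```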

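Use the with statement to manage resources

When working with files, process pools for parallel computation, or database connections, use a with statement so that you do not have to call close() by hand.

# No need to call file.close() manually.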
with open("file.txt", "r") as file:
    content = file.read()

from multiprocessing import Pool
import os


def worker(n):
    return f"Process {os.getpid()} computed {n * n}"


if __name__ == '__main__':
    # Manage the process pool with a with statement.
    with Pool(processes=4) as pool:
        results = pool.map(worker, range(5))

    # Leaving the with block shuts the pool down (it calls pool.terminate());
    # map() has already returned every result, so no manual cleanup is needed.
    for res in results:
        print(res)
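For the database case mentioned above, here is a minimal sketch with the standard-library `sqlite3` module. One detail worth knowing: a `sqlite3` connection used as a context manager only manages the transaction and does not close the connection, so the sketch wraps it in `contextlib.closing()`. The table and its columns are made up for the example.

```python
import sqlite3
from contextlib import closing

# sqlite3.connect(":memory:") creates a throwaway in-memory database.
with closing(sqlite3.connect(":memory:")) as connection:
    # Using the connection itself as a context manager wraps a transaction:
    # it commits on success and rolls back if an exception is raised.
    with connection:
        connection.execute("CREATE TABLE users (name TEXT, age INTEGER)")
        connection.execute("INSERT INTO users VALUES (?, ?)", ("Alice", 25))

    rows = connection.execute("SELECT name, age FROM users").fetchall()

print(rows)  # [('Alice', 25)]
# closing() has called connection.close() by this point.
```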

Use list comprehensions for more efficient code

A list can be built directly by writing a for clause inside [].

# Not recommended
squares = []
for i in range(10):
    squares.append(i**2)

# Recommended
squares = [i**2 for i in range(10)]
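A comprehension can also filter or transform items as it goes. A short sketch of both forms:

```python
# Squares of the even numbers only: the `if` clause filters items.
even_squares = [i**2 for i in range(10) if i % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# A conditional expression transforms every item instead of filtering.
labels = ["even" if i % 2 == 0 else "odd" for i in range(4)]
print(labels)  # ['even', 'odd', 'even', 'odd']
```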

Use `enumerate()` to iterate over a list

To write less code and make it more readable, iterate with enumerate() instead of range(len(list)).

names = ['lily', 'tom']
# Not recommended
for i in range(len(names)):
    print(i, names[i])

# Recommended
for i, name in enumerate(names):
    print(i, name)

0 lily
1 tom
0 lily
1 tom

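Note that rebinding the loop variable does not change the list. To modify elements in place, assign through the index, which enumerate() still provides (range(len(list)) works as well).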
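In-place modification needs an index, and a minimal sketch shows that `enumerate()` supplies one just as well as `range(len(...))`:

```python
names = ["lily", "tom"]

# Rebinding `name` only changes the local variable, not the list.
for name in names:
    name = name.capitalize()
print(names)  # ['lily', 'tom']

# Assigning through the index updates the list in place.
for i, name in enumerate(names):
    names[i] = name.capitalize()
print(names)  # ['Lily', 'Tom']

# enumerate() can also start counting from another number.
for position, name in enumerate(names, start=1):
    print(position, name)
```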

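Use `zip()` to iterate over several lists

names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]

# zip() is a built-in function that iterates over several iterables in
# parallel, pairing up their elements one by one; it returns an iterator.
for name, age in zip(names, ages):
    print(f"{name} is {age} years old.")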

Alice is 25 years old.
Bob is 30 years old.
Charlie is 35 years old.

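As with enumerate(), rebinding the loop variables produced by zip() does not modify the original lists; to change their elements, assign through an index.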
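Two practical details are worth a sketch: what `zip()` does when the inputs differ in length, and how to update a list while walking two lists together. The shortened `ages` list is deliberate.

```python
from itertools import zip_longest

names = ["Alice", "Bob", "Charlie"]
ages = [25, 30]  # one entry short on purpose

# zip() stops at the shortest input, so "Charlie" is silently dropped.
print(list(zip(names, ages)))
# [('Alice', 25), ('Bob', 30)]

# zip_longest() pads the shorter input with a fill value instead.
print(list(zip_longest(names, ages, fillvalue=None)))
# [('Alice', 25), ('Bob', 30), ('Charlie', None)]

# Updating a list while walking two lists needs an index.
ages = [25, 30, 35]
for i, (name, age) in enumerate(zip(names, ages)):
    ages[i] = age + 1
print(ages)  # [26, 31, 36]
```

On Python 3.10 and later, `zip(names, ages, strict=True)` raises a `ValueError` when the lengths differ, which is often safer than silent truncation.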

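Use `__name__ == "__main__"` to guard code

In Python, if __name__ == "__main__": is mainly used to keep certain code from running when a file is imported. It matters especially for multiprocessing, for files that are both run as scripts and reused as modules, and similar cases.

Without the if __name__ == "__main__": guard, the multiprocessing code below can fail, for example with the spawn start method (the default on Windows and macOS), where each child process re-imports the main module.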

from multiprocessing import Pool
import os


def worker(n):
    return f"Process {os.getpid()} computed {n * n}"


if __name__ == '__main__':
    # Manage the process pool with a with statement.
    with Pool(processes=4) as pool:
        results = pool.map(worker, range(5))

    # Leaving the with block shuts the pool down (it calls pool.terminate());
    # map() has already returned every result, so no manual cleanup is needed.
    for res in results:
        print(res)
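To see what the guard actually checks, the sketch below writes a tiny throwaway module to a temporary directory and executes it twice with `runpy.run_path()`: once under an ordinary module name (as an import would) and once as `__main__` (as `python demo_module.py` would). The module name and its contents are hypothetical.

```python
import runpy
import tempfile
from pathlib import Path

# Source of a tiny throwaway module (hypothetical, written to a temp dir).
SOURCE = '''
def square(n):
    return n * n

RAN_MAIN = False
if __name__ == "__main__":
    RAN_MAIN = True
    print("square(4) =", square(4))
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = Path(tmp_dir) / "demo_module.py"
    module_path.write_text(SOURCE, encoding="utf-8")

    # Like `import demo_module`: __name__ is "demo_module", guard skipped.
    as_import = runpy.run_path(str(module_path), run_name="demo_module")

    # Like `python demo_module.py`: __name__ is "__main__", guard runs.
    as_script = runpy.run_path(str(module_path), run_name="__main__")

print(as_import["__name__"], as_import["RAN_MAIN"])  # demo_module False
print(as_script["__name__"], as_script["RAN_MAIN"])  # __main__ True
```

Only the second run prints `square(4) = 16`, because only then does `__name__` equal `"__main__"`.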

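Use `logging` instead of print() when developing large programs

Python's logging module is a powerful and flexible logging system. Compared with print(), it manages log output much better and is particularly suited to production environments, debugging, error tracking, and multithreaded programs.

print() may be enough for a small script, but in production or in a large project logging is the better choice.

By default print() only writes to the console, whereas logging can:

Save logs to a file
Set different log levels (INFO, ERROR, DEBUG, and so on)
Flexibly configure the output format (time, file, line number, and so on)
Work with multiple threads and processes (the module is thread-safe; logging to a single file from several processes needs extra setup)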

import logging

logging.basicConfig(level=logging.INFO)
logging.info("This is an info message")

INFO:root:This is an info message

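Different log levels

logging provides five standard log levels:

| Level | Method | Purpose |
| --- | --- | --- |
| `DEBUG` | `logging.debug()` | Detailed debugging information |
| `INFO` | `logging.info()` | General information |
| `WARNING` | `logging.warning()` | Warnings (the default level) |
| `ERROR` | `logging.error()` | Error messages |
| `CRITICAL` | `logging.critical()` | Severe errors |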

import logging

logging.basicConfig(level=logging.DEBUG)

logging.debug("Debugging info")
logging.info("General info")
logging.warning("Warning message")
logging.error("Error message")
logging.critical("Critical issue")

DEBUG:root:Debugging info
INFO:root:General info
WARNING:root:Warning message
ERROR:root:Error message
CRITICAL:root:Critical issue
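Behind the names, each level is an integer threshold, and a logger discards records below its configured level. A minimal sketch using a dedicated logger (the name `levels_demo` is made up) so that it does not depend on how the root logger is configured:

```python
import logging

# Each level name maps to an integer; higher means more severe.
# Prints DEBUG 10, INFO 20, WARNING 30, ERROR 40, CRITICAL 50 (one per line).
for level_name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(level_name, getattr(logging, level_name))

logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)

# Records below the logger's level are discarded before any formatting.
print(logger.isEnabledFor(logging.INFO))   # False
print(logger.isEnabledFor(logging.ERROR))  # True
```

One related pitfall: `logging.basicConfig()` does nothing if the root logger already has handlers, so a second call in the same session (for example in a notebook) will not change the level unless `force=True` is passed (Python 3.8+).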

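Logs can be saved to a file

Output from print() goes to the console by default and is lost once the terminal is closed. logging can write to a file, which makes later analysis easier.

import logging

logging.basicConfig(filename="app.log", level=logging.INFO)

logging.info("This is an info message")
logging.error("This is an error message")

The log messages are written to the file app.log.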
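When the messages should appear both in a file and on the console, handlers can be attached explicitly. The following is a sketch under a few assumptions: the logger name `file_demo` is hypothetical, and the log file lives in a temporary directory so the example cleans up after itself.

```python
import logging
import os
import tempfile

formatter = logging.Formatter("%(levelname)s:%(name)s:%(message)s")

with tempfile.TemporaryDirectory() as log_dir:
    log_path = os.path.join(log_dir, "app.log")

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(formatter)
    console_handler = logging.StreamHandler()  # writes to stderr
    console_handler.setFormatter(formatter)

    logger = logging.getLogger("file_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also send records to the root logger
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)

    logger.info("This is an info message")
    logger.error("This is an error message")

    # Detach and close the file handler before the directory is removed.
    logger.removeHandler(file_handler)
    file_handler.close()

    with open(log_path, encoding="utf-8") as log_file:
        file_content = log_file.read()

print(file_content, end="")
```

The file ends up with the same two lines that were shown on the console.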

The log format is customizable

import logging

logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)

logging.info("Application started")
logging.error("Something went wrong")

The output is similar to:

2025-02-14 12:00:01,234 - INFO - Application started
2025-02-14 12:00:01,235 - ERROR - Something went wrong

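Log format fields:

| Field | Meaning |
| --- | --- |
| `%(asctime)s` | Time the record was created |
| `%(levelname)s` | Log level |
| `%(message)s` | Log message |
| `%(filename)s` | File that issued the logging call |
| `%(lineno)d` | Line number |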
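The `%(filename)s` and `%(lineno)d` fields are not used in the example above, so here is a small sketch that puts them into a format string. It captures the output in memory with a dedicated logger (the name `format_demo` is made up); the exact file name and line number depend on where the code is run.

```python
import io
import logging

# Capture output in memory so the formatted line can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s %(filename)s:%(lineno)d %(message)s")
)

logger = logging.getLogger("format_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep this demo out of the root logger's output
logger.addHandler(handler)

logger.warning("Disk usage at %d%%", 91)

formatted_line = stream.getvalue().strip()
print(formatted_line)
```

The printed line has the shape `WARNING <file>:<line> Disk usage at 91%`.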

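`logging` works across modules

In a large project, several Python files often need to share one logging setup. That is hard to manage with print(), but logging handles it.

📌 File app.py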

import logging
import module

logging.basicConfig(level=logging.INFO)
logging.info("Starting app...")
module.some_function()

📌 File module.py

import logging

def some_function():
    logging.info("This is from module.py")

After running python app.py:

INFO:root:Starting app...
INFO:root:This is from module.py

Multiple modules share the same logging configuration, which is easier to manage than print().
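A common refinement, sketched here as an assumption rather than as the setup used above, is for each module to create its own logger with `logging.getLogger(__name__)`. Logger names form a dotted hierarchy, so records from a module logger propagate to handlers configured on its parent. The names `my_app` and `my_app.module` are hypothetical and stand in for real module names.

```python
import io
import logging

# Configure one handler on a parent logger; "my_app" is a hypothetical name.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

app_logger = logging.getLogger("my_app")
app_logger.setLevel(logging.INFO)
app_logger.propagate = False  # do not also send to the root logger
app_logger.addHandler(handler)

# Inside my_app/module.py this would be logging.getLogger(__name__).
module_logger = logging.getLogger("my_app.module")

app_logger.info("Starting app...")
module_logger.info("This is from module.py")  # propagates up to "my_app"

print(stream.getvalue(), end="")
# INFO:my_app:Starting app...
# INFO:my_app.module:This is from module.py
```

With per-module loggers, the `%(name)s` field shows which module produced each record.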

`logging` works with multiple threads

Output from print() can become interleaved when several threads write at once, whereas logging is thread-safe.

import logging
import threading

logging.basicConfig(level=logging.INFO, format="%(threadName)s: %(message)s")

def worker():
    logging.info("Worker thread executing")

if __name__ == '__main__':
    threads = []
    for _ in range(5):
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()

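Output (the order may vary; on Python 3.10 and later the default thread names look like Thread-1 (worker)):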

Thread-1: Worker thread executing
Thread-2: Worker thread executing
Thread-3: Worker thread executing
Thread-4: Worker thread executing
Thread-5: Worker thread executing

Log records do not overwrite each other, and none are lost.
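A rough way to check this claim yourself is to have several threads log many messages into one in-memory stream and then count the lines. The logger name and the message counts below are arbitrary choices for the sketch.

```python
import io
import logging
import threading

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(threadName)s: %(message)s"))

logger = logging.getLogger("thread_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

THREAD_COUNT = 5
MESSAGES_PER_THREAD = 200


def worker():
    for number in range(MESSAGES_PER_THREAD):
        logger.info("message %d", number)


threads = [threading.Thread(target=worker) for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

lines = stream.getvalue().splitlines()
intact = [line for line in lines if ": message " in line]
print(len(lines), len(intact))  # 1000 1000
```

Each handler serializes its writes with a lock, so every record arrives as one complete line.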
