当前位置: 首页 > news >正文

MongoDB源码分析慢日志:从配置到实现的完整解析

一、MongoDB 慢日志基础配置:Profiling 操作指南

mongodb对慢日志怎么做的呢?一般教程会这样指导:

1.1 查看当前 Profiling 状态

> use admin
switched to db admin
> db.getProfilingStatus();
{ "was" : 0, "slowms" : 100, "sampleRate" : 1 }

1.2 MongoDB Profiling 核心级别(0/1/2 级)说明

MongoDB 的 Profiling 有was三个级别:0:关闭 Profiling(默认);1:记录执行时间超过 slowms 阈值的操作;2:记录所有操作(谨慎使用,可能产生大量日志)。

1.3 启动参数配置 Profiling

启动脚本增加--slowms 50 --profile 1参数

>mongod --dbpath C:\me\studyhome\mongodb\mongo-r4.0.7\data -vvvvv --slowms 50 --profile 1

1.4 Profiling 配置后的验证方法

再次验证 Profiling 级别

二、MongoDB 慢日志底层实现机制拆解

2.1 会话状态流转与命令传递流程

mongo/transport/service_state_machine.cpp已经分析startSession创建ASIOSession过程,并且验证connection是否超过限制ASIOSession和connection是循环接受客户端命令,状态转变流程是:State::Created 》 State::Source 》State::SourceWait 》 State::Process 》 State::SinkWait  》 State::Source 》State::SourceWait。

State::Process将命令发送给mongodb数据库,数据库执行命令find。

2.2 handleRequest 核心方法执行逻辑(service_entry_point_common.cpp)

session接受到客户端发送的命令Msg,把接受到的Msg信息发送给mongodb数据库执行DbResponse dbresponse = ServiceEntryPointMongod->handleRequest()

/mongo/db/service_entry_point_common.cpp中的handleRequest方法代码:

DbResponse ServiceEntryPointCommon::handleRequest(OperationContext* opCtx,const Message& m,const Hooks& behaviors) {...CurOp& currentOp = *CurOp::get(opCtx);{stdx::lock_guard<Client> lk(*opCtx->getClient());// Commands handling code will reset this if the operation is a command// which is logically a basic CRUD operation like query, insert, etc.currentOp.setNetworkOp_inlock(op);currentOp.setLogicalOp_inlock(networkOpToLogicalOp(op));}OpDebug& debug = currentOp.debug();boost::optional<long long> slowMsOverride;bool forceLog = false;DbResponse dbresponse;if (op == dbMsg || (op == dbQuery && isCommand)) {dbresponse = receivedCommands(opCtx, m, behaviors);} else if (op == dbQuery) {invariant(!isCommand);opCtx->markKillOnClientDisconnect();dbresponse = receivedQuery(opCtx, nsString, c, m, behaviors);} else if (op == dbGetMore) {dbresponse = receivedGetMore(opCtx, m, currentOp, &forceLog);} else {...if (LastError::get(opCtx->getClient()).hadNotMasterError()) {notMasterLegacyUnackWrites.increment();uasserted(ErrorCodes::NotMaster,str::stream()<< "Not-master error while processing '" << networkOpToString(op)<< "' operation  on '" << nsString << "' namespace via legacy "<< "fire-and-forget command execution.");}}const bool shouldSample = currentOp.completeAndLogOperation(opCtx, MONGO_LOG_DEFAULT_COMPONENT, dbresponse.response.size(), slowMsOverride, forceLog);Top::get(opCtx->getServiceContext()).incrementGlobalLatencyStats(opCtx,durationCount<Microseconds>(currentOp.elapsedTimeExcludingPauses()),currentOp.getReadWriteType());if (currentOp.shouldDBProfile(shouldSample)) {// Performance profiling is onif (opCtx->lockState()->isReadLocked()) {LOG(1) << "note: not profiling because recursive read lock";} else if (c.isInDirectClient()) {LOG(1) << "note: not profiling because we are in DBDirectClient";} else if (behaviors.lockedForWriting()) {// TODO SERVER-26825: Fix race condition where fsyncLock is acquired post// lockedForWriting() call but prior to profile collection lock acquisition.LOG(1) << "note: not profiling because doing fsync+lock";} else if (storageGlobalParams.readOnly) {LOG(1) << "note: not profiling because server is read-only";} else {invariant(!opCtx->lockState()->inAWriteUnitOfWork());profile(opCtx, op);}}recordCurOpMetrics(opCtx);return dbresponse;
}

CurOp& currentOp = *CurOp::get(opCtx);记录开始时间,获取当前操作对应curop ;

dbresponse = receivedCommands(opCtx, m, behaviors);执行请求命令

const bool shouldSample = currentOp.completeAndLogOperation(opCtx, MONGO_LOG_DEFAULT_COMPONENT, dbresponse.response.size(), slowMsOverride, forceLog);处理时间差,根据参数判断是否需要记录日志。

profile(opCtx, op)将慢日志记录到db.system.profile中。

2.3 completeAndLogOperation 方法:耗时计算与日志判断

completeAndLogOperation核心方法如下:

bool CurOp::completeAndLogOperation(OperationContext* opCtx,logger::LogComponent component,boost::optional<size_t> responseLength,boost::optional<long long> slowMsOverride,bool forceLog) {// Log the operation if it is eligible according to the current slowMS and sampleRate settings.const bool shouldLogOp = (forceLog || shouldLog(component, logger::LogSeverity::Debug(1)));const long long slowMs = slowMsOverride.value_or(serverGlobalParams.slowMS);LOG(1) << "conca responseLength=" <<responseLength;LOG(1) << "conca serverGlobalParams.slowMS=" <<serverGlobalParams.slowMS;LOG(1) << "conca slowMsOverride=" <<slowMsOverride;const auto client = opCtx->getClient();// Record the size of the response returned to the client, if applicable.if (responseLength) {_debug.responseLength = *responseLength;}// Obtain the total execution time of this operation._end = curTimeMicros64();_debug.executionTimeMicros = durationCount<Microseconds>(elapsedTimeExcludingPauses());LOG(1) << "conca _debug.executionTimeMicros=" <<_debug.executionTimeMicros;const bool shouldSample =client->getPrng().nextCanonicalDouble() < serverGlobalParams.sampleRate;LOG(1) << "conca serverGlobalParams.sampleRate=" <<serverGlobalParams.sampleRate;LOG(1) << "conca shouldSample=" <<shouldSample;if (shouldLogOp || (shouldSample && _debug.executionTimeMicros > slowMs * 1000LL)) {auto lockerInfo = opCtx->lockState()->getLockerInfo(_lockStatsBase);if (_debug.storageStats == nullptr && opCtx->lockState()->wasGlobalLockTaken() &&opCtx->getServiceContext()->getStorageEngine()) {、.try {Lock::GlobalLock lk(opCtx,MODE_IS,Date_t::now() + Milliseconds(500),Lock::InterruptBehavior::kLeaveUnlocked);if (lk.isLocked()) {_debug.storageStats = opCtx->recoveryUnit()->getOperationStatistics();} else {warning(component) << "Unable to gather storage statistics for a slow ""operation due to lock aquire timeout";}} catch (const ExceptionForCat<ErrorCategory::Interruption>&) {warning(component) << "Unable to gather storage statistics for a slow ""operation due to interrupt";}}// Gets the time spent blocked on prepare conflicts.auto prepareConflictDurationMicros =PrepareConflictTracker::get(opCtx).getPrepareConflictDuration();_debug.prepareConflictDurationMillis =duration_cast<Milliseconds>(prepareConflictDurationMicros);log(component) << _debug.report(opCtx, (lockerInfo ? &lockerInfo->stats : nullptr));}// Return 'true' if this operation should also be added to the profiler.return shouldDBProfile(shouldSample);
}bool shouldDBProfile(bool shouldSample = true) {// Profile level 2 should override any sample rate or slowms settings.if (_dbprofile >= 2)return true;if (!shouldSample || _dbprofile <= 0)return false;return elapsedTimeExcludingPauses() >= Milliseconds{serverGlobalParams.slowMS};}

serverGlobalParams.slowMS默认值是100,前面mongod启动脚本设置--slowms 50

 serverGlobalParams.sampleRate默认值是1.0

2.4 shouldDBProfile:慢日志记录的核心判断逻辑

shouldDBProfile判断是否记录慢日志,profile>=2记录,profile=0不记录,profile=1 & slowms判断是否记录慢日志,前面mongod启动脚本设置--profile 1,所以这个find命令耗时超过50ms,记录到慢日志中。

/mongo/db/server_options.h源代码:

    int defaultProfile = 0;                // --profileint slowMS = 100;                      // --time in ms that is "slow"double sampleRate = 1.0;               // --samplerate rate at which to sample slow queriesint defaultLocalThresholdMillis = 15;  // --localThreshold in ms to consider a node localbool moveParanoia = false;             // for move chunk paranoia

三、MongoDB 慢日志运行验证与结果分析

运行过程关键日志打印结果:

2025-05-14T21:18:29.292+0800 D1 -        [conn1] conca responseLength=240
2025-05-14T21:18:29.292+0800 D1 -        [conn1] conca serverGlobalParams.slowMS=50
2025-05-14T21:18:29.292+0800 D1 -        [conn1] conca slowMsOverride=(nothing)
2025-05-14T21:18:29.292+0800 D1 -        [conn1] conca _debug.executionTimeMicros=56413
2025-05-14T21:18:29.293+0800 D1 -        [conn1] conca serverGlobalParams.sampleRate=1
2025-05-14T21:18:29.293+0800 D1 -        [conn1] conca shouldSample=1
2025-05-14T21:18:29.293+0800 I  COMMAND  [conn1] command db.user appName: "MongoDB Shell" command: find { find: "user", filter: { age: 28.0 }, sort: { wages: 1.0 }, lsid: { id: UUID("7f72749d-bdaf-4b1c-9ad4-b15ab44b055c") }, $db: "db" } planSummary: IXSCAN { age: 1 } keysExamined:2 docsExamined:2 hasSortStage:1 fromMultiPlanner:1 cursorExhausted:1 numYields:3 nreturned:2 queryHash:A13CF6D8 planCacheKey:FD90BAA0 reslen:240 locks:{ ReplicationStateTransition: { acquireCount: { w: 4 } }, Global: { acquireCount: { r: 4 } }, Database: { acquireCount: { r: 4 } }, Collection: { acquireCount: { r: 4 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 944, timeReadingMicros: 957 } } protocol:op_msg 56ms
2025-05-14T21:18:29.293+0800 I  -        [conn1] Creating profile collection: db.system.profile
2025-05-14T21:18:29.293+0800 I  STORAGE  [conn1] createCollection: db.system.profile with generated UUID: 05224cf7-a845-41b2-b71d-7e3d899c5757 and options: { capped: true, size: 1048576 }

http://www.dtcms.com/a/423251.html

相关文章:

  • Bootloader核心原理与简单实现:从零写一个bootloader
  • MongoDB到关系型数据库:JSON字段如何高效转换?
  • 网站排名优化原理一个公司能备案多个网站吗
  • 苏大团队联合阿丘科技发表异常生成新方法:创新双分支训练法,同步攻克异常图像生成、分割及下游模型性能提升难题。
  • wordpress如何使用百度主动推送seo短视频网页入口引流下载
  • Docker 镜像加速安装MySQL操作步骤
  • 量子计算技术全景:从硬件路线到AI融合
  • 人工智能-机器学习day1
  • 济南网站制作企业建设部标准定额网站
  • 微服务组件-Eureka 技术详解
  • ARM架构下I/O内存映射全面技术分析
  • 大学网站建设管理办法岳阳市网站建设推广
  • Java 操作 XML 及动态生成报告:从解析到实战
  • 网络配置config.xml的android.mk解析
  • 网站导读怎么做wordpress二级目录创建
  • 分布式限流
  • ES-DE 前端模拟器最新版 多模拟器游戏启动器 含游戏ROM整合包 最新版
  • 【Linux网络】TCP协议
  • 分布式排行榜系统设计方案
  • 西双版纳住房和城乡建设局网站上海手机网站建设价格
  • oracle多租户环境CDB与PDB操作
  • 超市营销型网站建设策划书手机网站建站用哪个软件好
  • 使用宏实现高效的分页查询功能
  • 从语言到向量:自然语言处理中的核心转换技术与实践
  • 申请一个网站需要多少钱网站怎么添加统计代码
  • 基于机器学习的异常流量检测系统的设计与实现(原创)
  • 网站建设人员组成做网上商城网站
  • 新天力:食品容器安全与创新的领航者
  • C++_day4
  • 多解法详解与边界处理——力扣7.整数反转