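Converting Timestamps to Date Formats in Kettle
Overview
In Kettle (Pentaho Data Integration), converting timestamps to date formats is a common data-processing task. This document covers two scripting approaches: Java (the "User Defined Java Class" step) and JavaScript (the "Modified Java Script Value" step).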
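1. Java Approach
1.1 Using the User Defined Java Class Step
In Kettle, the "User Defined Java Class" step can convert timestamps. The fragment below runs inside processRow(). It assumes r is a row that already has room for the output fields (created with createOutputRow, as in the complete example in 1.2), and it needs the imports listed in 1.2: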
// Read the input field value
String timestampStr = get(Fields.In, "timestamp_field").getString(r);
// Method 1: Unix timestamp (seconds or milliseconds)
if (timestampStr != null && !timestampStr.trim().isEmpty()) {
    try {
        String digits = timestampStr.trim();
        long timestamp = Long.parseLong(digits);
        // Up to 10 digits: seconds-level timestamp, convert to milliseconds
        if (digits.length() <= 10) {
            timestamp = timestamp * 1000L;
        }
        // Longer values are already in milliseconds
        Date date = new Date(timestamp);
        // Format the date; without setTimeZone the JVM default time zone is used
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        String formattedDate = sdf.format(date);
        // Write to the output field
        get(Fields.Out, "formatted_date").setValue(r, formattedDate);
    } catch (NumberFormatException e) {
        // Non-numeric input
        get(Fields.Out, "formatted_date").setValue(r, null);
        get(Fields.Out, "error_message").setValue(r, "Invalid timestamp format");
    }
}
// Method 2: ISO 8601 date string
String dateStr = get(Fields.In, "date_string").getString(r);
if (dateStr != null && !dateStr.isEmpty()) {
    try {
        // The quoted 'Z' is matched as a literal character, so the input zone must be set to UTC explicitly
        SimpleDateFormat inputFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        inputFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date date = inputFormat.parse(dateStr);
        // Output format (JVM default time zone)
        SimpleDateFormat outputFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        String result = outputFormat.format(date);
        get(Fields.Out, "converted_date").setValue(r, result);
    } catch (ParseException e) {
        get(Fields.Out, "converted_date").setValue(r, null);
        get(Fields.Out, "error_message").setValue(r, "Date parsing error: " + e.getMessage());
    }
}
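The seconds-versus-milliseconds decision above rests purely on digit count. The standalone sketch below (plain Java outside Kettle; the class and method names are hypothetical) isolates that heuristic and formats the result in an explicitly chosen time zone, which makes the time-zone dependence of SimpleDateFormat visible: the same input gives different wall-clock strings in UTC and in Asia/Shanghai.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper class, not part of the Kettle API
final class EpochHeuristic {

    // Second-level Unix timestamps have at most 10 digits until the year 2286,
    // so <= 10 digits is treated as seconds and anything longer as milliseconds.
    static long toEpochMillis(String raw) {
        String s = raw.trim();
        long value = Long.parseLong(s); // throws NumberFormatException for non-numeric input
        return s.length() <= 10 ? value * 1000L : value;
    }

    // Formats the instant in an explicitly given zone instead of the JVM default
    static String format(String raw, String pattern, String zoneId) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setTimeZone(TimeZone.getTimeZone(zoneId));
        return sdf.format(new Date(toEpochMillis(raw)));
    }
}
```

One limitation of the digit-count rule: millisecond values with 10 or fewer digits (instants before late April 1970) would be misread as seconds.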
1.2 Complete Java Example
import java.text.SimpleDateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.TimeZone;

// Main processing logic
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Create the output row (input fields plus the new output fields)
    Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
    // Read the timestamp field
    String timestampField = get(Fields.In, "timestamp").getString(r);
    if (timestampField != null && !timestampField.trim().isEmpty()) {
        String value = timestampField.trim();
        try {
            if (value.matches("\\d+")) {
                // Numeric timestamp
                long timestamp = Long.parseLong(value);
                // Up to 10 digits: seconds, convert to milliseconds; longer values are already milliseconds
                if (value.length() <= 10) {
                    timestamp = timestamp * 1000L;
                }
                Date date = new Date(timestamp);
                // Several output formats (JVM default time zone)
                SimpleDateFormat sdf1 = new SimpleDateFormat("yyyy-MM-dd");
                SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                SimpleDateFormat sdf3 = new SimpleDateFormat("yyyy年MM月dd日 HH:mm:ss");
                get(Fields.Out, "date_only").setValue(outputRow, sdf1.format(date));
                get(Fields.Out, "datetime").setValue(outputRow, sdf2.format(date));
                get(Fields.Out, "chinese_format").setValue(outputRow, sdf3.format(date));
            } else {
                // ISO 8601 string; the literal 'Z' means UTC, so set the input zone explicitly
                SimpleDateFormat inputFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
                inputFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
                Date date = inputFormat.parse(value);
                SimpleDateFormat outputFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                get(Fields.Out, "converted_date").setValue(outputRow, outputFormat.format(date));
            }
        } catch (Exception e) {
            // Flag the row instead of failing the transformation
            get(Fields.Out, "error_flag").setValue(outputRow, "Y");
            get(Fields.Out, "error_message").setValue(outputRow, String.valueOf(e.getMessage()));
        }
    }
    // Pass the row on
    putRow(data.outputRowMeta, outputRow);
    return true;
}
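In the ISO branch, a quoted 'Z' is matched as a literal character and carries no zone information, which is why the input format needs an explicit UTC zone. The sketch below (hypothetical class name, standard JDK APIs only) compares three ways of parsing the same ISO 8601 string to epoch milliseconds: the quoted-'Z' pattern with UTC set, the `X` pattern letter (ISO 8601 zone, Java 7+), and `java.time.Instant.parse` (Java 8+). Whether java.time is usable inside a User Defined Java Class depends on the Java version running Kettle; that is an assumption here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.TimeZone;

// Hypothetical comparison class, not part of the Kettle API
final class IsoParseDemo {

    // Quoted 'Z' is a literal, so the zone has to be supplied separately
    static long parseWithLiteralZ(String iso) {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return in.parse(iso).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an ISO 8601 UTC string: " + iso, e);
        }
    }

    // Pattern letter X parses a trailing "Z" as the UTC designator
    static long parseWithOffsetPattern(String iso) {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");
        try {
            return in.parse(iso).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an ISO 8601 string: " + iso, e);
        }
    }

    // java.time parses ISO 8601 instants directly
    static long parseWithInstant(String iso) {
        return Instant.parse(iso).toEpochMilli();
    }
}
```

All three return 1640995200000 for `2022-01-01T00:00:00.000Z`; without the `setTimeZone` call, the first variant would instead depend on the JVM default zone.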
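2. JavaScript Approach
2.1 Using the JavaScript Step
In Kettle, the "Modified Java Script Value" step can also convert timestamps. The Rhino engine bundled with Kettle may not implement ES2017 features such as String.prototype.padStart, so the examples use a small pad2 helper instead. Every output variable must also be listed in the step's Fields table.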
// Helper: left-pad a number to two digits (instead of padStart)
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}
// Reset outputs for every row so values from the previous row are not carried over
var formatted_date = null, date_only = null, chinese_format = null;
var converted_date = null, error_message = null;
// Read the input field
var timestampValue = timestamp_field;
// Method 1: Unix timestamp
if (timestampValue != null && timestampValue != "") {
    try {
        var timestamp = parseInt(timestampValue, 10);
        // Up to 10 digits: seconds-level timestamp, convert to milliseconds
        if (timestampValue.toString().length <= 10) {
            timestamp = timestamp * 1000;
        }
        var date = new Date(timestamp);
        // parseInt returns NaN for non-numeric input, which yields an invalid Date
        if (isNaN(date.getTime())) {
            throw new Error(timestampValue + " is not a number");
        }
        // Date components in the local (JVM default) time zone
        var year = date.getFullYear();
        var month = pad2(date.getMonth() + 1);
        var day = pad2(date.getDate());
        var hours = pad2(date.getHours());
        var minutes = pad2(date.getMinutes());
        var seconds = pad2(date.getSeconds());
        // Several output formats
        formatted_date = year + "-" + month + "-" + day + " " + hours + ":" + minutes + ":" + seconds;
        date_only = year + "-" + month + "-" + day;
        chinese_format = year + "年" + month + "月" + day + "日 " + hours + ":" + minutes + ":" + seconds;
    } catch (e) {
        formatted_date = null;
        error_message = "Invalid timestamp: " + e.message;
    }
}
// Method 2: ISO 8601 string
var isoString = iso_timestamp_field;
if (isoString != null && isoString != "") {
    try {
        var isoDate = new Date(isoString);
        // Check that the string was parsed into a valid date
        if (isNaN(isoDate.getTime())) {
            throw new Error("Invalid date format");
        }
        converted_date = isoDate.getFullYear() + "-" + pad2(isoDate.getMonth() + 1) + "-" + pad2(isoDate.getDate())
            + " " + pad2(isoDate.getHours()) + ":" + pad2(isoDate.getMinutes()) + ":" + pad2(isoDate.getSeconds());
    } catch (e) {
        converted_date = null;
        error_message = "Date conversion error: " + e.message;
    }
}
2.2 Complete JavaScript Example
// Helper: left-pad a number to two digits (instead of padStart)
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}
// Timestamp conversion function; returns null for empty or invalid input
function convertTimestamp(timestampValue, format) {
    try {
        if (timestampValue == null || timestampValue == "") {
            return null;
        }
        var timestamp = parseInt(timestampValue, 10);
        // Up to 10 digits: seconds-level timestamp, convert to milliseconds
        if (timestampValue.toString().length <= 10) {
            timestamp = timestamp * 1000;
        }
        var date = new Date(timestamp);
        // Reject invalid dates (e.g. parseInt returned NaN)
        if (isNaN(date.getTime())) {
            throw new Error("Invalid timestamp");
        }
        var year = date.getFullYear();
        var month = pad2(date.getMonth() + 1);
        var day = pad2(date.getDate());
        var hours = pad2(date.getHours());
        var minutes = pad2(date.getMinutes());
        var seconds = pad2(date.getSeconds());
        switch (format) {
            case 'date':
                return year + "-" + month + "-" + day;
            case 'chinese':
                return year + "年" + month + "月" + day + "日 " + hours + ":" + minutes + ":" + seconds;
            case 'time':
                return hours + ":" + minutes + ":" + seconds;
            case 'datetime':
            default:
                return year + "-" + month + "-" + day + " " + hours + ":" + minutes + ":" + seconds;
        }
    } catch (e) {
        return null;
    }
}
// Reset all outputs for every row so values from the previous row are not carried over
var formatted_date_1 = null, date_only_1 = null, chinese_format_1 = null;
var formatted_date_2 = null, date_only_2 = null;
var formatted_date_3 = null, time_only_3 = null;
var error_flag = null, error_message = null;
// Process several timestamp fields
var timestamp1 = timestamp_field_1;
var timestamp2 = timestamp_field_2;
var timestamp3 = timestamp_field_3;
// Convert the first timestamp
if (timestamp1 != null) {
    formatted_date_1 = convertTimestamp(timestamp1, 'datetime');
    date_only_1 = convertTimestamp(timestamp1, 'date');
    chinese_format_1 = convertTimestamp(timestamp1, 'chinese');
}
// Convert the second timestamp
if (timestamp2 != null) {
    formatted_date_2 = convertTimestamp(timestamp2, 'datetime');
    date_only_2 = convertTimestamp(timestamp2, 'date');
}
// Convert the third timestamp
if (timestamp3 != null) {
    formatted_date_3 = convertTimestamp(timestamp3, 'datetime');
    time_only_3 = convertTimestamp(timestamp3, 'time');
}
// Error handling
if (formatted_date_1 == null && timestamp1 != null) {
    error_flag = "Y";
    error_message = "Failed to convert timestamp1: " + timestamp1;
}
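For readers taking the Java route, the format keys of `convertTimestamp` map naturally onto SimpleDateFormat patterns. The sketch below is a hypothetical Java counterpart (not part of any Kettle API) that keeps the same keys and the same null-on-failure behavior; the time zone is passed in explicitly rather than taken from the JVM default, and the Chinese pattern is written with Unicode escapes so the source compiles regardless of file encoding.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical Java counterpart of the JavaScript convertTimestamp(value, format)
final class TimestampFormatter {

    // Maps the format keys from the JavaScript switch to SimpleDateFormat patterns
    static String patternFor(String format) {
        if ("date".equals(format)) return "yyyy-MM-dd";
        if ("time".equals(format)) return "HH:mm:ss";
        // Year/month/day markers as Unicode escapes; non-ASCII letters are literal text in patterns
        if ("chinese".equals(format)) return "yyyy\u5e74MM\u6708dd\u65e5 HH:mm:ss";
        return "yyyy-MM-dd HH:mm:ss"; // "datetime" and any unknown key
    }

    // Returns null for null, empty, non-numeric or overflowing input
    static String convert(String value, String format, TimeZone zone) {
        if (value == null || value.trim().isEmpty()) return null;
        String s = value.trim();
        if (!s.matches("\\d+")) return null;
        try {
            long ts = Long.parseLong(s);
            if (s.length() <= 10) ts *= 1000L; // seconds -> milliseconds
            SimpleDateFormat sdf = new SimpleDateFormat(patternFor(format));
            sdf.setTimeZone(zone);
            return sdf.format(new Date(ts));
        } catch (NumberFormatException e) {
            return null; // more digits than a long can hold
        }
    }
}
```

Creating a SimpleDateFormat per call keeps the sketch simple; section 4 discusses reusing formatter instances.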
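3. Common Time Formats
3.1 Input Formats
- Unix timestamp (seconds): 1640995200
- Unix timestamp (milliseconds): 1640995200000
- ISO 8601 (UTC, indicated by the trailing Z): 2022-01-01T00:00:00.000Z
- Standard format (no time-zone information): 2022-01-01 00:00:00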
3.2 Output Formats
- Date: 2022-01-01
- Date and time: 2022-01-01 00:00:00
- Chinese format: 2022年01月01日 00:00:00
- Time: 00:00:00
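The four input forms in 3.1 do not all identify a point in time on their own: Unix timestamps and the Z-suffixed ISO string are absolute instants, while `2022-01-01 00:00:00` is a wall-clock time that only becomes an instant once a zone is assumed. A minimal Java sketch (hypothetical class name; detecting the form by its shape is an assumption, not a Kettle feature) that normalizes each form to epoch milliseconds:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical normalizer for the input formats listed in 3.1
final class InputNormalizer {

    private static final DateTimeFormatter STANDARD =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // zoneForLocal is only used for the zone-less standard format.
    // Unrecognized input throws an unchecked DateTimeParseException.
    static long toEpochMillis(String raw, ZoneId zoneForLocal) {
        String s = raw.trim();
        if (s.matches("\\d+")) {                      // Unix seconds or milliseconds
            long v = Long.parseLong(s);
            return s.length() <= 10 ? v * 1000L : v;
        }
        if (s.indexOf('T') > 0 && s.endsWith("Z")) {  // ISO 8601 in UTC
            return Instant.parse(s).toEpochMilli();
        }
        // "yyyy-MM-dd HH:mm:ss" carries no zone, so the caller's zone is applied
        return LocalDateTime.parse(s, STANDARD).atZone(zoneForLocal).toInstant().toEpochMilli();
    }
}
```

With `ZoneId.of("UTC")` all four sample inputs map to 1640995200000; reading the standard-format string as Asia/Shanghai time instead shifts it eight hours earlier.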
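4. Performance Optimization Tips
4.1 Java Optimization
// Create the SimpleDateFormat objects once instead of once per row.
// SimpleDateFormat is not thread-safe; keeping them as instance fields (not static)
// prevents parallel copies of the step from sharing one instance.
private final SimpleDateFormat dateTimeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
private final SimpleDateFormat dateOnlyFormat = new SimpleDateFormat("yyyy-MM-dd");

// Inside processRow:
String formattedDate = dateTimeFormat.format(date);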
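If the Java runtime behind Kettle is Java 8 or later (an assumption), `java.time.format.DateTimeFormatter` sidesteps the thread-safety issue entirely: it is immutable, so a single static instance can be shared. A sketch with a hypothetical class name; the UTC zone is chosen only to match the test data in section 6 and should be replaced by the zone the business actually expects.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical holder for shared, thread-safe formatters
final class SharedFormatters {

    // DateTimeFormatter is immutable, so static sharing across threads is safe
    private static final DateTimeFormatter DATE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of("UTC"));
    private static final DateTimeFormatter DATE_ONLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneId.of("UTC"));

    static String dateTime(long epochMillis) {
        return DATE_TIME.format(Instant.ofEpochMilli(epochMillis));
    }

    static String dateOnly(long epochMillis) {
        return DATE_ONLY.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

The `withZone` override is what lets an `Instant` be formatted with date fields; without it, formatting an `Instant` with this pattern would fail.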
4.2 JavaScript Optimization
// Define the formatting helpers once at script level instead of repeating the formatting code.
// The main script still runs for every row; in the Modified Java Script Value step, helpers
// can instead be placed in a script designated as the start script, which runs once.
function pad2(n) {
    return (n < 10 ? "0" : "") + n;
}
var formatDate = function(date) {
    return date.getFullYear() + "-" + pad2(date.getMonth() + 1) + "-" + pad2(date.getDate())
        + " " + pad2(date.getHours()) + ":" + pad2(date.getMinutes()) + ":" + pad2(date.getSeconds());
};
5. Error Handling
5.1 Common Error Types
- Invalid timestamp format
- Timestamp outside the valid range
- Empty or null values
- Data type mismatch
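The four error types can be told apart before any formatting happens. The sketch below is hypothetical: the category names, treating pre-1970 values as out of range, and the upper bound of 9999-12-31 23:59:59.999 UTC are illustrative choices. It also assumes Kettle hands Integer fields to Java as `java.lang.Long` and String fields as `java.lang.String`.

```java
// Hypothetical classifier for the error types listed in 5.1
final class TimestampValidator {

    // Illustrative upper bound: 9999-12-31T23:59:59.999Z in epoch milliseconds
    static final long MAX_MILLIS = 253402300799999L;

    static String classify(Object value) {
        if (value == null) return "NULL_VALUE";
        // Assumption: String and Long are the only acceptable field types
        if (!(value instanceof String) && !(value instanceof Long)) return "TYPE_MISMATCH";
        String s = value.toString().trim();
        if (s.isEmpty()) return "NULL_VALUE";
        if (!s.matches("-?\\d+")) return "INVALID_FORMAT";
        // Negative values (before 1970) and values too long for a long are out of range
        if (s.startsWith("-") || s.length() > 18) return "OUT_OF_RANGE";
        long v = Long.parseLong(s);
        long millis = s.length() <= 10 ? v * 1000L : v;
        return millis > MAX_MILLIS ? "OUT_OF_RANGE" : "OK";
    }
}
```

Returning a category instead of throwing makes it easy to write the result into an error_flag or error_message field per row.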
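5.2 Error-Handling Strategy
// JavaScript error-handling example (convertTimestamp is defined in 2.2)
var error_flag = null;
var error_message = null;
try {
    // convertTimestamp returns null instead of throwing on bad input
    var result = convertTimestamp(timestampValue);
    formatted_date = result;
    if (result == null) {
        error_flag = "Y";
        error_message = "Conversion failed";
    }
} catch (e) {
    error_flag = "Y";
    error_message = "Error: " + e.message;
    formatted_date = null;
}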
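6. Test Cases
6.1 Test Data
Input | Input type | Expected output |
---|---|---|
1640995200 | Unix seconds | 2022-01-01 00:00:00 |
1640995200000 | Unix milliseconds | 2022-01-01 00:00:00 |
2022-01-01T00:00:00.000Z | ISO 8601 | 2022-01-01 00:00:00 |
null | Empty value | null |
abc | Invalid format | null |

The expected outputs assume the output time zone is UTC. In UTC+8 (for example Asia/Shanghai), the first three rows produce 2022-01-01 08:00:00.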
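The table can also be run as a plain Java check outside Kettle. The converter in this sketch is a hypothetical stand-in: it mirrors the numeric branch of section 1.2, parses ISO strings with the `X` zone pattern, and fixes the output zone to UTC to match the expected column. `runTable` returns the number of failing rows.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical test harness for the table in 6.1
final class ConversionTableTest {

    // Converter under test; output in UTC, null for empty or invalid input
    static String convert(String input) {
        if (input == null || input.trim().isEmpty()) return null;
        String s = input.trim();
        try {
            long millis;
            if (s.matches("\\d+")) {
                long v = Long.parseLong(s);
                millis = s.length() <= 10 ? v * 1000L : v;
            } else {
                SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSX");
                iso.setLenient(false);
                millis = iso.parse(s).getTime();
            }
            SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            out.setTimeZone(TimeZone.getTimeZone("UTC"));
            return out.format(new Date(millis));
        } catch (Exception e) {
            return null; // invalid input maps to null, as in the table
        }
    }

    static int runTable() {
        String[][] rows = {
            {"1640995200", "2022-01-01 00:00:00"},
            {"1640995200000", "2022-01-01 00:00:00"},
            {"2022-01-01T00:00:00.000Z", "2022-01-01 00:00:00"},
            {null, null},
            {"abc", null},
        };
        int failures = 0;
        for (String[] row : rows) {
            String actual = convert(row[0]);
            boolean pass = (row[1] == null) ? actual == null : row[1].equals(actual);
            if (!pass) {
                failures++;
                System.out.println("FAIL input=" + row[0] + " expected=" + row[1] + " got=" + actual);
            }
        }
        return failures;
    }
}
```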
6.2 Validation Script
// Compare a conversion result with the expected value
function validateConversion(input, output, expected) {
    validation_message = null;
    if (output == expected) {
        validation_result = "PASS";
    } else {
        validation_result = "FAIL";
        validation_message = "Input: " + input + ", Expected: " + expected + ", Got: " + output;
    }
}
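7. Summary
This document presented approaches for converting timestamps to date formats in Kettle, covering:
- Java (User Defined Java Class): suited to complex processing logic; compiled Java generally performs better
- JavaScript: simple syntax, easy to read and maintain
- Error handling: try/catch blocks with per-row error flags and messages
- Performance: optimization tips and best practices
- Testing: test cases and a validation script
The right choice depends on the specific business and performance requirements: JavaScript is recommended for simple conversions, and Java for complex business logic.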