Previously, our Kaltura DWH server generated the following error when running “/opt/kaltura/dwh/etlsource/execute/etl_hourly.sh”.
INFO 10-07 04:00:14,634 - Create output files - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
INFO 10-07 04:00:14,635 - Mapping input specification - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
INFO 10-07 04:00:14,743 - Enrich cycle_id and file_id - play - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
INFO 10-07 04:00:14,837 - iterate file - Opening file: /opt/kaltura/dwh/cycles/process/4/cak02bs.cc.yamaguchi-u.ac.jp-kaltura_apache_access_ssl.log-20180710-03
INFO 10-07 04:00:14,841 - parse bandwidth lines - Optimization level set to 9.
INFO 10-07 04:00:14,842 - parse playManifest line - Optimization level set to 9.
INFO 10-07 04:00:14,851 - parse playManifest line - Optimization level set to 9.
INFO 10-07 04:00:14,852 - decode http string - Optimization level set to 9.
INFO 10-07 04:00:14,857 - parse bandwidth lines - Optimization level set to 9.
INFO 10-07 04:00:14,865 - decode http string - Optimization level set to 9.
ERROR 10-07 04:00:19,058 - parse bandwidth lines - Unexpected error
ERROR 10-07 04:00:19,059 - parse bandwidth lines - org.pentaho.di.core.exception.KettleValueException:
Javascript error:
Could not apply the given format dd/MMM/yyyy:HH:mm:ss on the string for 09/Jul/2018:09:56:29 : Format.parseObject(String) failed (script#15)
This error occurred when “etl_hourly.sh” tried to process “cak02bs.cc.yamaguchi-u.ac.jp-kaltura_apache_access_ssl.log-20180710-03”.
Judging from the error message, “09/Jul/2018:09:56:29” appears to match the “dd/MMM/yyyy:HH:mm:ss” format.
However, the actual entries in the log file look like this:
10.6.209.196 - - [09/Jul/2018:09:56:29 +0900] “POST /api_v3/index.php/service/baseentry/action/list?kalsig=3ec79baff0c8d93e8c4fdfbfa54c734c HTTP/1.1” 200 2266 0/206705 “https://cak02bs.cc.yamaguchi-u.ac.jp/flash/kmc/v5.43.13/kmc.swf” “Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0” “-” 10.6.209.196 “-” “cak02bs.cc.yamaguchi-u.ac.jp” 11737 420355196 + 1334 “-” “-” “-” “-” “no-store, no-cache, must-revalidate, post-check=0, pre-check=0” 101
That is, the timestamps in the actual log include a time zone offset (+0900) and correspond to the “dd/MMM/yyyy:HH:mm:ssZ” format.
Since Pentaho runs on Java, I wrote the following Java program.
# more TestClass.java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;

public class TestClass {
    public static void main(String[] args) {
        LocalDateTime ldt = LocalDateTime.now();
        String defaultZoneId = TimeZone.getDefault().getID();
        System.out.println(defaultZoneId);

        // In DateTimeFormatter patterns, [ ] marks an optional section, so the
        // brackets are not printed. MMM is rendered in the JVM's default locale.
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("[dd/MMM/yyyy:HH:mm:ssZ]");

        // Current local time interpreted in the server's default time zone.
        ZonedDateTime eventTime = ldt.atZone(ZoneId.of(defaultZoneId));
        System.out.println(eventTime);
        System.out.println(eventTime.format(formatter));

        // The same local time interpreted in another time zone, for comparison.
        eventTime = ldt.atZone(ZoneId.of("America/New_York"));
        System.out.println(eventTime);
        System.out.println(eventTime.format(formatter));
    }
}
The execution result of this program is as follows.
# java TestClass
Asia/Tokyo
2018-10-29T22:47:31.423+09:00[Asia/Tokyo]
29/10/2018:22:47:31+0900
2018-10-29T22:47:31.423-04:00[America/New_York]
29/10/2018:22:47:31-0400
Based on these results, I created the patches that I reported previously.
Applying these patches resolved the errors.
Our DWH server now processes the log files normally, and the play/view counts are updated correctly.
However, our patches have a problem.
If the DWH server runs in an environment whose logs use the “dd/MMM/yyyy:HH:mm:ss” format, our patches cause the same errors.
Therefore, it is not appropriate to apply our patches to the Kaltura CE source code.
Rather, we should consider a mechanism that correctly detects the time format adopted by the server.
Alternatively, we should consider a way to unify the time format in the log files.
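As a rough illustration of the detection idea, here is a minimal Java sketch that accepts a timestamp with or without a time zone offset. It is not taken from the Kaltura or Pentaho source; the class name LogTimestampParser and its parse method are hypothetical, and fixing the locale to English is my own assumption so that month names such as “Jul” parse regardless of the server's default locale.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Locale;

// Hypothetical helper, for illustration only (not part of Kaltura or Pentaho).
public class LogTimestampParser {
    // "[[ ]Z]" is an optional section: an offset such as "+0900", optionally
    // preceded by a space. Locale.ENGLISH is an assumption made here so that
    // "Jul" is recognized independently of the JVM's default locale.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss[[ ]Z]", Locale.ENGLISH);

    // Returns an OffsetDateTime when the text carries an offset,
    // otherwise a LocalDateTime.
    public static TemporalAccessor parse(String timestamp) {
        return FORMAT.parseBest(timestamp, OffsetDateTime::from, LocalDateTime::from);
    }

    public static void main(String[] args) {
        System.out.println(parse("09/Jul/2018:09:56:29 +0900")); // with offset
        System.out.println(parse("09/Jul/2018:09:56:29"));       // without offset
    }
}
```

With something like this, the same code path could handle both kinds of environment, and the caller can check the returned type to see which format the log actually used.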
I hope this report is helpful.
Regards