.gz/.bz2 files is available.
I am working with a complex Java application that produces many log files (each separate process of the application writes its own log file). Once we needed to merge several log files into one file with entries sorted chronologically. There were several problems to solve:
- Dates are stored in the log files in the format mm/dd/YYYY HH:MM:SS.ccc (ccc is milliseconds), which does not sort chronologically as plain text. It has to be converted to a more convenient format, for example YYYYmmddHHMMSSccc;
- Multiline entries. Some entries occupy more than one line of the log file (for example, the stack traces common in Java applications). Sorting line by line breaks the order of those lines;
- Lines with the same timestamp should keep their order within one input file. That is, if two log entries "B" and "A" share one timestamp, "B" must still come before "A" after sorting.
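The first and third requirements can be illustrated with a minimal Python sketch. It is not the tool's actual code (the tool itself is written in Bash and Perl); the function name `sortable_key` and the sample lines are made up for illustration. It rewrites the mm/dd/YYYY HH:MM:SS.ccc timestamp as YYYYmmddHHMMSSccc and relies on a stable sort, so entries with equal timestamps keep their original order.

```python
import re

# Default timestamp format at the start of a line: mm/dd/YYYY HH:MM:SS.ccc
TS = re.compile(r"^(\d{2})/(\d{2})/(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})")

def sortable_key(line):
    """Return YYYYmmddHHMMSSccc for a line that starts with a timestamp, else None."""
    m = TS.match(line)
    if m is None:
        return None
    month, day, year, hour, minute, second, millis = m.groups()
    return year + month + day + hour + minute + second + millis

# Hypothetical sample lines: "B" and "A" share one timestamp.
lines = [
    "05/21/2012 21:54:41.070 B",
    "05/21/2012 21:54:41.070 A",
    "12/31/2011 23:59:59.999 C",
]

# sorted() is stable, so "B" stays ahead of "A" because their keys are equal.
ordered = sorted(lines, key=sortable_key)
for line in ordered:
    print(line)
```

Sorting the raw text would put the 12/31/2011 line last, because "12" compares greater than "05"; with the converted key it comes first, as it should.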
Finally I developed a tool covering all our requirements. It requires Bash and Perl, plus gzip/bzip2 for reading compressed files. Of course, all of these are native in the Unix world, but they are also available to Windows users who have installed Cygwin.
Let's consider examples describing the main features of the tool.
Examples
1. Merge all Apache error files
./logmerge --apache-error ./error.log* > all.log
Merge all error.log* Apache files, including gzipped ones, and store the result in one file. The --apache-error option assumes that each line looks like the example below and builds a marker containing the sortable timestamp 20100423221421 corresponding to the original one:
[Fri Apr 23 22:14:21 2010] <the rest of the entry>
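As a rough illustration of that conversion, here is a Python sketch that maps the bracketed Apache error timestamp to the sortable form. It is an assumption about how such a conversion could be done, not the tool's implementation; the names `MONTHS`, `ERR_TS` and `apache_error_key` are hypothetical. A month table is used instead of locale-dependent date parsing.

```python
import re

# Month abbreviation -> two-digit month number.
MONTHS = {name: f"{i:02d}" for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# [Fri Apr 23 22:14:21 2010] at the start of the line.
ERR_TS = re.compile(
    r"^\[\w{3} (\w{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (\d{4})\]")

def apache_error_key(line):
    """Return YYYYmmddHHMMSS for an Apache error log line, else None."""
    m = ERR_TS.match(line)
    if m is None or m.group(1) not in MONTHS:
        return None
    month, day, hour, minute, second, year = m.groups()
    return year + MONTHS[month] + day.zfill(2) + hour + minute + second

key = apache_error_key("[Fri Apr 23 22:14:21 2010] <the rest of the entry>")
print(key)
```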
2. Merge all Apache access files
find /export/home/ -name 'access.log' | xargs ./logmerge -f -n --apache-access > all.log
Find all the latest access.log Apache files in all home directories under the /export/home directory, merge them chronologically and store the result in one file. Additionally, each line of the output begins with the filename and the line number within the original file. The utility assumes that Apache access log files consist of entries like the one below and transforms the timestamp it finds into the sortable form 20080215141849:
<the beginning of the entry> [15/Feb/2008:14:18:49 +0300] <the rest of the entry>
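A comparable Python sketch for the access log format follows. Again this is only an illustration under stated assumptions: the names `ACC_TS` and `apache_access_key` are hypothetical, and the UTC offset is simply ignored, which is how the example above appears to treat it. Since the timestamp sits in the middle of the line, the pattern is searched for rather than anchored at the start.

```python
import re

# Month abbreviation -> two-digit month number.
MONTHS = {name: f"{i:02d}" for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# [15/Feb/2008:14:18:49 +0300] anywhere in the line.
ACC_TS = re.compile(
    r"\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) [+-]\d{4}\]")

def apache_access_key(line):
    """Return YYYYmmddHHMMSS for an Apache access log line, else None."""
    m = ACC_TS.search(line)
    if m is None or m.group(2) not in MONTHS:
        return None
    day, month, year, hour, minute, second = m.groups()
    # Assumption: the offset (+0300) is not applied to the time.
    return year + MONTHS[month] + day + hour + minute + second

key = apache_access_key(
    "<the begin of the entry> [15/Feb/2008:14:18:49 +0300] <the rest of the entry>")
print(key)
```

If the merged files came from servers in different time zones, the offset would have to be applied before comparing keys; this sketch does not do that.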
3. Merge multiline entries from several files chronologically
./logmerge -f -n log/*.log | gzip -c > all.gz
Merge all files located in the log/ directory and pass the result to an archive. The filename and the line number are added at the beginning of each line of the resulting file. By default the utility assumes that each log entry begins with a timestamp and can occupy more than a single line (e.g. Java stack traces like the one below):
05/21/2012 21:54:41.070 <the rest of the entry>
java.lang.Throwable
    at boo.hoo.StackTrace.bar(StackTrace.java:223)
    at boo.hoo.StackTrace.foo(StackTrace.java:218)
    at boo.hoo.StackTrace.main(StackTrace.java:54)
    ...
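The handling of multiline entries can be sketched in Python as follows. This is a simplified model, not the tool's code: every line without a leading timestamp is attached to the entry before it, whole entries are then sorted with a stable sort, and each line is prefixed with its file name and line number. The `name:number: ` prefix format, the function names and the sample file contents are all assumptions made for the example; the real output format of -f and -n may differ.

```python
import re

# Default timestamp format at the start of a line: mm/dd/YYYY HH:MM:SS.ccc
TS = re.compile(r"^(\d{2})/(\d{2})/(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})")

def key_of(line):
    """Return YYYYmmddHHMMSSccc, or None for a continuation line."""
    m = TS.match(line)
    if m is None:
        return None
    month, day, year, hour, minute, second, millis = m.groups()
    return year + month + day + hour + minute + second + millis

def entries(name, lines):
    """Group the lines of one file into (key, [prefixed lines]) entries."""
    result = []
    for number, line in enumerate(lines, start=1):
        key = key_of(line)
        prefixed = f"{name}:{number}: {line}"  # hypothetical prefix format
        if key is None and result:
            result[-1][1].append(prefixed)     # continuation of previous entry
        else:
            # Lines before the first timestamp get an empty key and sort first.
            result.append((key or "", [prefixed]))
    return result

def merge(files):
    """files maps a file name to its list of lines; returns merged lines."""
    all_entries = []
    for name, lines in files.items():
        all_entries.extend(entries(name, lines))
    all_entries.sort(key=lambda entry: entry[0])  # list.sort() is stable
    return [line for _, block in all_entries for line in block]

# Hypothetical input: a stack trace in a.log, an earlier entry in b.log.
files = {
    "a.log": [
        "05/21/2012 21:54:41.070 boom",
        "java.lang.Throwable",
        "    at boo.hoo.StackTrace.main(StackTrace.java:54)",
    ],
    "b.log": ["05/21/2012 21:54:40.000 earlier"],
}
merged = merge(files)
for line in merged:
    print(line)
```

Because the sort works on whole entries rather than on individual lines, the stack trace stays attached to the line that carries its timestamp.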
The project is hosted on Google Code and is available for download under the MIT license.