Using Avro to serialize logs in log4j


I have written about the serialization mechanism of Protocol Buffers previously. Similarly, Apache Avro provides a serialization framework, and in some respects a better one.

It provides features such as:

 - Independent schemas - different schemas can be used for serialization and deserialization
 - Binary serialization - compact data encoding and faster data processing
 - Dynamic typing - serialization and deserialization without code generation
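To give an idea of where the compactness of the binary format comes from: the Avro specification says int and long values are written with variable-length zig-zag coding, so small numbers of either sign take a single byte. The sketch below reimplements that rule from scratch in plain Java for illustration only; it is not Avro's own BinaryEncoder, and the class name ZigZagVarint is made up.

```java
import java.io.ByteArrayOutputStream;

// Illustrative re-implementation of Avro's int/long wire format
// (zig-zag mapping + base-128 varint). Not Avro's BinaryEncoder.
class ZigZagVarint {

    // Zig-zag maps signed values to unsigned ones so that small magnitudes
    // stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Emits the zig-zag value 7 bits at a time, lowest bits first.
    // The high bit of each byte is set when more bytes follow.
    static byte[] encodeLong(long n) {
        long v = zigZag(n);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7; // unsigned shift: v may have its top bit set
        }
        out.write((int) v);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex, e.g. "80 01".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, -1, 1, -64, 64, 1_000_000}) {
            System.out.println(n + " -> " + hex(encodeLong(n)));
        }
    }
}
```

With this rule 0, -1 and 1 encode to the single bytes 00, 01 and 02, while 64 already needs two bytes (80 01).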

Avro supports two encodings when serializing data: binary and JSON. In an Avro binary data file, the schema is included at the beginning of the file. In JSON, the type is defined along with the data. Switching from JSON to the binary format to get better performance is straightforward with Avro. Because the data is stored together with its schema, less type information needs to be sent with each value, and any program can deserialize the encoded data. This makes Avro a good candidate for RPC.
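A minimal sketch of the "schema at the beginning of the file" point, assuming a recent Avro release on the classpath. It writes one record to an in-memory Avro data file with DataFileWriter, then reads it back through a GenericDatumReader that is given no schema at all: DataFileStream takes the writer's schema from the file header. The LogLine schema and the class name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

class ContainerFileDemo {
    // Hypothetical schema, used only for this example.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
        + "{\"name\":\"level\",\"type\":\"string\"},"
        + "{\"name\":\"message\",\"type\":\"string\"}]}");

    // Writes one record into an in-memory Avro data file.
    static byte[] writeFile(String level, String message) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, out); // header (magic bytes + schema JSON) is written first
            GenericRecord rec = new GenericData.Record(SCHEMA);
            rec.put("level", level);
            rec.put("message", message);
            writer.append(rec);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reads the first record back; the reader is NOT given a schema.
    static GenericRecord readFirst(byte[] file) {
        try (DataFileStream<GenericRecord> reader = new DataFileStream<>(
                 new ByteArrayInputStream(file), new GenericDatumReader<GenericRecord>())) {
            System.out.println("schema from header: " + reader.getSchema());
            return reader.next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = writeFile("INFO", "started");
        System.out.println(readFirst(file));
    }
}
```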

In Avro 1.5 the encoders are obtained from a factory (previous versions had no factory for encoders):
 - org.apache.avro.io.EncoderFactory.binaryEncoder(OutputStream out, BinaryEncoder reuse) for binary
 - org.apache.avro.io.EncoderFactory.jsonEncoder(Schema schema, OutputStream out) for JSON
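A minimal sketch of the two factory methods, assuming Avro 1.5 or later on the classpath (the schema is parsed with Schema.Parser). It serializes the same GenericRecord once through binaryEncoder and once through jsonEncoder so the outputs can be compared. The schema and class names are hypothetical. The binary encoder returned by the factory buffers its output, so it is flushed before the bytes are read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

class EncoderDemo {
    // Hypothetical schema for this example.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
        + "{\"name\":\"level\",\"type\":\"string\"},"
        + "{\"name\":\"line\",\"type\":\"int\"}]}");

    static GenericRecord sample() {
        GenericRecord rec = new GenericData.Record(SCHEMA);
        rec.put("level", "WARN");
        rec.put("line", 42);
        return rec;
    }

    static byte[] encodeBinary(GenericRecord rec) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // null: no existing encoder to reuse, so the factory creates one
            BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(rec.getSchema()).write(rec, enc);
            enc.flush(); // push buffered bytes into 'out'
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String encodeJson(GenericRecord rec) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder enc = EncoderFactory.get().jsonEncoder(rec.getSchema(), out);
            new GenericDatumWriter<GenericRecord>(rec.getSchema()).write(rec, enc);
            enc.flush();
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        GenericRecord rec = sample();
        System.out.println("binary bytes: " + encodeBinary(rec).length);
        System.out.println("json: " + encodeJson(rec));
    }
}
```

For this record the binary form is 6 bytes: a one-byte length prefix and the four bytes of "WARN", then one zig-zag byte for 42. The JSON form spells out the field names and is therefore considerably larger.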

Values of the Avro-supported types are stored in a GenericData.Record, a set of name-value pairs in which each key is a field name from the schema.

Avro supports the following value types:
 - Primitive types - null, boolean, int, long, float, double, bytes, string
 - Complex types - records, enums, arrays, maps, unions, fixed
 
You can read more about them in the Avro specification.
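The sketch below parses a schema that uses several of these types and prints the type of each field. It assumes a recent Avro release on the classpath; the Event schema and its field names are invented for illustration.

```java
import org.apache.avro.Schema;

class TypesDemo {
    // Hypothetical record schema mixing primitive and complex types.
    static final String SCHEMA_JSON = "{"
        + "\"type\":\"record\",\"name\":\"Event\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"},"
        + "{\"name\":\"level\",\"type\":{\"type\":\"enum\",\"name\":\"Level\","
        + "\"symbols\":[\"DEBUG\",\"INFO\",\"WARN\",\"ERROR\"]}},"
        + "{\"name\":\"tags\",\"type\":{\"type\":\"array\",\"items\":\"string\"}},"
        + "{\"name\":\"mdc\",\"type\":{\"type\":\"map\",\"values\":\"string\"}},"
        + "{\"name\":\"thrown\",\"type\":[\"null\",\"string\"]},"
        + "{\"name\":\"digest\",\"type\":{\"type\":\"fixed\",\"name\":\"Md5\",\"size\":16}}"
        + "]}";

    static Schema parse() {
        return new Schema.Parser().parse(SCHEMA_JSON);
    }

    public static void main(String[] args) {
        // Print each field name together with the Avro type of its schema.
        for (Schema.Field f : parse().getFields()) {
            System.out.println(f.name() + " : " + f.schema().getType());
        }
    }
}
```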

A schema definition must be provided when creating the record instance. To write or read data, use the put/get methods.
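A minimal sketch of creating a record for a schema and using put/get, assuming a recent Avro release on the classpath. The LogLine schema is hypothetical. GenericData.get().validate(...) appears here only to show that a record is checked against its schema: a record whose fields were never set does not validate.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

class RecordDemo {
    // Hypothetical schema for this example.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
        + "{\"name\":\"logger\",\"type\":\"string\"},"
        + "{\"name\":\"timestamp\",\"type\":\"long\"}]}");

    static GenericRecord build(String logger, long timestamp) {
        GenericRecord rec = new GenericData.Record(SCHEMA); // record bound to its schema
        rec.put("logger", logger);       // key is the schema field name
        rec.put("timestamp", timestamp); // autoboxed to Long, matching "long"
        return rec;
    }

    public static void main(String[] args) {
        GenericRecord rec = build("com.example.App", 1700000000000L);
        System.out.println(rec.get("logger") + " @ " + rec.get("timestamp"));
        // An empty record (all fields null) does not satisfy the schema.
        System.out.println(GenericData.get().validate(SCHEMA, new GenericData.Record(SCHEMA)));
    }
}
```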
 
I have used this serialization mechanism to build a layout for log4j, so that log events are serialized with Avro.

The GitHub project is here: https://github.com/harisgx/avro-log4j
 
Add the libraries to your project and add the following properties to log4j.properties:

   log4j.appender.logger_name.layout=com.avrolog.log4j.layout.AvroLogLayout
   log4j.appender.logger_name.layout.Type=json
   log4j.appender.logger_name.layout.MDCKeys=mdcKey
 
Provide the MDC keys as comma-separated values.
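The layout has to turn the comma-separated MDCKeys value into a list of keys and pick those keys out of the MDC for each log event. The sketch below shows one way to do that in plain Java. It is a hypothetical helper, not the avro-log4j code: the class and method names are assumptions (log4j 1.x configures layouts through bean-style setters, so a setMDCKeys(String) setter is the kind of method such a property would map to), and a plain Map stands in for log4j's MDC.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not taken from avro-log4j.
class MdcKeySelector {
    private List<String> keys = Collections.emptyList();

    // Receives the raw property value, e.g. "requestId, userId".
    // Whitespace around keys is trimmed and empty entries are skipped.
    void setMDCKeys(String csv) {
        List<String> parsed = new ArrayList<>();
        for (String part : csv.split(",")) {
            String key = part.trim();
            if (!key.isEmpty()) parsed.add(key);
        }
        keys = Collections.unmodifiableList(parsed);
    }

    List<String> keys() {
        return keys;
    }

    // Copies only the configured keys that are present in the context map
    // (standing in for the MDC), keeping the configured order. The result
    // fits an Avro map-of-strings field.
    Map<String, String> select(Map<String, ?> context) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : keys) {
            Object value = context.get(key);
            if (value != null) out.put(key, value.toString());
        }
        return out;
    }
}
```

With MDCKeys=requestId,userId and only requestId present in the MDC, select returns a single entry; keys missing from the MDC are simply left out.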
 
 
   This is the schema


 
 
