WazFormat is a format identification library that works with streams. It lets callers determine what type of content a stream carries without consuming it.
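To illustrate what identifying a stream "without consuming it" can look like, here is a minimal sketch using only the standard java.io mark/reset mechanism. It is not WazFormat's API: the class PeekDemo, the enum Kind and the method peek are hypothetical names invented for this example, and only two magic numbers (gzip and PDF) are checked.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PeekDemo {
    /** Hypothetical result type for this sketch; not WazFormat's own enum. */
    public enum Kind { GZIP, PDF, UNKNOWN }

    /** Looks at up to 4 leading bytes, then rewinds so the caller still sees the whole stream. */
    public static Kind peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(4);                       // remember the current position
        byte[] head = new byte[4];
        int n = 0;
        int r;
        while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
            n += r;                       // keep reading until 4 bytes or end of stream
        }
        in.reset();                       // rewind: nothing has been consumed
        if (n >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) {
            return Kind.GZIP;             // gzip magic number 1F 8B
        }
        if (n == 4 && head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F') {
            return Kind.PDF;              // PDF files start with "%PDF"
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(peek(in));          // PDF
        System.out.println((char) in.read());  // % (the first byte is still available)
    }
}
```

The caller must pass a stream that supports mark/reset; wrapping any InputStream in a BufferedInputStream is the usual way to get that.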

It integrates a native format identification engine with Droid (a format identification library sponsored by the UK National Archives) and will, in the future, integrate mime-utils and Apache Tika.


What this library offers:

  • Identifies more than 60 file formats.
  • Performs nested detection: it can detect what is inside a bzip2 stream or a PKCS#7 document.
  • Returns the identification result as an Enum. Most identification libraries return a string that the calling software must parse further.
  • If enabled, unwraps some of the detected formats: it can "extract" a document from a PKCS#7 envelope or automatically "gunzip" a stream.
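The combination of an enum result, nested detection and unwrapping can be sketched with the JDK alone. The code below is an illustration under stated assumptions, not WazFormat's implementation or API: NestedDetectDemo, Format, sniff, detectNested and gzip are hypothetical names, only gzip and PDF are recognised, and gzip stands in for the other wrapper formats (bzip2, PKCS#7) that the JDK cannot open on its own.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class NestedDetectDemo {
    /** Hypothetical enum result: callers compare constants instead of parsing strings. */
    public enum Format { GZIP, PDF, UNKNOWN }

    /** Peeks at up to 4 bytes of a mark-supporting stream and rewinds it. */
    static Format sniff(InputStream in) throws IOException {
        in.mark(4);
        byte[] head = new byte[4];
        int n = 0;
        int r;
        while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
            n += r;
        }
        in.reset();
        if (n >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) {
            return Format.GZIP;
        }
        if (n == 4 && head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F') {
            return Format.PDF;
        }
        return Format.UNKNOWN;
    }

    /** Returns the chain of formats from the outermost wrapper to the innermost content. */
    public static List<Format> detectNested(InputStream raw) throws IOException {
        List<Format> chain = new ArrayList<>();
        InputStream in = new BufferedInputStream(raw);
        Format f = sniff(in);
        chain.add(f);
        while (f == Format.GZIP) {
            // "gunzip" the wrapper and identify what it contains
            in = new BufferedInputStream(new GZIPInputStream(in));
            f = sniff(in);
            chain.add(f);
        }
        return chain;
    }

    /** Test helper: gzip-compresses a byte array in memory. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wrapped = gzip("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        System.out.println(detectNested(new ByteArrayInputStream(wrapped))); // [GZIP, PDF]
    }
}
```

Because the result is a list of enum constants, calling code can branch with a switch or a simple equality test rather than matching on description strings.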

Because of a poor choice of inner detection library (Droid), WazFormat has some serious performance issues. Although it is currently used in production in many projects, users should try to limit the number of formats detected. See the examples section of the documentation for how to do this.

What's next?

Take a 5-minute tour of the features:

  • Learn how to add the library to your project.
  • Post your questions on the users forum. We will be glad to help you and answer them! The forum is moderated and receives very few messages per year. (Please don't contact project administrators by private email; your questions might be useful to other users.)
  • Tell us your ideas. We're ready to implement them!
  • Show us your appreciation! Like us on Google+, write a positive review on SourceForge, or give us feedback.