File format design

Steps for the design of a new file format

  1. Is it necessary?

    Tens of thousands of file formats already exist. First check whether an existing format meets your needs; the best choice is one of the standard file formats. For more information on standard file formats, check this page, as well as the registered MIME types.

    If those formats do not fit, consider reviving one of the defunct (no longer maintained) file formats. If you still wish to create a new file format, see below for further tips.

  2. Choose a file extension

    Choose a unique file extension for that specific file format. Avoid ambiguous file extensions such as .IMG, which is already used by many unrelated formats.

  3. Determine the encoding of the file format

    Determine the type of encoding the file will have: ASCII (ISO 646 IRV), text or binary. ASCII encoding is portable across systems with different endianness. On the other hand, parsing a text or ASCII encoded file is usually slower than reading directly from a binary file. If the encoding will be binary, determine the endianness of the encoded data. Most personal computers are little-endian, while network byte order and some server architectures are big-endian. It is also possible to design a file format that can store data in either byte order.
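
    As an illustration of the byte order question, here is a minimal Python sketch. The record layout (a 32-bit value followed by a 16-bit version) and the one-byte order marker ("L" or "B") are assumptions made up for this example, not part of any existing format. The sketch shows how the same values produce different bytes in each byte order, and one possible way for a format to support both.

```python
import struct

# Hypothetical record: 32-bit unsigned value + 16-bit unsigned version.
# "<" selects little-endian, ">" selects big-endian (no padding in either).
RECORD_LAYOUT = "IH"

def write_record(value, version, byte_order="<"):
    """Encode the record, prefixed with a one-byte order marker.

    The marker (b"L" or b"B") is an assumption of this sketch; it lets a
    reader decode files written in either byte order.
    """
    if byte_order not in ("<", ">"):
        raise ValueError("byte_order must be '<' or '>'")
    marker = b"L" if byte_order == "<" else b"B"
    return marker + struct.pack(byte_order + RECORD_LAYOUT, value, version)

def read_record(data):
    """Decode a record written by write_record, honoring its order marker."""
    orders = {b"L": "<", b"B": ">"}
    marker = data[:1]
    if marker not in orders:
        raise ValueError("unknown byte order marker")
    size = struct.calcsize(orders[marker] + RECORD_LAYOUT)
    return struct.unpack(orders[marker] + RECORD_LAYOUT, data[1:1 + size])

little = write_record(0x12345678, 2, "<")
big = write_record(0x12345678, 2, ">")
print(little.hex())  # marker byte, then least significant byte first
print(big.hex())     # marker byte, then most significant byte first
print(read_record(little) == read_record(big))
```

    Picking a single fixed byte order is simpler for readers; supporting both, as sketched here, costs every reader an extra branch.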

  4. Chunk based or direct stream format

    Determine whether the file format needs to be extensible. If so, consider creating a chunk based file format (such as the XML, PNG, JPEG, IFF and RIFF file formats). Chunk based file formats contain chunks of data, each preceded by a header that identifies the data that follows and optionally followed by a footer. This permits older software applications to skip over data that they do not recognize.
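
    The skipping behaviour described above can be sketched in Python as follows. The chunk layout used here (a 4-byte ID, a 4-byte big-endian payload length, then the payload, with no footer or padding) and the chunk names are assumptions for illustration; real formats such as IFF, RIFF and PNG each define their own details.

```python
import io
import struct

HEADER = ">4sI"  # hypothetical header: 4-byte chunk ID + big-endian length
HEADER_SIZE = struct.calcsize(HEADER)

def write_chunk(stream, chunk_id, payload):
    """Write one chunk: header (ID and payload length), then the payload."""
    if len(chunk_id) != 4:
        raise ValueError("chunk ID must be exactly 4 bytes")
    stream.write(struct.pack(HEADER, chunk_id, len(payload)))
    stream.write(payload)

def read_chunks(stream, known_ids):
    """Yield (chunk_id, payload) for known chunks and skip all others.

    The stream must be seekable, because unknown chunks are skipped
    with seek() instead of being read.
    """
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            return  # clean end of file
        if len(header) < HEADER_SIZE:
            raise ValueError("truncated chunk header")
        chunk_id, length = struct.unpack(HEADER, header)
        if chunk_id in known_ids:
            payload = stream.read(length)
            if len(payload) < length:
                raise ValueError("truncated chunk payload")
            yield chunk_id, payload
        else:
            # An older reader does not understand this chunk; the length
            # in the header is enough to jump over it.
            stream.seek(length, io.SEEK_CUR)

buf = io.BytesIO()
write_chunk(buf, b"NAME", b"example")
write_chunk(buf, b"XTRA", b"added by a newer version")
write_chunk(buf, b"BODY", b"\x01\x02\x03")
buf.seek(0)
chunks = list(read_chunks(buf, known_ids={b"NAME", b"BODY"}))
print(chunks)
```

    Here the reader knows only the NAME and BODY chunks, so the XTRA chunk written by a hypothetical newer version is passed over without error.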

  5. Create magic value to identify the file

    Signatures in files allow automatic identification tools, as well as general software, to identify file formats easily. In the 1980s, file signatures often consisted of only 2 bytes (or 2 characters), but with the number of file formats in existence today, such short signatures lead to many duplicate identifications. It is strongly suggested that your signature consist of at least 8 bytes or 8 characters.
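
    A signature check along these lines might look like the following Python sketch. The 8-byte value MAGIC and the format name "MYFMT" are invented for this example. Starting with a non-ASCII byte and ending with line-ending bytes is borrowed from the approach PNG takes, where it helps detect files mistaken for or damaged as text; whether that suits your format is a design choice.

```python
import io

# Hypothetical 8-byte signature: a non-ASCII byte, a 5-letter format
# name, and a CR LF pair. Not a registered or existing signature.
MAGIC = b"\x89MYFMT\r\n"

def write_signature(stream):
    """Write the signature at the very start of a new file."""
    stream.write(MAGIC)

def check_signature(stream):
    """Return True if the stream starts with the expected signature.

    Reads exactly len(MAGIC) bytes; a shorter file cannot match.
    """
    return stream.read(len(MAGIC)) == MAGIC

good = io.BytesIO()
write_signature(good)
good.write(b"rest of the file")
good.seek(0)

other = io.BytesIO(b"BM" + b"\x00" * 16)  # a file with a 2-byte signature

print(check_signature(good))
print(check_signature(other))
```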

  6. Decide if metadata is required in the file format

  7. Document the format

    If the format is to be widely used, it is very important to have a clear and concise document describing the entire file format (document it in a plain ASCII file, or a PDF or HTML document).

  8. Create samples of the format

    For programmers to really understand the internals of the file format, it is important that complex sample files of the file format be available.

  9. Create a simple API to access the files

    For uniformity, and to gain wider acceptance, consider creating a simple, freely available file access library, optionally with source code. For greater portability across different languages, the access library should be written in C, Java or .NET.

  10. Register the file format

    Get a MIME type for your file format. RFC 2048 describes how to register a new MIME type (it has since been superseded by RFC 6838).

    If you decide to create a new IFF or RIFF type file, do not forget to register the new format with the registry maintainers (the maintainers are not the original creators of the format!).

  11. Submit the file format

    Submit the file format to magicdb.org, as well as to wotsit.org and filext.com. It will be registered and added to the file formats database, and the identification tools will then be able to recognize your file format.

Technical information



Possible encoding types for files are described below:

Technical suggestions



    These are suggestions for different encodings of the data:

Last modification $Date: 2004/09/06 23:17:33 $