
file format

Problem statement

This arose in another discussion with Oonagh about the definition and gloss of file format.

  • current definition: A format which defines how a file is to be structured to conform to an abstract [TODO: file/data] type.

We think that this is too loose and adds unnecessary complexity by referring to an abstract concept.

I would propose something like this:

  • definition: A format which defines the representation of data in a sequence to be stored in a digital system.
  • comment: A file format is the product of a serialization process in which data is represented as a sequence of bits. Examples of file formats are tiff, HDF5, and text file. A file format is to be distinguished from data formats such as xml or csv, which are stored in the text file format.
  • gloss: A file format defines the sequence of bits which are stored in a digital system.
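To illustrate the proposed distinction between a data format and a file format, here is a minimal Python sketch. It assumes arbitrary sample records and uses hypothetical function names: the same structured data is first represented in the CSV data format (as characters) and then encoded as UTF-8, giving the sequence of bytes that a plain text file would store.

```python
import csv
import io

# Hypothetical sample records (structured data); not taken from the discussion.
records = [("sample", "temperature_K"), ("A1", "293.15"), ("A2", "295.00")]

def to_csv_text(rows):
    """Represent rows in the CSV data format as a string of characters."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

def to_text_file_bytes(text, encoding="utf-8"):
    """Encode the characters as the byte sequence a plain text file stores."""
    return text.encode(encoding)

def from_text_file_bytes(data, encoding="utf-8"):
    """Reverse both steps: decode the bytes, then parse the CSV data format."""
    return [tuple(row) for row in csv.reader(io.StringIO(data.decode(encoding)))]

csv_text = to_csv_text(records)
file_bytes = to_text_file_bytes(csv_text)
print(csv_text)
print(file_bytes)
assert from_text_file_bytes(file_bytes) == records
```

Passing a different `encoding` (for example `"utf-16"`) would change the stored byte sequence while the CSV content stays the same. This is one way to see why the comment keeps the two notions apart.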

What do you think?


Proposed changes/additions to ontology

  • remove class file type, add as broad synonym for
  • remove class data format, add as exact synonym for data structure specification
  • implement changes in file according to:
label: file
definition: Structured data which is serialised so as to conform to a file format. 
subclass Of: structured data 
  • implement changes in file format according to:
rdfs-label: file format
broad synonym: file type 
definition: A specification which defines how data is serialized as a defined unit for handling (within a file system).
comment: 
 This definition focuses on the information-centric view of the generically dependent continuant and pertains to digital files
 in the file system of a computer. Examples are ASCII text file, tiff, or jpeg.
 Metadata associated with a file is stored in a file system and may include the file name, the length of the content of a file, and the 
 location of the file in the folder hierarchy, separate from the contents of the file. A file system differs from other forms of data 
 management systems in that file systems support and/or require, e.g., filenames, filename extensions, magic numbers, and other 
 unique components.
 At the hardware level, parts of a file might be stored in different locations in a discontinuous fashion. Typically, files follow a defined 
 file format to be usable by software and may include a file signature that minimally identifies the file format. Files are typically 
 handled by some file system; for digital files, this would typically be the file system of the operating system.
subclass Of: specification
  • implement changes in data structure specification according to:
rdfs-label: data structure specification 
related/exact synonym: data format
dbxref: add from `format` & `data format`
  • implement changes in file name extension according to:
rdfs-label: file name extension
exact synonym: file extension
definition: A string which (1) is part of a file name and (2) indicates the file format so that the file can be processed by a machine.
  • deprecate format
  • implement changes in file system according to:
label: file system
definition: >
  A system which comprises data structures, (meta)data, and software which allow a computer to store, organise, and access structured data.
comment: >
 Metadata stored in a file system is associated with files and may include the file name, the length of the contents of a file, and the location of 
 the file in the folder hierarchy, separate from the contents of the file. A file system differs from other forms of data management systems (e.g., a database 
 management system) in that file systems support and/or require, e.g., filenames, filename extensions, magic numbers, and other unique components. These 
 components result in markedly different user experiences when accessing structured data through such systems.
sources: 
 - https://en.wikipedia.org/wiki/File_system
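The proposed comments separate metadata that the file system holds (file name with its file name extension, content length) from the file contents, which may begin with a file signature. A minimal Python sketch of that separation follows. The signature bytes for TIFF, HDF5 and JPEG are the well-known leading bytes of those formats. The extension table, the function names and the sample bytes are hypothetical and serve illustration only; the sample is not a valid JPEG image.

```python
import tempfile
from pathlib import Path, PurePath

# Leading bytes (file signatures / magic numbers) of some formats named above.
# A plain text file has no signature, so it is not listed.
SIGNATURES = {
    b"II*\x00": "tiff",            # little-endian TIFF
    b"MM\x00*": "tiff",            # big-endian TIFF
    b"\x89HDF\r\n\x1a\n": "hdf5",
    b"\xff\xd8\xff": "jpeg",
}

# Hypothetical mapping of file name extensions to formats, for illustration only.
EXTENSIONS = {
    ".tif": "tiff", ".tiff": "tiff",
    ".h5": "hdf5", ".hdf5": "hdf5",
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".txt": "text",
}

def format_from_signature(head: bytes):
    """Guess the format from the file contents (their leading bytes)."""
    for magic, fmt in SIGNATURES.items():
        if head.startswith(magic):
            return fmt
    return None

def format_from_extension(file_name: str):
    """Guess the format from the file name, i.e. from file system metadata."""
    return EXTENSIONS.get(PurePath(file_name).suffix.lower())

def describe(path: Path):
    """Combine file system metadata with a look at the first bytes of the contents."""
    stat = path.stat()
    with path.open("rb") as handle:
        head = handle.read(8)
    return {
        "name": path.name,         # held by the file system, not in the contents
        "size": stat.st_size,      # length of the contents in bytes
        "by_extension": format_from_extension(path.name),
        "by_signature": format_from_signature(head),
    }

# Usage: JPEG-like leading bytes saved under a misleading ".txt" name.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "photo.txt"
    sample.write_bytes(b"\xff\xd8\xff\xe0" + b"\x00" * 12)
    info = describe(sample)
print(info)
```

Here the extension reports `text` while the signature reports `jpeg`. This illustrates that the extension and the signature are independent indications of the file format.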
Edited by Volker Hofmann