We tried to unpack the dispositions today and are currently working on (1). (2) is already done (the second definition in there is a spare one).
current def of data openness
A quality which inheres in data, by virtue of (1) the absence of restrictions - monetary, technical, or otherwise - upon their use, (2) the data being hosted in a repository which allows, in whole or in part, the free access to them, (3) the data containing all information required for license compliance, and (4) the data being serialised in such a way as to readily allow identification of, access to, and modification of their parts.
individual dispositions:
(1) A disposition which inheres in a datum and is realised when that datum can be used or reused without encountering or violating any monetary, technical, or other restrictions, except those deemed acceptable by prevailing open data policies.
Note: One can unknowingly violate existing restrictions.
Note: This could fall under reusability or be related to it in some other way.
I don't understand which dispositions we are talking about.
What is, e.g., the disposition for "the absence of restrictions - monetary, technical, or otherwise - upon their use"? "Technical" and "monetary" are adjectives, not dispositions.
on:
or other restrictions, except those deemed acceptable by prevailing open data policies.
If you want to use this disposition to later construct "openness", I can see some circularity coming up here... ;-)
From BFO: a quality is a specifically dependent continuant that, in contrast to roles and dispositions, does not require any further process in order to be realized.
Is that not the case for "Open" data? I.e. is another process required to make it open? The only part of the original definition that is a bit of a grey area for me is part 4. All the other parts are clear yes-or-no cases (in contrast to the F-A-I-R dispositions).
Hm... I can see your point. My thinking was that it is definitely possible to say "this data is open data" (without it being accessed or used by anyone else), but it is not definitively possible to say "this data is FAIR data" (without another agent actually re-using it and thereby proving its interoperability). (Maybe I also mean "should" instead of "is".)
I would say that the (1) absence of restrictions and the (3) license information could be defined as qualities. I see the point that the (2) hosting in an accessible repository and the (4) readily accessible/modifiable parts are closer to what we did with the FAIR dispositions.
So we could play down parts (2) and (4), as they are more about how to make open data accessible/available, and emphasize points (1) and (3), which are more about the absence of restrictions and the data being "legally open". That would mean we have a kind of "legal openness", which would probably be more of a quality, and a "general openness", which would be more of a disposition.
These are also the only parts of the definition where I do see a benefit in "unpacking". Is there any other component of the original definition that needs to exist as an independent disposition/quality?
Are you replying to #65 (comment 3877038)? Because we don't mention any 1), 2), 3), or 4) in there.
What we did today was deal with the first TODO at the end of #65 (comment 3832772).
We found nothing about openness that would inhere in a datum itself. Rather, every aspect that contributes to some datum being open seems to inhere in external factors. Therefore we continue to think that bfo:role might be a good fit for "openness".
With the numbers I am referring to the parts of the original definition.
1 and 3 inhere more in the datum itself (i.e. the absence of restrictions can be seen as a property, like e.g. redness); 2 and 4, I agree, are based more on external factors and context.
I don't see why we need to unpack the original definition, except to separate the legal and access-related parts of the definition. Generally I agree with turning "openness" into a disposition rather than defining it as a quality (i.e. in line with the definitions of the FAIR dispositions).
I guess that unpacking the individual factors will become relevant soon. For example, Helmholtz (meta)data is usually subject to a policy, and the legal aspect is worth defining separately so that it can be used in processes and also to define a data policy. I feel that the original definition can be improved by looking at the aspects of openness in more detail.
Further, not only data may be open but also a file format, product, tool, or knowledge. As a result, we would need a general definition of "open".
We have two levels here: 1) the super-class of data openness and 2) the super-classes of the individual factors that play into data openness. For both of these levels we are currently discussing whether the super-classes are qualities, dispositions, or roles.
For me, the easier question is that of disposition vs. role. From their definitions and a dive into the bfo-discuss google group (e.g. here and here) my conclusion is that a role works as follows: Some external circumstances cause an entity to have a role, which can then (but doesn't have to) be realized one or more times in processes. The difference from a disposition would be that, for a disposition, the circumstances are internally grounded.
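To make that reading concrete, here is a minimal Python sketch of the decision rule. It is only a simplification of the elucidations quoted in this thread, not the official BFO axiomatisation, and the names `Grounding` and `classify_sdc` are made up for illustration (they are not BFO or HDO terms).

```python
from enum import Enum


class Grounding(Enum):
    """Where the reason for having the dependent entity lies (hypothetical helper)."""
    INTERNAL = "internal"  # grounded in the bearer's own make-up
    EXTERNAL = "external"  # grounded in circumstances outside the bearer


def classify_sdc(needs_process_to_be_realized: bool, grounding: Grounding) -> str:
    """Simplified reading of quality vs. disposition vs. role as discussed above."""
    if not needs_process_to_be_realized:
        # Qualities are fully present without any further process.
        return "quality"
    if grounding is Grounding.INTERNAL:
        return "disposition"
    return "role"


print(classify_sdc(True, Grounding.EXTERNAL))   # role
print(classify_sdc(True, Grounding.INTERNAL))   # disposition
print(classify_sdc(False, Grounding.INTERNAL))  # quality
```

Under this rule, the open question for each "data openness factor" is simply which of the two arguments one believes applies to it.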
Knowing that, the question then is, for every individual "data openness factor", whether it is internally or externally grounded with regard to the datum. Simple example: Imagine a datum which encodes the signifier of a temperature measurement. What is actually inherent in that datum which might influence whether it is open or not? I would argue that, at most, the structure/format of the datum is inherent. But even for this, one could argue that the openness inheres in the public (knowledge of the structure/format) instead. Aside from this, I see no other internally grounded factor here. Of course, you may also argue that a usage license or a price tag could be part of that temperature measurement datum, but I would rather put those into the related metadata.
Now, the more difficult question for me is the distinction between quality and role. The elucidation of quality is quite short: "A quality is a SDC that, in contrast to roles and dispositions, does not require any further process in order to be realized." As discussed before, a datum can have an openness role without that role being realized all the time. What seemed contradictory to me at first were the absolute and universal aspects of data openness. A datum is either completely open or not open at all. There are no "half-open" states. So isn't open data then also always open?
Well, in the end, my conclusion is that this is covered by the fact that an open datum always has that open role. It just doesn't realize it all the time. Which can definitely make sense, when you think about where the openness of a datum is relevant in your work: It's when you want to get the data and when you want to publish results based on that data that it becomes relevant. When e.g. analyzing it, the datum's openness doesn't matter to you.
I have not addressed every point discussed in this issue, but I hope I could respond to a few of the concerns raised. Please let me know if you think that I did not properly consider your arguments.
To revisit this discussion: the argument for having openness as a disposition was that openness does not inhere in the data/datum itself, but in the way it is stored. This was mostly constructed by comparison with our definitions of the FAIR dispositions.
I think that whether the quality/disposition is inherent in a datum is not so relevant for determining whether it is a disposition or a quality. For example, a person who has a disposition to develop colon cancer has this in their genetic code (i.e. it is not dependent on external agents/factors). The same is true for a person with the quality of blonde hair. The difference between a quality and a disposition is whether something is inherently present (hence the construct "a quality which inheres in some ...") or has to be realized ("a disposition which resides in a xx and is realized through ..."). Thus the question of data openness being a quality or a disposition has been ill-framed so far, as it is irrelevant whether openness is inherent in the datum or not. In fact, for the purpose of openness I would argue that the data/datum as well as the way it is stored/deposited is part of the "data (thing)" that is looked at, i.e. one instance of a file can be open (on a www repository) while another instance (on my HDD) might not be.
The main difference between disposition and quality is the "readiness". I.e. for the FAIR principles many of the related things are "soft" definitions which depend on a use case that we only know AFTER we create the metadata. As such we can create dispositions for interoperability but not a quality of interoperability.
The parts of the openness definition are, however, more direct, apparent, and measurable things:
(1) the absence of restrictions - monetary, technical, or otherwise - upon their use (I don't need someone to use the piece of data in order to waive all restrictions on it)
(2) the data being hosted in a repository which allows, in whole or in part, the free access to them (here we have some overlap with "accessibility", which is one of the more quality-like FAIR principles. However, if framed as it is here, it is clear that "free access" is directly measurable, while A1.1 and A2 depend on future "ifs")
(3) the data containing all information required for license compliance (clearly measurable)
(4) the data being serialised in such a way as to readily allow identification of, access to, and modification of their parts (this is the softest part of the definition and the only part where I see why this could be a disposition rather than a quality)
In order to reflect the difference in "readiness" level (i.e. FAIRness is in fact a "higher standard" than openness), I would leave openness as a quality with the current definition.
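A rough sketch of this "directly measurable" reading, assuming each of the four parts of the current definition can be recorded as a plain yes/no flag. The class and field names are hypothetical and not part of HDO, and whether part (4) can really be reduced to a flag is exactly the point that remains debatable.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OpennessCheck:
    """One yes/no flag per part of the current definition (hypothetical names)."""
    no_use_restrictions: bool       # part (1)
    hosted_with_free_access: bool   # part (2)
    has_license_information: bool   # part (3)
    parts_readily_accessible: bool  # part (4), the "soft" one

    def failed_parts(self) -> List[int]:
        """Return the numbers of the parts that are not satisfied."""
        flags = [
            self.no_use_restrictions,
            self.hosted_with_free_access,
            self.has_license_information,
            self.parts_readily_accessible,
        ]
        return [number for number, ok in enumerate(flags, start=1) if not ok]

    def is_open(self) -> bool:
        """Open only if every part holds; there is no 'half-open' state."""
        return not self.failed_parts()


on_repository = OpennessCheck(True, True, True, True)
on_private_disk = OpennessCheck(True, False, True, True)
print(on_repository.is_open())         # True
print(on_private_disk.failed_parts())  # [2]
```

Nothing in this check requires anyone to actually use the data, which is the sense in which the parts are treated as measurable here.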
I also question how the "FAIR" quality in HDO came about; however, I can somewhat agree with it, as it depends on the dispositions, and the rdfs:comment summarises the "dilemma" nicely.
We picked this issue up again. It took us a while to get back into it.
@v.hofmann before we discuss this again, let's try to notify everyone who will participate in advance, so that everyone can read up on the discussion at least a little beforehand.
One argument that we arrived at as a response to Volker's last argument a few months ago:
In fact for the purpose of openness i would argue that the data/datum as well as the way how it is stored/deposited is part of the "data (thing)" that is looked at - i.e. one instance of a file can be open (on a www repository) while another instance (on my HDD) might not be.
The way data is stored can probably not be part of the data, because data is a generically dependent continuant. And GDCs don't really care what they depend on (i.e. in the case of data, which storage they depend on), as long as there is some other continuant that they can depend on.
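As a toy illustration of that argument, the following Python sketch models a datum that stays the same while its carriers change. The `Datum` class and the carrier labels are hypothetical and only meant to mirror the GDC idea, not any HDO or BFO implementation.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Datum:
    """Stands in for a generically dependent continuant (hypothetical model)."""
    content: str
    carriers: Set[str] = field(default_factory=set)

    def add_carrier(self, carrier: str) -> None:
        self.carriers.add(carrier)

    def remove_carrier(self, carrier: str) -> None:
        # The datum does not care which carrier it has, only that one remains.
        if self.carriers == {carrier}:
            raise ValueError("a datum needs at least one carrier")
        self.carriers.discard(carrier)


datum = Datum("21.5 degC")
datum.add_carrier("public repository")
datum.add_carrier("local HDD")
datum.remove_carrier("local HDD")
print(datum.content, sorted(datum.carriers))
```

In this model the storage location is a fact about the carrier, so anything that differs between the repository copy and the HDD copy cannot be stated on the datum itself.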