Commit ae02d33d authored by Jens Bröder

Add first version of jupyter-book for documenation and first CI

parent 37818195
Pipeline #193200 failed
stages:
  - build
  - deploy

pages:
  stage: deploy
  image: busybox:latest
  script:
    # `jupyter-book build docs` writes the HTML output to docs/_build/html
    - mv docs/_build/html public
  artifacts:
    paths:
      - public
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
  environment: production

build:
  stage: build
  image: python:latest
  script:
    - pip install -U jupyter-book
    - jupyter-book clean docs
    - jupyter-book build docs
  artifacts:
    paths:
      - docs/_build/
  # No branch rule here: the pages job on main needs the artifacts of this job.
This folder contains all files of the documentation pages.
They are built with jupyter-book and hosted with GitLab Pages (for an example, see https://gitlab.com/pages/jupyterbook).
You can build the docs with:
```
jupyter-book build docs
```
Ideally, the terms should be maintained somewhere else and included here automatically.
The same goes for the code documentation of the pipelines and their usage.
#######################################################################################
# A default configuration that will be loaded for all jupyter books
# Users are expected to override these values in their own `_config.yml` file.
# This is also the "master list" of all allowed keys and values.
#######################################################################################
# Book settings
title : The unified Helmholtz Information and Data Exchange (UnHIDE) Project # The title of the book. Will be placed in the left navbar.
author : The Helmholtz Metadata Collaboration (HMC) # The author of the book
copyright : "2022" # Copyright year to be placed in the footer
logo : images/unhide_logo.png # A path to the book logo
# Patterns to skip when building the book. Can be glob-style (e.g. "*skip.ipynb")
exclude_patterns : [_build, Thumbs.db, .DS_Store, "**.ipynb_checkpoints"]
# Auto-exclude files not in the toc
only_build_toc_files : false
#######################################################################################
# Execution settings
execute:
  execute_notebooks : auto # Whether to execute notebooks at build time. Must be one of ("auto", "force", "cache", "off")
  cache : "" # A path to the jupyter cache that will be used to store execution artifacts. Defaults to `_build/.jupyter_cache/`
  exclude_patterns : [] # A list of patterns to *skip* in execution (e.g. a notebook that takes a really long time)
  timeout : 30 # The maximum time (in seconds) each notebook cell is allowed to run.
  run_in_temp : false # If `True`, then a temporary directory will be created and used as the command working directory (cwd),
                      # otherwise the notebook's parent directory will be the cwd.
  allow_errors : false # If `False`, when a code cell raises an error the execution is stopped, otherwise all cells are always run.
  stderr_output : show # One of 'show', 'remove', 'remove-warn', 'warn', 'error', 'severe'
#######################################################################################
# Parse and render settings
parse:
  myst_enable_extensions: # default extensions to enable in the myst parser. See https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html
    # - amsmath
    - colon_fence
    # - deflist
    - dollarmath
    # - html_admonition
    # - html_image
    - linkify
    # - replacements
    # - smartquotes
    - substitution
    - tasklist
  myst_url_schemes: [mailto, http, https] # URI schemes that will be recognised as external URLs in Markdown links
  myst_dmath_double_inline: true # Allow display math ($$) within an inline context
#######################################################################################
# HTML-specific settings
html:
  favicon : images/favicon.png # A path to a favicon image
  use_edit_page_button : false # Whether to add an "edit this page" button to pages. If `true`, repository information in repository: must be filled in
  use_repository_button : true # Whether to add a link to your repository button
  use_issues_button : true # Whether to add an "open an issue" button
  use_multitoc_numbering : true # Continuous numbering across parts/chapters
  extra_navbar : Powered by <a href="https://jupyterbook.org">Jupyter Book</a> # Will be displayed underneath the left navbar.
  extra_footer : "" # Will be displayed underneath the footer.
  google_analytics_id : "" # A GA id that can be used to track book views.
  home_page_in_navbar : true # Whether to include your home page in the left Navigation Bar
  baseurl : "" # The base URL where your book will be hosted. Used for creating image previews and social links. e.g.: https://mypage.com/mybook/
  comments:
    hypothesis : false
    utterances : false
  announcement : "" # A banner announcement at the top of the site.
#######################################################################################
# LaTeX-specific settings
latex:
  latex_engine : pdflatex # one of 'pdflatex', 'xelatex' (recommended for unicode), 'luatex', 'platex', 'uplatex'
  use_jupyterbook_latex : true # use sphinx-jupyterbook-latex for pdf builds as default
#######################################################################################
# Launch button settings
launch_buttons:
  notebook_interface : classic # The interface interactive links will activate ["classic", "jupyterlab"]
  binderhub_url : https://mybinder.org # The URL of the BinderHub (e.g., https://mybinder.org)
  jupyterhub_url : "" # The URL of the JupyterHub (e.g., https://datahub.berkeley.edu)
  thebe : false # Add a thebe button to pages (requires the repository to run on Binder)
  colab_url : "" # The URL of Google Colab (https://colab.research.google.com)
repository:
  url : https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/documentation # The URL to your book's repository
  path_to_book : docs # A path to your book's folder, relative to the repository root.
  branch : main # Which branch of the repository should be used when creating links
#######################################################################################
# Advanced and power-user settings
sphinx:
  extra_extensions : # A list of extra extensions to load by Sphinx (added to those already used by JB).
  local_extensions : # A list of local extensions to load by sphinx specified by "name: path" items
  recursive_update : false # A boolean indicating whether to overwrite the Sphinx config (true) or recursively update (false)
  config : # key-value pairs to directly over-ride the Sphinx configuration
# Table of contents
# Learn more at https://jupyterbook.org/customize/toc.html
format: jb-book
root: intro
chapters:
- file: implementation
- file: data_sources
## Data Sources
#### Initial Scope
Initial efforts of the Helmholtz-KG implementation will focus on the representation of (meta)data describing the following digital core assets:
- Documents / Publications
- published Datasets
- Software
- Institutions
- Infrastructure & Resources
- Researchers & Experts
- Projects
The representation of these instances will be semantically aligned with the [schema.org](https://schema.org/docs/full.html) vocabulary, a globally adopted standard offering a relaxed frame for the representation of heterogeneous data. Following the initial implementation, the semantic expressiveness of the graph can be increased by integrating domain ontologies such as the HMC-developed [Helmholtz Digitization Ontology](https://codebase.helmholtz.cloud/hmc/hmc-public/hob/hdo) (HDO), which provides precise and comprehensive semantics of the concepts and practices used to manage digital assets.
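As a rough illustration of what a schema.org-aligned record could look like, the sketch below builds a JSON-LD description of a dataset in Python. All identifiers, names, and URLs are made-up placeholders and not actual Helmholtz-KG records; only `@context`, `@type`, and the property names are taken from the schema.org vocabulary.

```python
import json

# Hypothetical example record; every value below is a placeholder.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/datasets/42",
    "name": "Example sea surface temperature time series",
    "description": "Placeholder description of a published dataset.",
    "creator": {
        "@type": "Person",
        "name": "Jane Doe",
    },
    "publisher": {
        "@type": "Organization",
        "name": "Example Helmholtz Centre",
    },
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Serialise to JSON-LD text, as it could be embedded in a web page or served as a file.
jsonld_text = json.dumps(dataset, indent=2)
print(jsonld_text)
```

The same pattern would apply to the other asset types listed above by changing `@type` (for example to `SoftwareSourceCode`, `Organization`, or `Person`).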
#### Data Ingestion Process
The Helmholtz-KG will offer multiple options for existing and emerging HGF infrastructures, data providers, and communities to declare their resources and digital assets in the graph for discoverability. We will prioritise the recommended publishing process for structured data on the web (as used by ODIS/OIH and many others): data providers would either 1) provide a sitemap or robots.txt file which will direct harvesting software to a collection of JSON-LD/schema.org documents or 2) expose JSON-LD snippets in the document head element of a web resource (i.e. HTML document). Both approaches are described in the publisher documentation of the Ocean InfoHub Project [3].
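The two publishing patterns above can be sketched from the harvester's point of view as follows. This is a minimal, standard-library-only illustration and not the harvesting software that unHIDE will use; the function and class names (`sitemap_urls`, `JsonLdExtractor`, `extract_jsonld`) and the sample documents are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# XML namespace used by the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Pattern 1: return all <loc> entries of a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


class JsonLdExtractor(HTMLParser):
    """Pattern 2: collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.documents: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list[dict]:
    """Parse an HTML document and return all embedded JSON-LD snippets."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.documents


# Inline sample documents (placeholders), so the sketch runs without network access.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/records/1.json</loc></url>
  <url><loc>https://example.org/records/2.json</loc></url>
</urlset>"""

sample_page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}
</script>
</head><body></body></html>"""

urls = sitemap_urls(sample_sitemap)
records = extract_jsonld(sample_page)
print(urls)
print(records[0]["@type"])
```

A real harvester would additionally fetch the listed URLs over HTTP, respect `robots.txt`, and validate the retrieved JSON-LD before adding it to the graph.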
Alternative publication patterns may include HTTP-accessible [RO-Crate](https://www.researchobject.org/ro-crate/) [4] metadata in `ro-crate-metadata.json`, the exposure of structured metadata records via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [5], or properly documented RESTful APIs in general [6]. We will explore the need for and feasibility of alternative publishing and harvesting modes during the course of unHIDE.
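To give an idea of what harvesting via OAI-PMH involves, the following sketch builds a `ListRecords` request URL and reads the record identifiers from a response. The endpoint `https://example.org/oai`, the helper names, and the sample response are assumptions for illustration; the `verb` and `metadataPrefix` parameters and the XML namespace are those defined by the OAI-PMH protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the request URL for the OAI-PMH ListRecords verb."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def record_identifiers(response_xml: str) -> list[str]:
    """Return the identifiers found in the record headers of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Placeholder response, shortened to the parts used here.
sample_response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://example.org/oai")
identifiers = record_identifiers(sample_response)
print(url)
print(identifiers)
```

The harvested records would still need to be mapped to RDF / JSON-LD before they can be added to the graph.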
HMC personnel will support the onboarding of data providers as well as the implementation of custom (meta)data pipelines / connectors and mapping to RDF / JSON-LD, if necessary and where appropriate.
#### Potential Data Providers
Within the HGF a number of relevant web-based data architectures exist. These will be targeted by unHIDE to collaborate on building interfaces to the Helmholtz-KG.
The initial implementation will focus on:
- HGF institutional (data) repositories
- Central Libraries of Helmholtz Research Centers
- Domain-specific (data) repositories relevant to HGF
- Helmholtz GitLab Instances
Subsequent efforts will include further resources such as:
- Helmholtz FAIR digital objects (via HMC)
- Helmholtz Ontology Base (HOB) (via HMC)
- The Helmholtz [software directory](https://helmholtz.software/) (centrally maintained by HIFIS)
- [Helmholtz Data Challenges Platform](https://helmholtz-data-challenges.de/)
- other resources of the Helmholtz Metadata Collaboration (HMC)
- Content management systems (CMS)
- Helmholtz Computing centers (e.g. JSC)
- Helmholtz Federated IT Services (HIFIS)
- Helmholtz instruments and sensor databases (e.g. @GFZ, DEPAS @AWI, RDMInfoPool, etc.)
- Helmholtz Scientific Project Workflow Platform (HELIPORT)
- [Helmholtz Imaging Modalities database](https://modalities.helmholtz-imaging.de/)
- Laboratory information management systems (LIMS)
- Helmholtz Open Science Office
docs/images/favicon.png

7.7 KiB

docs/images/unhide_logo.png

2.38 KiB

## Architecture & Implementation
### Foundational architecture
The Helmholtz Knowledge Graph (Helmholtz-KG) aims to enhance the HGF's digital capacities, transparency, and productivity through dissemination and implementation of Linked Data principles (Box 2). Thus, unHIDE will build the Helmholtz-KG on mature web architecture and state-of-the-art semantic web technologies. This will ensure reliability and compatibility with global systems, while also exploring innovative approaches to maximise the Helmholtz-KG's ability to accelerate research and operations.
> Box 2
>
> Graph data is:
> - Open-world, allowing resilient operations with novel or unexpected data flows
> - Faster than using SQL and associated JOIN operations
> - Better suited to integrating data from heterogeneous sources
> - Better suited to situations where the data model is complex and (rapidly) evolving
>
> **[Learn more: https://www.w3.org/2013/data/](https://www.w3.org/2013/data/)**
To ensure ease of use, the Helmholtz-KG will be based on a lightweight and internationally adopted interoperability architecture built on schema.org semantics and JSON-LD serialisation [2]. This architecture is widely used by data producers (including public, private, and governmental data systems) to link and expose scattered, diverse digital assets. By reusing this architecture, unHIDE will ensure that the Helmholtz-KG is able to natively interoperate with global systems.
### Modular design & Extensibility
While the foundation of the Helmholtz-KG will reuse standard web architectural elements and proven, globally adopted conventions, the KG itself is modular by nature: graphs can be merged, split, independently managed, and readily interfaced with other digital resources without compromising core integrity and functionality. In this manner, Helmholtz data scientists and engineers will be able to propose and test extensions to the graph with minimal overhead, which will support the ability to extend into existing and well-established systems in the HGF.
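The merge and split operations mentioned above can be illustrated with a toy model in which a graph is simply a set of (subject, predicate, object) triples. This is a simplified sketch, not the storage layer of the Helmholtz-KG; the `ex:` identifiers and the `subgraph` helper are made up for the example.

```python
# A triple is (subject, predicate, object); a graph is a set of triples.
Triple = tuple[str, str, str]

# Two independently managed graphs (placeholder content).
publications: set[Triple] = {
    ("ex:paper1", "schema:author", "ex:alice"),
    ("ex:paper1", "rdf:type", "schema:ScholarlyArticle"),
}
software: set[Triple] = {
    ("ex:tool1", "schema:author", "ex:alice"),
    ("ex:tool1", "rdf:type", "schema:SoftwareSourceCode"),
}

# Merging graphs is a set union; shared nodes such as ex:alice link the parts.
merged = publications | software


def subgraph(graph: set[Triple], predicate: str) -> set[Triple]:
    """Split off all triples that use the given predicate."""
    return {triple for triple in graph if triple[1] == predicate}


# Splitting: separate the authorship statements from the rest of the graph.
authorship = subgraph(merged, "schema:author")
remainder = merged - authorship

print(len(merged), len(authorship), len(remainder))
```

Because both operations are plain set operations, the parts can be recombined at any time without losing information.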
This modularity (especially the ability to securely and independently manage parts of the overall graph) will also make it possible to realise different modes of access to digital assets (e.g. respecting sensitivity and confidentiality but also permitting full openness). The initial implementation of the Helmholtz-KG will not contend with sensitive or confidential data, but such capacities (e.g. user management, license recognition across (meta)data holdings, and authentication) can be explored and implemented once the core technology and operational procedures are stabilised.
The backbone architecture of the Helmholtz Knowledge Graph will be licensed under [CC0/CCBY](https://creativecommons.org/about/cclicenses/) to enable crosswalks to the outside world and gain visibility as e.g. a sub-cloud of the [Linked Open Data Cloud](https://lod-cloud.net/).
### Inspiration
The implementation of the Helmholtz-KG architecture is inspired by the federation of stakeholders in IOC-UNESCO's Ocean Data and Information System (ODIS), interconnected by the [ODIS Architecture](https://book.oceaninfohub.org/) [2], and rendered into a knowledge graph federating over 50 partners across the globe by the Ocean InfoHub Project (OIH). Personnel from the HMC's Earth and Environment Hub chair the ODIS federation and lead the technical implementation of OIH, offering direct alignment with unHIDE.
<img src="https://s3.desy.de/hackmd/uploads/upload_c3ba77674d5c58417c6df0f195b0c4ac.png" alt="unHIDE logo" height = "75">
# unHIDE - unified Helmholtz Information and Data Exchange
```{tableofcontents}
```
## Introduction & Scope
Research across the Helmholtz Association (HGF) depends and thrives on a complex network of inter- and multidisciplinary collaborations which spans across its 18 Centres and beyond.
However, the (meta)data generated through the HGF's research and operations is typically siloed within institutional infrastructure and often within individual teams. The result is that the wealth of the HGF's (meta)data is stored and maintained in a scattered manner and cannot deliver its full value to scientists, managers, strategists, and policy makers.
To address this challenge, the Helmholtz Metadata Collaboration (HMC) is launching the **unified Helmholtz Information and Data Exchange (unHIDE)**. This initiative seeks to create a lightweight and sustainable interoperability layer to interlink data infrastructures and provide greater, cross-organisational access to the HGF's (meta)data and information assets. Using proven and globally adopted knowledge graph technology (Box 1), unHIDE will develop a comprehensive association-wide Knowledge Graph (KG), the "Helmholtz-KG": a solution to connect (meta)data, information, and knowledge.
> *Box 1*
>
> What is a Knowledge Graph?
> - A "graph", from graph theory, is a structure that models pairwise connections between objects using "nodes" connected by "edges".
> - A "knowledge graph" uses such a graph structure to capture knowledge about how a collection of things (represented as nodes) relate to one another (via edges). This helps organisations keep track of their collective knowledge, especially in complex and rapidly changing scenarios.
> - Social networks are perhaps the best known graphs, that store knowledge about who knows whom and how, what their interests are, what groups they belong to, and what content they create and interact with.
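
The node-and-edge structure described in Box 1 can be made concrete with a few lines of Python. The sketch below stores a tiny social-network-style graph as a list of labelled edges and answers a simple question about it; the people, groups, and relation names are invented for illustration.

```python
from collections import defaultdict

# Each edge connects two nodes and carries a label describing the relation.
edges = [
    ("Alice", "knows", "Bob"),
    ("Alice", "member_of", "Data Science Group"),
    ("Bob", "created", "Dataset A"),
    ("Alice", "interested_in", "Dataset A"),
]

# Index the edges by their source node for quick lookups.
outgoing = defaultdict(list)
for source, relation, target in edges:
    outgoing[source].append((relation, target))


def related(node: str, relation: str) -> list[str]:
    """Return all nodes reachable from `node` via edges labelled `relation`."""
    return [target for rel, target in outgoing[node] if rel == relation]


print(related("Alice", "knows"))
print(related("Bob", "created"))
```
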
With the implementation of the Helmholtz-KG, unHIDE will create substantial additional value for the Helmholtz digital ecosystem and its interconnectivity:
**With the development of the Helmholtz-KG, unHIDE will:**
- increase discoverability and actionability of HGF data across the whole Association
- motivate enhancement of (meta)data quality [1] and interoperability
- provide overviews and diagnostics of the HGF dataspace and digital assets
- allow for traceable and reproducible recovery of (meta)data to enhance research
- support connectivity of HGF data to interact with global infrastructures and projects
- act as a central access and distribution point for stakeholders within and beyond the HGF
## Project Authors
- Jens Bröder [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/ORCID_iD.svg/2048px-ORCID_iD.svg.png" alt="ORCID Logo" height ="20"> 0000-0001-7939-226X](https://orcid.org/0000-0001-7939-226X)
- Pier Luigi Buttigieg [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/ORCID_iD.svg/2048px-ORCID_iD.svg.png" alt="ORCID Logo" height ="20"> 0000-0002-4366-3088](https://orcid.org/0000-0002-4366-3088)
- Volker Hofmann [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/ORCID_iD.svg/2048px-ORCID_iD.svg.png" alt="ORCID Logo" height ="20"> 0000-0002-5149-603X](https://orcid.org/0000-0002-5149-603X)
## Contributors and Partners
[<img style="vertical-align: middle;" alt="FZJ" src='https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/FZJ/FZJ.png' width=20% height=20%>](https://fz-juelich.de)
## Acknowledgements
[<img style="vertical-align: middle;" alt="HMC Logo" src='https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/HMC/HMC_Logo_M.png' width=50% height=50%>](https://helmholtz-metadaten.de)
This project was developed and funded by the Helmholtz Metadata Collaboration
(HMC), an incubator-platform of the Helmholtz Association within the framework of the
Information and Data Science strategic initiative.
## References
- [1] https://5stardata.info/en/
- [2] https://www.w3.org/TR/2014/REC-json-ld-20140116/
- [3] https://book.oceaninfohub.org/publishing/publishing.html
- [4] https://www.researchobject.org/ro-crate/
- [5] https://www.openarchives.org/pmh/
```{bibliography}
```
%% Cell type:markdown id: tags:
# Content with notebooks
You can also create content with Jupyter Notebooks. This means that you can include
code blocks and their outputs in your book.
## Markdown + notebooks
As it is markdown, you can embed images, HTML, etc into your posts!
![](https://myst-parser.readthedocs.io/en/latest/_static/logo-wide.svg)
You can also $add_{math}$ and
$$
math^{blocks}
$$
or
$$
\begin{aligned}
\mbox{mean} la_{tex} \\ \\
math blocks
\end{aligned}
$$
But make sure you \$Escape \$your \$dollar signs \$you want to keep!
## MyST markdown
MyST markdown works in Jupyter Notebooks as well. For more information about MyST markdown, check
out [the MyST guide in Jupyter Book](https://jupyterbook.org/content/myst.html),
or see [the MyST markdown documentation](https://myst-parser.readthedocs.io/en/latest/).
## Code blocks and outputs
Jupyter Book will also embed your code blocks and output in your book.
For example, here's some sample Matplotlib code:
%% Cell type:code id: tags:
``` python
from matplotlib import rcParams, cycler
import matplotlib.pyplot as plt
import numpy as np
plt.ion()
```
%% Cell type:code id: tags:
``` python
# Fixing random state for reproducibility
np.random.seed(19680801)
N = 10
data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]
data = np.array(data).T
cmap = plt.cm.coolwarm
rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))
from matplotlib.lines import Line2D
custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),
                Line2D([0], [0], color=cmap(.5), lw=4),
                Line2D([0], [0], color=cmap(1.), lw=4)]
fig, ax = plt.subplots(figsize=(10, 5))
lines = ax.plot(data)
ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);
```
%% Cell type:markdown id: tags:
There is a lot more that you can do with outputs (such as including interactive outputs)
with your book. For more information about this, see [the Jupyter Book documentation](https://jupyterbook.org)