Add first version of jupyter-book for documentation and first CI

Commit ae02d33d authored 2 years ago by Jens Bröder
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
+
+stages:
+  - build
+  - deploy
+
+pages:
+  stage: deploy
+  image: busybox:latest
+  script:
+    # jupyter-book writes the rendered HTML to <book dir>/_build/html
+    - mv docs/_build/html public
+  artifacts:
+    paths:
+      - public
+  rules:
+    - if: $CI_COMMIT_BRANCH == "main"
+  environment: production
+
+build:
+  stage: build
+  image: python:latest
+  script:
+    - pip install -U jupyter-book
+    - jupyter-book clean docs
+    - jupyter-book build docs
+  artifacts:
+    paths:
+      - docs/_build/
+  rules:
+    # Run on every branch, including main, so the pages job has artifacts to deploy
+    - if: $CI_COMMIT_BRANCH
--- a/docs/README.md
+++ b/docs/README.md
+This folder contains all files of the documentation pages.
+They are created using jupyter-book and hosted with GitLab Pages (for an example, see https://gitlab.com/pages/jupyterbook).
+
+
+You can build the docs from the repository root with:
+```
+jupyter-book build docs
+```
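
For scripted builds, the same command and the location of the rendered HTML can be assembled in Python. This is a minimal sketch, assuming `jupyter-book` is installed and the command is run from the repository root. `build_command` and `html_output_dir` are hypothetical helper names, not part of Jupyter Book.

```python
from pathlib import Path


def build_command(book_dir: str) -> list[str]:
    """Return the CLI call shown above as an argument list (e.g. for subprocess.run)."""
    return ["jupyter-book", "build", book_dir]


def html_output_dir(book_dir: str) -> Path:
    """Jupyter Book places the rendered HTML in <book_dir>/_build/html by default."""
    return Path(book_dir) / "_build" / "html"


print(" ".join(build_command("docs")))
print(html_output_dir("docs").as_posix())
```

The second helper is a reminder that the output lands inside the book folder, not next to it, which matters when a CI job moves the HTML into `public`.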
+
+Ideally, the terms should live somewhere else and be included here automatically.
+The same goes for the code documentation of the pipelines and their usage.
+
+
--- a/docs/_config.yml
+++ b/docs/_config.yml
+#######################################################################################
+# A default configuration that will be loaded for all jupyter books
+# Users are expected to override these values in their own `_config.yml` file.
+# This is also the "master list" of all allowed keys and values.
+
+#######################################################################################
+# Book settings
+title                       : The unified Helmholtz Information and Data Exchange (unHIDE) Project # The title of the book. Will be placed in the left navbar.
+author                      : The Helmholtz Metadata Collaboration (HMC)  # The author of the book
+copyright                   : "2022"  # Copyright year to be placed in the footer
+logo                        : images/unhide_logo.png  # A path to the book logo
+# Patterns to skip when building the book. Can be glob-style (e.g. "*skip.ipynb")
+exclude_patterns            : [_build, Thumbs.db, .DS_Store, "**.ipynb_checkpoints"]
+# Auto-exclude files not in the toc
+only_build_toc_files        : false
+
+#######################################################################################
+# Execution settings
+execute:
+  execute_notebooks         : auto  # Whether to execute notebooks at build time. Must be one of ("auto", "force", "cache", "off")
+  cache                     : ""    # A path to the jupyter cache that will be used to store execution artifacts. Defaults to `_build/.jupyter_cache/`
+  exclude_patterns          : []    # A list of patterns to *skip* in execution (e.g. a notebook that takes a really long time)
+  timeout                   : 30    # The maximum time (in seconds) each notebook cell is allowed to run.
+  run_in_temp               : false # If `True`, then a temporary directory will be created and used as the command working directory (cwd),
+                                    # otherwise the notebook's parent directory will be the cwd.
+  allow_errors              : false # If `False`, when a code cell raises an error the execution is stopped, otherwise all cells are always run.
+  stderr_output             : show  # One of 'show', 'remove', 'remove-warn', 'warn', 'error', 'severe'
+
+#######################################################################################
+# Parse and render settings
+parse:
+  myst_enable_extensions:  # default extensions to enable in the myst parser. See https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html
+    # - amsmath
+    - colon_fence
+    # - deflist
+    - dollarmath
+    # - html_admonition
+    # - html_image
+    - linkify
+    # - replacements
+    # - smartquotes
+    - substitution
+    - tasklist
+  myst_url_schemes: [mailto, http, https] # URI schemes that will be recognised as external URLs in Markdown links
+  myst_dmath_double_inline: true  # Allow display math ($$) within an inline context
+
+#######################################################################################
+# HTML-specific settings
+html:
+  favicon                   : images/favicon.png  # A path to a favicon image
+  use_edit_page_button      : false  # Whether to add an "edit this page" button to pages. If `true`, repository information in repository: must be filled in
+  use_repository_button     : true  # Whether to add a link to your repository button
+  use_issues_button         : true  # Whether to add an "open an issue" button
+  use_multitoc_numbering    : true   # Continuous numbering across parts/chapters
+  extra_navbar              : Powered by <a href="https://jupyterbook.org">Jupyter Book</a>  # Will be displayed underneath the left navbar.
+  extra_footer              : ""  # Will be displayed underneath the footer.
+  google_analytics_id       : ""  # A GA id that can be used to track book views.
+  home_page_in_navbar       : true  # Whether to include your home page in the left Navigation Bar
+  baseurl                   : ""  # The base URL where your book will be hosted. Used for creating image previews and social links. e.g.: https://mypage.com/mybook/
+  comments:
+    hypothesis              : false
+    utterances              : false
+  announcement              : "" # A banner announcement at the top of the site.
+
+#######################################################################################
+# LaTeX-specific settings
+latex:
+  latex_engine              : pdflatex  # one of 'pdflatex', 'xelatex' (recommended for unicode), 'luatex', 'platex', 'uplatex'
+  use_jupyterbook_latex     : true # use sphinx-jupyterbook-latex for pdf builds as default
+
+#######################################################################################
+# Launch button settings
+launch_buttons:
+  notebook_interface        : classic  # The interface interactive links will activate ["classic", "jupyterlab"]
+  binderhub_url             : https://mybinder.org  # The URL of the BinderHub (e.g., https://mybinder.org)
+  jupyterhub_url            : ""  # The URL of the JupyterHub (e.g., https://datahub.berkeley.edu)
+  thebe                     : false  # Add a thebe button to pages (requires the repository to run on Binder)
+  colab_url                 : "" # The URL of Google Colab (https://colab.research.google.com)
+
+repository:
+  url                       : https://codebase.helmholtz.cloud/hmc/hmc-public/unhide/documentation  # The URL to your book's repository
+  path_to_book              : docs  # A path to your book's folder, relative to the repository root.
+  branch                    : main  # Which branch of the repository should be used when creating links
+
+#######################################################################################
+# Advanced and power-user settings
+sphinx:
+  extra_extensions          :   # A list of extra extensions to load by Sphinx (added to those already used by JB).
+  local_extensions          :   # A list of local extensions to load by sphinx specified by "name: path" items
+  recursive_update          : false # A boolean indicating whether to overwrite the Sphinx config (true) or recursively update (false)
+  config                    :   # key-value pairs to directly over-ride the Sphinx configuration
--- a/docs/_toc.yml
+++ b/docs/_toc.yml
+# Table of contents
+# Learn more at https://jupyterbook.org/customize/toc.html
+
+format: jb-book
+root: intro
+chapters:
+- file: implementation
+- file: data_sources
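
Because `only_build_toc_files` is set to `false` in `_config.yml`, content files that are missing from the table of contents are not excluded automatically. The sketch below lists such files. It is a rough illustration that reads the `root:` and `file:` entries with a regular expression rather than a YAML parser; `toc_files` and `not_in_toc` are hypothetical helper names.

```python
import re

# Copy of the table of contents added in this commit.
TOC_TEXT = """\
format: jb-book
root: intro
chapters:
- file: implementation
- file: data_sources
"""


def toc_files(toc_text: str) -> list[str]:
    """Collect the root page and every `file:` entry, in order of appearance."""
    pattern = re.compile(r"^\s*(?:-\s*)?(?:root|file)\s*:\s*(\S+)", re.MULTILINE)
    return pattern.findall(toc_text)


def not_in_toc(content_files: list[str], toc_text: str) -> list[str]:
    """Return content files (named without extension) that the toc does not list."""
    listed = set(toc_files(toc_text))
    return [name for name in content_files if name not in listed]


# File stems of the content files added under docs/ in this commit.
print(not_in_toc(["intro", "implementation", "data_sources", "notebooks"], TOC_TEXT))
```

A real check would load the file with a YAML library and also handle nested `sections` and `parts`, which this sketch ignores.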
--- a/docs/data_sources.md
+++ b/docs/data_sources.md
+## Data Sources
+
+#### Initial Scope
+Initial efforts of the Helmholtz-KG implementation will focus on the representation of (meta)data describing the following digital core assets: 
+- Documents / Publications
+- published Datasets
+- Software
+- Institutions
+- Infrastructure & Resources
+- Researchers & Experts
+- Projects
+
+The representation of these instances will be semantically aligned with the [schema.org](https://schema.org/docs/full.html) vocabulary, a globally adopted standard offering a relaxed frame for the representation of heterogeneous data. Following the initial implementation, the semantic expressiveness of the graph can be increased by integrating domain ontologies such as the HMC-developed [Helmholtz Digitization Ontology](https://codebase.helmholtz.cloud/hmc/hmc-public/hob/hdo) (HDO), which provides precise and comprehensive semantics of the concepts and practices used to manage digital assets.
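
To make the schema.org alignment concrete, here is a minimal sketch of how one dataset could be described as JSON-LD. All values (title, URL, names) are hypothetical placeholders; only the keys are real schema.org and JSON-LD terms. The actual set of properties used in the Helmholtz-KG is not specified here.

```python
import json

# Hypothetical example values; only the keys (@context, @type, name, url,
# creator, publisher) are real JSON-LD / schema.org terms.
dataset_record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset title",
    "url": "https://example.org/datasets/42",
    "creator": {"@type": "Person", "name": "Jane Doe"},
    "publisher": {"@type": "Organization", "name": "Example Helmholtz Centre"},
}

# Serialise the record as a JSON-LD document.
print(json.dumps(dataset_record, indent=2))
```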
+
+#### Data Ingestion Process
+The Helmholtz-KG will offer multiple options for existing and emerging HGF infrastructures, data providers, and communities to declare their resources and digital assets in the graph for discoverability. We will prioritise the recommended publishing process for structured data on the web (as used by ODIS/OIH and many others): data providers would either 1) provide a sitemap or robots.txt file that directs harvesting software to a collection of JSON-LD/schema.org documents, or 2) expose JSON-LD snippets in the head element of a web resource (e.g. an HTML document). Both approaches are described in the publisher documentation of the Ocean InfoHub Project [3].
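
The second option, JSON-LD embedded in a web page, can be illustrated with the Python standard library. This is a sketch of the general pattern and not the harvester used by unHIDE; `JsonLdExtractor`, `extract_jsonld`, and the example page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.records: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # Parse the collected script body as JSON once the element closes.
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD document embedded in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical landing page of a dataset.
EXAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body></body></html>"""

print(extract_jsonld(EXAMPLE_PAGE))
```

A harvester following the first option would start from the URLs listed in a sitemap and apply the same extraction to each page, or load the linked JSON-LD documents directly.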
+
+Alternative publication patterns may include HTTP-accessible [RO-Crate](https://www.researchobject.org/ro-crate/) [4] metadata in `ro-crate-metadata.json`, the exposure of structured metadata records via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [5], or properly documented RESTful APIs in general [6]. We will explore the need for and feasibility of alternative publishing and harvesting modes during the course of unHIDE.
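
For OAI-PMH, harvesting starts with a plain HTTP request that names a protocol verb and a metadata format. The sketch below only builds such a request URL; the endpoint is a hypothetical placeholder and `list_records_url` is an illustrative helper, while `ListRecords` and `oai_dc` are defined by the OAI-PMH specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical endpoint, used only for illustration.
print(list_records_url("https://repository.example.org/oai"))
```

The XML response would then need to be mapped to schema.org/JSON-LD before it can enter the graph.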
+
+HMC personnel will support the onboarding of data providers as well as the implementation of custom (meta)data pipelines / connectors and mapping to RDF / JSON-LD, if necessary and where appropriate.
+
+#### Potential Data Providers 
+Within the HGF a number of relevant web-based data architectures exist. These will be targeted by unHIDE to collaborate on building interfaces to the Helmholtz-KG.
+
+The initial implementation will focus on: 
+- HGF institutional (data) repositories
+- Central Libraries of Helmholtz Research Centers
+- Domain-specific (data) repositories relevant to HGF
+- Helmholtz GitLab Instances
+
+Subsequent efforts will include further resources such as:
+- Helmholtz FAIR digital objects (via HMC)
+- Helmholtz Ontology Base (HOB) (via HMC)
+- The Helmholtz [software directory](https://helmholtz.software/) (centrally maintained by HIFIS)
+- [Helmholtz Data Challenges Platform](https://helmholtz-data-challenges.de/)
+- other resources of the Helmholtz Metadata Collaboration (HMC) 
+- Content management systems (CMS)
+- Helmholtz Computing centers (e.g. JSC)
+- Helmholtz Federated IT Services (HIFIS)
+- Helmholtz instruments and sensor databases (e.g. @GFZ, DEPAS @AWI, RDMInfoPool, etc.)
+- Helmholtz Scientific Project Workflow Platform (HELIPORT)
+- [Helmholtz Imaging Modalities database](https://modalities.helmholtz-imaging.de/)
+- Laboratory information management systems (LIMS)
+- Helmholtz Open Science Office
+
--- a/docs/images/favicon.png
+++ b/docs/images/favicon.png
--- a/docs/images/unhide_logo.png
+++ b/docs/images/unhide_logo.png
--- a/docs/implementation.md
+++ b/docs/implementation.md
+## Architecture & Implementation
+
+### Foundational architecture
+
+The Helmholtz Knowledge Graph (Helmholtz-KG) aims to enhance the HGF's digital capacities, transparency, and productivity through dissemination and implementation of Linked Data principles (Box 2). Thus, unHIDE will build the Helmholtz-KG on mature web architecture and state-of-the-art semantic web technologies. This will ensure reliability and compatibility with global systems, while also exploring innovative approaches to maximise the Helmholtz-KG's ability to accelerate research and operations. 
+
+> Box 2
+>
+> Graph data is:
+> - Open-world, allowing resilient operations with novel or unexpected data flows
+> - Often faster to query than relational data that requires SQL JOIN operations, particularly when the data is highly connected
+> - Better suited to integrating data from heterogeneous sources
+> - Better suited to situations where the data model is complex and (rapidly) evolving
+> 
+> **[Learn more: https://www.w3.org/2013/data/](https://www.w3.org/2013/data/)**
+
+To ensure ease of use, the Helmholtz-KG will be based on a lightweight and internationally adopted interoperability architecture built on schema.org semantics and JSON-LD serialisation [2]. This architecture is widely used by data producers, including public, private, and governmental data systems, to link and expose scattered, diverse digital assets. By reusing this architecture, unHIDE will ensure that the Helmholtz-KG is able to natively interoperate with global systems.
+
+### Modular design & Extensibility
+While the foundation of the Helmholtz-KG will reuse standard web architectural elements and proven, globally adopted conventions, the KG itself is modular by nature: graphs can be merged, split, independently managed, and readily interfaced with other digital resources without compromising core integrity and functionality. In this manner, Helmholtz data scientists and engineers will be able to propose and test extensions to the graph with minimal overhead, which will support the ability to extend into existing and well-established systems in the HGF.
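
The modularity described above follows from how graph data is structured. As a minimal sketch, assume each graph is a set of (subject, predicate, object) triples. Merging is then a set union and splitting is a filter. The identifiers are hypothetical, and a production system would use an RDF library and a triple store instead of plain Python sets.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical identifiers; predicates borrow schema.org property names.
publications: set[Triple] = {
    ("ex:paper1", "schema:author", "ex:alice"),
    ("ex:paper1", "schema:about", "ex:dataset1"),
}
software: set[Triple] = {
    ("ex:tool1", "schema:author", "ex:alice"),
    ("ex:tool1", "schema:isBasedOn", "ex:dataset1"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Merging graphs is a set union; nodes with the same identifier link up."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def split_by_predicate(graph: set[Triple], predicate: str) -> tuple[set[Triple], set[Triple]]:
    """Split a graph into the triples using `predicate` and all remaining triples."""
    matching = {triple for triple in graph if triple[1] == predicate}
    return matching, graph - matching


combined = merge(publications, software)
authorship, rest = split_by_predicate(combined, "schema:author")
# Subjects that share the author ex:alice across both source graphs.
print(sorted(subject for subject, _, _ in authorship))
```

Because both source graphs refer to `ex:alice` and `ex:dataset1` by the same identifier, the merged graph connects the paper and the tool without any extra join logic.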
+
+This modularity (especially the ability to securely and independently manage parts of the overall graph) will also make it possible to realise different modes of access to digital assets (e.g. respecting sensitivity and confidentiality but also permitting full openness). The initial implementation of the Helmholtz-KG will not contend with sensitive or confidential data, but such capacities (e.g. user management, license recognition across (meta)data holdings, and authentication) can be explored and implemented once the core technology and operational procedures are stabilised. 
+
+The backbone architecture of the Helmholtz Knowledge Graph will be licensed under [CC0/CC BY](https://creativecommons.org/about/cclicenses/) to enable crosswalks to the outside world and to gain visibility, e.g. as a sub-cloud of the [Linked Open Data Cloud](https://lod-cloud.net/).
+
+### Inspiration
+
+The implementation of the Helmholtz-KG architecture is inspired by the federation of stakeholders in IOC-UNESCO's Ocean Data and Information System (ODIS), interconnected by the [ODIS Architecture](https://book.oceaninfohub.org/) [2], and rendered into a knowledge graph federating over 50 partners across the globe by the Ocean InfoHub Project (OIH). Personnel from the HMC's Earth and Environment Hub chair the ODIS federation and lead the technical implementation of OIH, offering direct alignment with unHIDE.
+
--- a/docs/intro.md
+++ b/docs/intro.md
+<img src="https://s3.desy.de/hackmd/uploads/upload_c3ba77674d5c58417c6df0f195b0c4ac.png" alt="unHIDE logo" height = "75">
+
+
+
+# unHIDE - unified Helmholtz Information and Data Exchange
+
+```{tableofcontents}
+```
+
+## Introduction & Scope
+
+Research across the Helmholtz Association (HGF) depends and thrives on a complex network of inter- and multidisciplinary collaborations which spans across its 18 Centres and beyond. 
+
+However, the (meta)data generated through the HGF's research and operations is typically siloed within institutional infrastructure and often within individual teams. The result is that the wealth of the HGF's (meta)data is stored and maintained in a scattered manner, and cannot deliver its full value to scientists, managers, strategists, and policy makers. 
+
+To address this challenge, the Helmholtz Metadata Collaboration (HMC) is launching the **unified Helmholtz Information and Data Exchange (unHIDE)**. This initiative seeks to create a lightweight and sustainable interoperability layer to interlink data infrastructures and provide greater, cross-organisational access to the HGF's (meta)data and information assets. Using proven and globally adopted knowledge graph technology (Box 1), unHIDE will develop a comprehensive association-wide Knowledge Graph (KG), the "Helmholtz-KG": a solution to connect (meta)data, information, and knowledge. 
+
+> *Box 1*
+> 
+> What is a Knowledge Graph?
+> - A "graph", from graph theory, is a structure that models pairwise connections between objects using "nodes" connected by "edges".
+> - A "knowledge graph" uses such a graph structure to capture knowledge about how a collection of things (represented as nodes) relate to one another (via edges). This helps organisations keep track of their collective knowledge, especially in complex and rapidly changing scenarios.
+> - Social networks are perhaps the best-known graphs: they store knowledge about who knows whom and how, what their interests are, what groups they belong to, and what content they create and interact with.
+
+With the implementation of the Helmholtz-KG, unHIDE will create substantial additional value for the Helmholtz digital ecosystem and its interconnectivity.
+
+**With the development of the Helmholtz-KG, unHIDE will:** 
+- increase discoverability and actionability of HGF data across the whole Association
+- motivate enhancement of (meta)data quality [1] and interoperability
+- provide overviews and diagnostics of the HGF dataspace and digital assets
+- allow for traceable and reproducible recovery of (meta)data to enhance research
+- support connectivity of HGF data to interact with global infrastructures and projects
+- act as a central access and distribution point for stakeholders within and beyond the HGF
+
+## Project Authors
+- Jens Bröder [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/ORCID_iD.svg/2048px-ORCID_iD.svg.png" alt="ORCID Logo" height ="20"> 0000-0001-7939-226X](https://orcid.org/0000-0001-7939-226X)
+- Pier Luigi Buttigieg [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/ORCID_iD.svg/2048px-ORCID_iD.svg.png" alt="ORCID Logo" height ="20"> 0000-0002-4366-3088](https://orcid.org/0000-0002-4366-3088)
+- Volker Hofmann [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/06/ORCID_iD.svg/2048px-ORCID_iD.svg.png" alt="ORCID Logo" height ="20"> 0000-0002-5149-603X](https://orcid.org/0000-0002-5149-603X)
+
+## Contributors and Partners
+
+
+[<img style="vertical-align: middle;" alt="FZJ" src='https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/FZJ/FZJ.png' width=20% height=20%>](https://fz-juelich.de)
+
+
+## Acknowledgements
+
+
+[<img style="vertical-align: middle;" alt="HMC Logo" src='https://github.com/Materials-Data-Science-and-Informatics/Logos/raw/main/HMC/HMC_Logo_M.png' width=50% height=50%>](https://helmholtz-metadaten.de)
+
+This project was developed and funded by the Helmholtz Metadata Collaboration
+(HMC), an incubator-platform of the Helmholtz Association within the framework of the
+Information and Data Science strategic initiative.
+
+
+## References
+- [1] https://5stardata.info/en/
+- [2] https://www.w3.org/TR/2014/REC-json-ld-20140116/
+- [3] https://book.oceaninfohub.org/publishing/publishing.html
+- [4] https://www.researchobject.org/ro-crate/
+- [5] https://www.openarchives.org/pmh/
+
+```{bibliography}
+```
+
+
--- a/docs/notebooks.ipynb
+++ b/docs/notebooks.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Content with notebooks\n",
+    "\n",
+    "You can also create content with Jupyter Notebooks. This means that you can include\n",
+    "code blocks and their outputs in your book.\n",
+    "\n",
+    "## Markdown + notebooks\n",
+    "\n",
+    "As it is markdown, you can embed images, HTML, etc into your posts!\n",
+    "\n",
+    "![](https://myst-parser.readthedocs.io/en/latest/_static/logo-wide.svg)\n",
+    "\n",
+    "You can also $add_{math}$ and\n",
+    "\n",
+    "$$\n",
+    "math^{blocks}\n",
+    "$$\n",
+    "\n",
+    "or\n",
+    "\n",
+    "$$\n",
+    "\\begin{aligned}\n",
+    "\\mbox{mean} la_{tex} \\\\ \\\\\n",
+    "math blocks\n",
+    "\\end{aligned}\n",
+    "$$\n",
+    "\n",
+    "But make sure you \\$Escape \\$your \\$dollar signs \\$you want to keep!\n",
+    "\n",
+    "## MyST markdown\n",
+    "\n",
+    "MyST markdown works in Jupyter Notebooks as well. For more information about MyST markdown, check\n",
+    "out [the MyST guide in Jupyter Book](https://jupyterbook.org/content/myst.html),\n",
+    "or see [the MyST markdown documentation](https://myst-parser.readthedocs.io/en/latest/).\n",
+    "\n",
+    "## Code blocks and outputs\n",
+    "\n",
+    "Jupyter Book will also embed your code blocks and output in your book.\n",
+    "For example, here's some sample Matplotlib code:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from matplotlib import rcParams, cycler\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "plt.ion()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fixing random state for reproducibility\n",
+    "np.random.seed(19680801)\n",
+    "\n",
+    "N = 10\n",
+    "data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]\n",
+    "data = np.array(data).T\n",
+    "cmap = plt.cm.coolwarm\n",
+    "rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))\n",
+    "\n",
+    "\n",
+    "from matplotlib.lines import Line2D\n",
+    "custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),\n",
+    "                Line2D([0], [0], color=cmap(.5), lw=4),\n",
+    "                Line2D([0], [0], color=cmap(1.), lw=4)]\n",
+    "\n",
+    "fig, ax = plt.subplots(figsize=(10, 5))\n",
+    "lines = ax.plot(data)\n",
+    "ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There is a lot more that you can do with outputs (such as including interactive outputs)\n",
+    "with your book. For more information about this, see [the Jupyter Book documentation](https://jupyterbook.org)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.0"
+  },
+  "widgets": {
+   "application/vnd.jupyter.widget-state+json": {
+    "state": {},
+    "version_major": 2,
+    "version_minor": 0
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
--- a/docs/references.bib
+++ b/docs/references.bib
+---
+---
+
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
+jupyter-book