Commit 2c33e349 authored by Timm Schoening

removed notebooks and moved them to separate repo

parent 4b530b2b
data:
  base_paths: [/volumes/project/]
  base_paths_remote: [/volumes/project/]
  use_gear_folders: false
equipment:
  CAM:
  - {eqid: ADD CAM EQUIPMENT HERE}
  PFM:
  - {eqid: ADD PFM EQUIPMENT HERE}
images:
  artist: Holothurian Impact team
  copyright: '(c) International Ocean Research. Contact: press@foobar.de'
  credit: Holothurian Impact & Dr. Jane Doe
  description: 'Acquired by camera ___DEPLOYMENT:CAMERAID___ mounted on platform ___DEPLOYMENT:PLATFORM___
    during cruise ___CRUISE:NUMBER___ (station: ___DEPLOYMENT:STATION___). Navigation
    data were automatically edited by the MarIQT software (removal of outliers, smoothed
    and splined to fill time gaps) and linked to the image data by timestamp.'
  editor: John Doe
  license: CC-BY
  pfdo: {acquisition: photo, deployment: survey, illumination: artificial light, image-quality: raw,
    navigation: beacon, resolution: mm, scale-reference: laser marker, spectral-resolution: rgb,
    zone: seafloor}
navigation_data:
  processing_parameters:
    DEFAULT:
    - {name: source, value: DSHIP}
    - {name: beacon_id, value: 2}
    - {name: max_vertical_speed, unit: m/s, value: 3.0}
    - {name: max_lateral_speed, unit: m/s, value: 2.0}
    - {name: max_time_gap, unit: s, value: 300}
    - {name: smoothing_gauss_half_width, unit: s, value: 60}
    - {name: outlier_check_min_neighbors, unit: number, value: 5}
    - {name: max_allowed_outlier_lateral_dist, unit: m, value: 10}
    - {name: max_allowed_outlier_vertical_dist, unit: m, value: 10}
    - {name: outlier_check_time_window_size, unit: s, value: 60}
    MUC:
    - {name: processing_type, value: station}
    - {name: beacon_id, value: 1}
    ROV:
    - {name: processing_type, value: transect}
    - {name: beacon_id, value: 4}
  sources:
    DSHIP:
      dship_all_device_operations_file: /Users/tschoening/dev/repos/mariqt-test/files/PRJ23_all-device-operations.dat
      dship_all_underwater_navigation_file: /Users/tschoening/dev/repos/mariqt-test/files/PRJ23_all-underwater-navigation.dat
      data_frequency_seconds: 5
      date_format: '%Y/%m/%d %H:%M:%S'
      dship_event_navigation_folder: /Users/tschoening/dev/repos/mariqt-test/files/dship_zips/
      dship_user_mail: jdoe1@foobar.de
      dship_user_name: JohnDoe
      max_depth: 6000
      satellite_navigation: {sensor_equipment_id: ADD_EQUIPMENT_ID_HERE}
      underwater_navigation: {sensor_equipment_id: ADD_EQUIPMENT_ID_HERE}
    FIXED: {latitude: 0.0, longitude: 0.0}
project:
  acronym: Holothurian Impact
  copyright: '(c) International Ocean Research. Contact: press@foobar.de'
  data-pi: {affiliation: International Ocean Research, email: jdoe1@foobar.de, name: John
      Doe, orcid: 9876-5432-1000-0000}
  end: '2020-05-27 08:00:00'
  funding: Funding for this project was provided by the International Funding Agency
    (1234ABCD987)
  info: {de: Ein deutscher Text mit ca. 1000 Zeichen der das Projekt beschreibt.,
    en: 'An english text of ca. 1000 characters length, describing the project.'}
  license: CC-BY
  number: PRJ23
  pi: {affiliation: International Ocean Research, email: jdoe@foobar.de, name: Dr.
      Jane Doe, orcid: 0000-0001-2345-6789}
  start: '2019-02-15 06:00:00'
  title: Assessing the impacts of holothurian harvesting.
%% Cell type:markdown id:demonstrated-experiment tags:
# Curation Overview
This notebook provides an overview of the curation status of your data folders. It stores its results in the cache file `../files/<project number>_curation-cache.yaml`, so that subsequent runs of this notebook are faster. To clear the cache and rescan everything, set the `rescan` variable in the next cell to `True` and run the notebook again.
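
The caching behaviour described above can be illustrated with a small, self-contained sketch. It is not the notebook's actual implementation: the function name `load_or_rescan` and the `scan` callback are hypothetical, and JSON stands in for YAML so that the example only needs the standard library.

``` python
import json
import os
import tempfile
from datetime import datetime, timezone


def load_or_rescan(cache_path, rescan, scan):
    """Return (events, from_cache).

    `scan` is a hypothetical zero-argument callable that performs the
    expensive folder scan and returns a dict of per-event status values.
    """
    # Reuse the cache only if it exists and no rescan was requested
    if os.path.exists(cache_path) and not rescan:
        with open(cache_path, "r") as fh:
            cache = json.load(fh)
        return cache["events"], True
    # Otherwise scan everything and overwrite the cache
    events = scan()
    cache = {
        # Stored as a UTC string so it can be parsed back unambiguously
        "date_created": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f"),
        "events": events,
    }
    with open(cache_path, "w") as fh:
        json.dump(cache, fh)
    return events, False


def demo():
    """Run three lookups against a temporary cache file and count the scans."""
    calls = []

    def fake_scan():
        calls.append(1)
        return {"EVT_001": {"dir_exists": True, "raw_data_num": 3}}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "PRJ23_curation-cache.json")
        first = load_or_rescan(path, False, fake_scan)   # no cache yet: scans
        second = load_or_rescan(path, False, fake_scan)  # cache hit: no scan
        third = load_or_rescan(path, True, fake_scan)    # forced rescan
    return first, second, third, len(calls)
```

In this sketch the second call is served from the cache, while the first and third call trigger a scan, which mirrors the effect of toggling `rescan`.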
%% Cell type:code id:designed-sandwich tags:
``` python
rescan = False
```
%% Cell type:code id:sustained-fluid tags:
``` python
#################################################################################################################
### You should not see - and not modify (!) - this cell, unless you are sure what you are doing! Just run it. ###
#################################################################################################################
import os

import mariqt.core as miqtc
import mariqt.processing.files as miqtpf

cfg = miqtpf.cfgFileLoadProjectDefault()

# Check base_paths where data resides
all_good = True
for bp in cfg['data']['base_paths']:
    if not os.path.exists(bp):
        all_good = False
        print("Issue: Base path",bp,"not found")
    elif not os.path.isdir(bp):
        all_good = False
        print("Issue: Base path",bp,"points to a file but we require a directory.")
    elif len(os.listdir(bp)) == 0:
        all_good = False
        print("Issue: No data available in base path",bp)
if all_good:
    print("It looks like your base path settings are good. There is data. Let's continue to start curating.")

# Check curation paths
if "DSHIP" in cfg['navigation_data']['sources']:
    miqtc.assertExists(miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:dship_all_device_operations_file'))
    miqtc.assertExists(miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:dship_all_underwater_navigation_file'))
    # Key name as it appears in the project config file (dship_event_navigation_folder)
    miqtc.assertExists(miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:dship_event_navigation_folder'))
print("All is good.")
```
%% Output
Issue: Base path /volumes/project/ not found
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-3a63bfd7ab9c> in <module>
25 # Check curation paths
26 if "DSHIP" in cfg['navigation_data']['sources']:
---> 27 miqtc.assertExists(miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:dship_all_device_operations_file'))
28 miqtc.assertExists(miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:dship_all_underwater_navigation_file'))
29 miqtc.assertExists(miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:dship_event_navigation_data_folder'))
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mariqt/core.py in assertExists(path)
7 def assertExists(path):
8 if not os.path.exists(path):
----> 9 raise NameError("Could not find: " + path)
10
11 ### Asserts that a path string to a directory ends with a slash
NameError: Could not find: /Users/tschoening/dev/repos/mariqt-test/files/PRJ23_all-device-operations.dat
%% Cell type:code id:classified-welcome tags:
``` python
#################################################################################################################
### You should not see - and not modify (!) - this cell, unless you are sure what you are doing! Just run it. ###
#################################################################################################################
import copy
import datetime

import yaml

# Get list of events, expected to be formatted like so:
# events[device_operation] = {'code':<device acronym>,'actions':[{'action':<action>,'lat':<latitude>,'lon':<longitude>,'dep':<depth>,'utc':<timestamp>,...],'start':<start timestamp>}
if "DSHIP" in cfg['navigation_data']['sources']:
    import mariqt.sources.dship as miqtsd
    events = miqtsd.parseDSHIPDeviceOperationsOrEventsFile(miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:dship_all_device_operations_file'))
    miqtsd.removeEventsByOtherCruises(events,cfg['project']['number'])
    miqtsd.renameEvents(events)
else:
    raise Exception("Can only process DSHIP events so far. Sorry.")
# These are the status information fields that will be collected for each event in the following
one_event_status = {"changed":False,"dir_exists":False,"event_exists":False,"doi":"","num_actions":0,"num_sensors":0,"has_gps_nav_raw":False,"has_gps_nav_cur":False,"has_usbl_nav_raw":False,"has_usbl_nav_cur":False,"has_protocol":False,
                    "raw_data_vol":0,"raw_data_num":0,"cur_data_vol":0,"cur_data_num":0,"prt_data_vol":0,"prt_data_num":0,"prd_data_vol":0,"prd_data_num":0,"ext_data_vol":0,"ext_data_num":0,
                    "has_images":False}

# How to map the path names of the folder convention to the short names used here in the script
path_to_key = {"external":"ext","raw":"raw","protocol":"prt","products":"prd","processed":"cur"}

# Check whether a cache file exists and shall be loaded
if os.path.exists("../files/" + cfg['project']['number']+"_curation-cache.yaml") and not rescan:
    with open("../files/" + cfg['project']['number']+"_curation-cache.yaml","r") as yaml_file:
        cache = yaml.safe_load(yaml_file)
    all_event_status = cache['events']
    print("Showing cached status from",cache['date_created'])
    # str() covers both cases: the date may come back from YAML as a string or as a datetime object
    cache_unix = datetime.datetime.strptime(str(cache['date_created'])+"+0000","%Y-%m-%d %H:%M:%S.%f%z").timestamp()
else:
    # No usable cache: start from an empty status and scan everything
    rescan = True
    all_event_status = {}

# Find all events (this also adds events that are missing from a loaded cache)
for event in events:
    if event not in all_event_status:
        all_event_status[event] = copy.deepcopy(one_event_status)
    all_event_status[event]['event_exists'] = True
    all_event_status[event]['num_actions'] = len(events[event]['actions'])
# Browse all the data base_paths folders and look for event subfolders
event_folders = {}
for path in cfg['data']['base_paths']:
    tmp_event_folders = os.listdir(path)
    for tmp_event in tmp_event_folders:
        if not tmp_event.startswith('.') and os.path.isdir(os.path.join(path,tmp_event)):
            if tmp_event not in event_folders:
                event_folders[tmp_event] = [path]
            else:
                event_folders[tmp_event].append(path)
            # Did we find events that are not known in the event files we opened earlier?
            if tmp_event not in all_event_status:
                all_event_status[tmp_event] = copy.deepcopy(one_event_status)
                all_event_status[tmp_event]['changed'] = True
            all_event_status[tmp_event]['dir_exists'] = True
            # Flag folders modified since the cache was written (cache_unix only exists if a cache was loaded)
            if not rescan and os.path.getmtime(os.path.join(path,tmp_event)) > cache_unix:
                all_event_status[tmp_event]['changed'] = True
if rescan:
    import mariqt.definitions as miqtd
    # Same colon-separated key style (and same 'navigation_data' root) as the other cfgValue calls in this notebook
    satellite_navigation_sensor = miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:satellite_navigation:sensor_equipment_id')
    underwater_navigation_sensor = miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:underwater_navigation:sensor_equipment_id')
    for event in event_folders:
        # Find sensors for event
        event_sensors = []
        for base_folder in event_folders[event]:
            tmp_sensors = os.listdir(base_folder+event)
            for tmp_sensor in tmp_sensors:
                if tmp_sensor[0] != "." and tmp_sensor not in event_sensors:
                    if tmp_sensor == "protocol":
                        # Keep the flag boolean: True as soon as any base folder holds a non-hidden protocol file
                        has_files = len([f for f in os.listdir(base_folder+event+"/protocol") if not f.startswith('.')]) > 0
                        all_event_status[event]['has_protocol'] = has_files or all_event_status[event]['has_protocol']
                    else:
                        event_sensors.append(tmp_sensor)
        all_event_status[event]['num_sensors'] += len(event_sensors)

        # Iterate through all sensors and fetch file information
        for sensor in event_sensors:
            data_volume = {}
            for sub in path_to_key:
                data_volume[path_to_key[sub]+"_data_num"] = 0
                data_volume[path_to_key[sub]+"_data_vol"] = 0
            for base_folder in event_folders[event]:
                for sub in path_to_key:
                    tmp = miqtpf.recursiveFileStat(base_folder+event+"/"+sensor+"/"+sub+"/")
                    data_volume[path_to_key[sub]+"_data_num"] += tmp['num']
                    data_volume[path_to_key[sub]+"_data_vol"] += tmp['size']
                    all_event_status[event][path_to_key[sub]+"_data_num"] += tmp['num']
                    all_event_status[event][path_to_key[sub]+"_data_vol"] += tmp['size']
                tmp = miqtpf.recursiveFileStat(base_folder+event+"/"+sensor+"/raw/",miqtd.image_types)
                if tmp['num'] > 0:
                    all_event_status[event]['has_images'] = True
            if sensor == satellite_navigation_sensor:
                all_event_status[event]['has_gps_nav_raw'] = data_volume["raw_data_num"] > 0 or all_event_status[event]['has_gps_nav_raw']
                all_event_status[event]['has_gps_nav_cur'] = data_volume["cur_data_num"] > 0 or all_event_status[event]['has_gps_nav_cur']
            elif sensor == underwater_navigation_sensor:
                all_event_status[event]['has_usbl_nav_raw'] = data_volume["raw_data_num"] > 0 or all_event_status[event]['has_usbl_nav_raw']
                all_event_status[event]['has_usbl_nav_cur'] = data_volume["cur_data_num"] > 0 or all_event_status[event]['has_usbl_nav_cur']

    # Write the status of all events to the cache; the date is stored as a UTC string in the format parsed when loading
    with open("../files/" + cfg['project']['number']+"_curation-cache.yaml","w") as yaml_file:
        date_created = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")
        yaml.dump({'date_created':date_created,'events':all_event_status},yaml_file)
```
%% Output
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-8-39fd363a0933> in <module>
10 # events[device_operation] = {'code':<device acronym>,'actions':[{'action':<action>,'lat':<latitude>,'lon':<longitude>,'dep':<depth>,'utc':<timestamp>,...],'start':<start timestamp>}
11 if "DSHIP" in cfg['navigation_data']['sources']:
---> 12 import mariqt.sources.dship as miqtsd
13
14 events = miqtsd.parseDSHIPDeviceOperationsOrEventsFile(miqtpf.cfgValue(cfg,'navigation_data:sources:DSHIP:dship_all_device_operations_file'))
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mariqt/sources/dship.py in <module>
6
7 import mariqt.geo as miqtg
----> 8 import marqit.source.dship_settings
9
10 def addEndToDSHIPEventsByLastActionBeforeNextEvent(dship_events):
ModuleNotFoundError: No module named 'marqit'
%% Cell type:code id:polar-circus tags:
``` python
#################################################################################################################
### You should not see - and not modify (!) - this cell, unless you are sure what you are doing! Just run it. ###
#################################################################################################################
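import pandas as pd
pd.set_option('display.max_rows', None)

def color_false_red(val):
    # Highlight missing or empty entries (False, 0, "" or "0") in red
    color = 'red' if val == False or val == "" or val == "0" else 'black'
    return 'color: %s' % color

def human_readable(num_bytes):
    # Local stand-in for the original `gmrcc.helper.humanReadable` call: `gmrcc` is never imported in this notebook.
    # Assumption: zero is returned as "0" so that color_false_red still marks empty volumes.
    if num_bytes == 0:
        return "0"
    for unit in ["B","KiB","MiB","GiB","TiB"]:
        if num_bytes < 1024 or unit == "TiB":
            return "%.1f %s" % (num_bytes,unit)
        num_bytes /= 1024

print_copy = copy.deepcopy(all_event_status)
for event in print_copy:
    for key in path_to_key:
        print_copy[event][path_to_key[key]+"_data_vol"] = human_readable(print_copy[event][path_to_key[key]+"_data_vol"])
df = pd.DataFrame(print_copy).T
df.style.applymap(color_false_red)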
```
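
The rescan step above relies on `miqtpf.recursiveFileStat`, which the notebook treats as returning a dict with a file count (`num`) and a total size (`size`). The following is only a sketch of what such a helper might look like: `recursive_file_stat` and its `extensions` parameter are hypothetical, and skipping hidden entries is an assumption borrowed from how the notebook filters folder listings. It is not MarIQT's implementation.

``` python
import os
import tempfile


def recursive_file_stat(folder, extensions=None):
    """Count files below `folder` and sum their sizes in bytes.

    Hypothetical stand-in: returns {'num': ..., 'size': ...}. If `extensions`
    is given (e.g. {"jpg", "png"}), only files with these extensions count.
    """
    stat = {"num": 0, "size": 0}
    # A missing subfolder simply contributes nothing
    if not os.path.isdir(folder):
        return stat
    for root, dirs, files in os.walk(folder):
        # Assumption: hidden folders and files are ignored
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if name.startswith("."):
                continue
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if extensions is not None and ext not in extensions:
                continue
            stat["num"] += 1
            stat["size"] += os.path.getsize(os.path.join(root, name))
    return stat


def demo():
    """Build a tiny <event>/<sensor>/raw folder and collect its statistics."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = os.path.join(tmp, "EVT_001", "CAM-01", "raw")
        os.makedirs(raw)
        for name, payload in (("a.jpg", b"12345"), ("b.txt", b"123"), (".hidden", b"1")):
            with open(os.path.join(raw, name), "wb") as fh:
                fh.write(payload)
        all_files = recursive_file_stat(raw)
        images = recursive_file_stat(raw, extensions={"jpg", "png"})
        missing = recursive_file_stat(os.path.join(tmp, "does-not-exist"))
    return all_files, images, missing
```

Returning zeros for a missing folder keeps the per-event sums simple, since not every sensor folder has to contain all of the `raw`, `processed`, `protocol`, `products` and `external` subfolders.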