CsvParser skips all observation values before handing them over to datastore
As described in !20 (merged), the CsvParser skips all observation values before handing them over to the datastore when a CSV line contains an empty value. This happens because processing is now done line-wise and the NanNotAllowedHereError exceptions are caught at that same line level.
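To illustrate the suspected mechanism, here is a minimal Python sketch (hypothetical names and structure, not the actual CsvParser code): if NanNotAllowedHereError is caught around the whole line, one empty cell discards every observation of that line instead of only the empty value.

```python
# Minimal sketch of the suspected problem (hypothetical, not the real CsvParser).

class NanNotAllowedHereError(ValueError):
    """Raised when a cell cannot be converted to a number."""


def parse_value(raw: str) -> float:
    if raw.strip() == "":
        raise NanNotAllowedHereError(f"empty value: {raw!r}")
    return float(raw)


def parse_line_current(line: str) -> list:
    # Suspected current behaviour: the exception of a single cell
    # aborts the whole line, so no observation of that line survives.
    try:
        return [parse_value(cell) for cell in line.split(",")]
    except NanNotAllowedHereError:
        return []


def parse_line_expected(line: str) -> list:
    # Expected behaviour: only the offending cell is skipped.
    values = []
    for cell in line.split(","):
        try:
            values.append(parse_value(cell))
        except NanNotAllowedHereError:
            continue
    return values


if __name__ == "__main__":
    line = "23,,42"
    print(parse_line_current(line))   # []           -> all observations lost
    print(parse_line_expected(line))  # [23.0, 42.0] -> only the empty cell skipped
```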
Steps to reproduce the behavior:
- Start the development environment with the docker-compose setup from this repo:
docker-compose up -d
- Get a CRNS example file; one is pinned to the ZID Mattermost channel, for example:
https://mm.ufz.de/wkdv/pl/bcdi5cs4k3rkxmwiutf5a9q3dw
- Run the parser (replace <path_to_your_local_crns_demo_file> beforehand):
docker-compose run -v <path_to_your_local_crns_demo_file>:/tmp/CRS01_Data2109010000.566_001441.txt --rm app main.py parse -p CsvParser -t postgresql://postgres:postgres@db/postgres -s file:///tmp/CRS01_Data2109010000.566_001441.txt -d ce2b4fb6-d9de-11eb-a236-125e5a40a845
- Check the contents of the database. It stays empty if you started with a fresh database (otherwise run docker-compose down -v beforehand):
docker-compose run --rm db bash -c 'echo -e "select s.name, o.result_time, o.result_number from thing t join datastream s on s.thing_id = t.id join observation o on s.id = o.datastream_id where t.uuid = \047ce2b4fb6-d9de-11eb-a236-125e5a40a845\047;" | psql postgres://postgres:postgres@db/postgres'
- Proof that the setup is working in principle:
Run the parser:
docker-compose run -v /home/abbrent/Downloads/CRS01_Data2109010000.566_001441.txt:/tmp/CRS01_Data2109010000.566_001441.txt --rm app main.py parse -p AnotherCustomParser -t postgresql://postgres:postgres@db/postgres -s file:///tmp/CRS01_Data2109010000.566_001441.txt -d ce2b4fb6-d9de-11eb-a236-125e5a40a845
Creating tsm-extractor_app_run ... done
Connecting to sqlalchemy supported database "postgresql://postgres:postgres@db/postgres"
Successfully connected sqlalchemy to "postgresql://postgres:postgres@db/postgres"
Fetched remote raw data file from "file:///tmp/CRS01_Data2109010000.566_001441.txt". Size: 4.69 KB
Parsing raw data [####################################] 40/40
Pushed 40 new observations to database.
Close database session.
😁
Check the database:
docker-compose run --rm db bash -c 'echo -e "select s.name, o.result_time, o.result_number from thing t join datastream s on s.thing_id = t.id join observation o on s.id = o.datastream_id where t.uuid = \047ce2b4fb6-d9de-11eb-a236-125e5a40a845\047;" | psql postgres://postgres:postgres@db/postgres'
Creating tsm-extractor_db_run ... done
name | result_time | result_number
------------------+-------------------------------+---------------
MySecondThing/0 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/0 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/1 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/1 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/2 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/2 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/3 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/3 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/4 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/4 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/5 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/5 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/6 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/6 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/7 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/7 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/8 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/8 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/9 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/9 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/10 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/10 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/11 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/11 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/12 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/12 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/13 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/13 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/14 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/14 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/15 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/15 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/16 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/16 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/17 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/17 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/18 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/18 | 2021-12-23 14:03:28.718268+00 | 23
MySecondThing/19 | 2021-12-23 14:03:28.656192+00 | 23
MySecondThing/19 | 2021-12-23 14:03:28.718268+00 | 23
(40 rows)