First Iteration of Quality Assessment module
Description
Rather than reinventing the wheel, we have integrated the SaQC Python package into neptoon. This lets us focus on implementation instead of redefining functions that are already well constructed elsewhere.
Related Issue
Closes #13 (closed)
Proposed Changes
- The proposed structure for QA in neptoon has been developed
- The QualityCheck abstract class has been defined
- Examples of QualityCheck types have been created
- DataAuditLog works with QualityChecks
- The QualityAssessmentFlagBuilder supports a way to stage checks on a per column basis
- The DataQualityAssessor is a standalone module that takes a dataframe as an argument.
- DataQualityAssessor has methods for building a QualityAssessmentFlagBuilder or attaching a pre-made one
- DataQualityAssessor also has a method that runs the staged checks and returns a DataFrame of flag values
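The structure listed above can be sketched roughly as follows. This is a minimal illustration only, not the code in this merge request: it uses plain pandas instead of SaQC so that it stays self-contained, and the method names (`apply`, `add_check`, `add_quality_flags`, `apply_quality_flags`), the `RangeCheck` class, the flag labels, and the column names are all assumptions made for the example.

```python
from abc import ABC, abstractmethod

import pandas as pd

# Flag labels are placeholders chosen for this sketch.
UNFLAGGED = "UNFLAGGED"
BAD = "BAD"


class QualityCheck(ABC):
    """A single check bound to one column (hypothetical interface)."""

    def __init__(self, column: str):
        self.column = column

    @abstractmethod
    def apply(self, data: pd.DataFrame) -> pd.Series:
        """Return a boolean Series that is True where a value fails."""


class RangeCheck(QualityCheck):
    """Example check type: fail values outside [lower, upper]."""

    def __init__(self, column: str, lower: float, upper: float):
        super().__init__(column)
        self.lower = lower
        self.upper = upper

    def apply(self, data: pd.DataFrame) -> pd.Series:
        values = data[self.column]
        return (values < self.lower) | (values > self.upper)


class QualityAssessmentFlagBuilder:
    """Stages checks per column; nothing runs until the assessor applies them."""

    def __init__(self):
        self.checks: list[QualityCheck] = []

    def add_check(self, check: QualityCheck) -> "QualityAssessmentFlagBuilder":
        self.checks.append(check)
        return self  # returning self allows calls to be chained


class DataQualityAssessor:
    """Standalone assessor that takes a DataFrame as an argument."""

    def __init__(self, data_frame: pd.DataFrame):
        self.data_frame = data_frame
        self.builder = QualityAssessmentFlagBuilder()

    def add_quality_flags(self, builder: QualityAssessmentFlagBuilder) -> None:
        """Attach a pre-made builder."""
        self.builder = builder

    def apply_quality_flags(self) -> pd.DataFrame:
        """Run the staged checks and return a DataFrame of flag values."""
        flags = pd.DataFrame(
            {column: UNFLAGGED for column in self.data_frame.columns},
            index=self.data_frame.index,
        )
        for check in self.builder.checks:
            failed = check.apply(self.data_frame)
            flags.loc[failed, check.column] = BAD
        return flags


# Usage with made-up data and hypothetical column names.
data = pd.DataFrame(
    {
        "air_relative_humidity": [45.0, 120.0, 80.0],
        "air_pressure": [1010.0, 1008.0, 300.0],
    }
)
builder = QualityAssessmentFlagBuilder()
builder.add_check(RangeCheck("air_relative_humidity", 0, 100)).add_check(
    RangeCheck("air_pressure", 600, 1100)
)
assessor = DataQualityAssessor(data)
assessor.add_quality_flags(builder)
flags = assessor.apply_quality_flags()
```

Keeping the builder separate from the assessor means a set of checks can be staged once and attached to several DataFrames, which is presumably the motivation for supporting both building and attaching.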
Testing
- [Describe the testing steps or scenarios to verify the changes]
- [Testing step 2]
- [Testing step 3]
Checklist
- I have tested the changes thoroughly
- I have updated the relevant documentation
- I have added necessary comments and explanations in the code
- I have considered backward compatibility, if applicable
- I have reviewed the code changes myself
Additional Notes
Still to implement:
- The CRNSDataHub needs a method for masking the crns_data_frame with the flagged dataframe
- Implement more check options
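One possible shape for the masking step mentioned above, sketched with pandas `DataFrame.mask`. The function name `mask_flagged`, the `"UNFLAGGED"` label, and the column name are assumptions for illustration; the eventual CRNSDataHub method may look quite different.

```python
import pandas as pd


def mask_flagged(
    data_frame: pd.DataFrame,
    flags: pd.DataFrame,
    unflagged: str = "UNFLAGGED",
) -> pd.DataFrame:
    """Return a copy of data_frame with every flagged cell replaced by NaN.

    Assumes flags has the same index and columns as data_frame.
    """
    return data_frame.mask(flags != unflagged)


# Made-up data with a hypothetical column name.
crns_data_frame = pd.DataFrame({"epithermal_neutrons": [510.0, 9999.0, 495.0]})
flag_frame = pd.DataFrame({"epithermal_neutrons": ["UNFLAGGED", "BAD", "UNFLAGGED"]})
masked = mask_flagged(crns_data_frame, flag_frame)
```

Because `mask` returns a new DataFrame, the original data stay untouched and the flagged values remain available for auditing.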
Edited by Daniel Power