
⚙️ Backend Issue — Extend /sample/create_with_file_upload to Support Additional File Formats

🗣️ User Feedback

"XML may not be the best format to use going forward. Other formats (e.g., JSON) can provide more accessible and complete information from additional repositories."


🎯 Goal

Enhance the backend endpoint

POST /sample/create_with_file_upload

to support additional file formats beyond XML, using the existing Connexion + OpenAPI setup.

Currently, it handles only DataCite XML uploads. The goal is to extend it to accept and process:

  • JSON
  • CSV
  • Excel (.xlsx)
  • YAML
  • TOML

📄 Endpoint Specification (Updated)

/sample/create_with_file_upload:
  post:
    operationId: "sample.create_with_upload_file"
    tags:
      - Sample
    summary: "Create a sample record in SEPIA by file upload"
    parameters:
      - $ref: "#/components/parameters/session_id_qp"
    requestBody:
      x-body-name: data
      description: "Sample to create by file upload (XML, JSON, CSV, Excel, YAML, or TOML)"
      required: true
      content:
        multipart/form-data:
          schema:
            $ref: "#/components/schemas/Sample_with_file_upload"
    responses:
      '201':
        description: Sample created successfully in SEPIA by file upload
      '400':
        description: Bad request — invalid or unsupported file format
      '403':
        description: Invalid credentials or insufficient access
      '500':
        description: Server-side error (please inform the administrator)

🧩 Implementation Details

Tasks

  • Update the existing /sample/create_with_file_upload endpoint (replacing /sample/create_with_xml_upload).
  • Update the OpenAPI spec to include the new accepted formats in the request body description.
  • Implement file type detection (by extension or MIME type).
  • Implement parsing and validation logic for each supported format.
  • Map parsed data to the existing internal Sample schema.
  • Return consistent error messages for unsupported or malformed input files.
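
The detection and error-handling tasks above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `detect_format`, `UnsupportedFormatError`, and both lookup tables are hypothetical names introduced here, and the sketch assumes the file extension is checked first, with the MIME type as a fallback.

```python
from pathlib import Path

# Hypothetical lookup tables covering the formats listed in this issue.
EXTENSION_TO_FORMAT = {
    ".xml": "xml",
    ".json": "json",
    ".csv": "csv",
    ".xlsx": "excel",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".toml": "toml",
}

MIME_TO_FORMAT = {
    "application/xml": "xml",
    "text/xml": "xml",
    "application/json": "json",
    "text/csv": "csv",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "excel",
}


class UnsupportedFormatError(ValueError):
    """Raised when neither the extension nor the MIME type is recognized."""


def detect_format(filename, content_type=None):
    """Return a format key such as 'json', or raise UnsupportedFormatError."""
    # Prefer the file extension (case-insensitive).
    suffix = Path(filename).suffix.lower()
    if suffix in EXTENSION_TO_FORMAT:
        return EXTENSION_TO_FORMAT[suffix]

    # Fall back to the MIME type, ignoring parameters like "; charset=utf-8".
    if content_type:
        mime = content_type.split(";")[0].strip().lower()
        if mime in MIME_TO_FORMAT:
            return MIME_TO_FORMAT[mime]

    supported = ", ".join(sorted(set(EXTENSION_TO_FORMAT.values())))
    raise UnsupportedFormatError(
        f"Unsupported file format for '{filename}'. Supported formats: {supported}"
    )
```

A handler could catch `UnsupportedFormatError` in one place and turn its message into the 400 response from the spec, which would keep the error wording consistent across formats.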

Format Parsing Suggestions

Format | Library | Notes
------ | ------- | -----
XML | lxml | Already supported
JSON | json | Direct structure mapping
CSV | pandas.read_csv or csv | Convert to dicts
Excel (.xlsx) | openpyxl | Tabular data
YAML | PyYAML | Configuration-style input
TOML | tomllib (Python ≥3.11) / toml | Structured metadata
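
One possible way to organize the per-format parsers is a small dispatch table. The sketch below covers only JSON and CSV, using the standard library so that it runs without extra dependencies. `parse_json`, `parse_csv`, `PARSERS`, and `parse_upload` are hypothetical names, and the assumption that every parser returns a list of dicts is illustrative, not taken from the existing code. Excel, YAML, and TOML parsers would presumably register in the same table using the libraries suggested above.

```python
import csv
import io
import json


def parse_json(raw: bytes) -> list:
    """Decode a JSON upload; a single object is wrapped in a list."""
    data = json.loads(raw.decode("utf-8"))
    return data if isinstance(data, list) else [data]


def parse_csv(raw: bytes) -> list:
    """Decode a CSV upload into one dict per row, keyed by the header row."""
    text = raw.decode("utf-8-sig")  # tolerate a leading byte-order mark
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical registry: format key -> parser function.
PARSERS = {
    "json": parse_json,
    "csv": parse_csv,
}


def parse_upload(fmt: str, raw: bytes) -> list:
    """Parse raw file bytes and report failures with one error type."""
    parser = PARSERS.get(fmt)
    if parser is None:
        raise ValueError(f"Unsupported format: {fmt}")
    try:
        return parser(raw)
    except (UnicodeDecodeError, json.JSONDecodeError, csv.Error) as exc:
        # Re-raise with a uniform message so callers can return a 400.
        raise ValueError(f"Malformed {fmt} file: {exc}") from exc
```

For example, `parse_upload("csv", b"title,year\nA,2020\n")` yields one dict per data row with string values, which a later step would still need to map onto the internal Sample schema.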

📅 Proposed Rollout Plan

Phase | Format(s) | Notes
----- | --------- | -----
Phase 1 | JSON | Highest user demand, easiest integration
Phase 2 | CSV, Excel | Common tabular formats
Phase 3 | YAML, TOML | Developer/config-friendly formats

🏷️ Labels

backend feature enhancement data-import priority::high



Edited by Mojeeb Rahman Sedeqi