HD4DP v2 csv upload

Last updated: 2022-01-04 13:25

Introduction

This page explains how the CSVUploader feature works. The CSVUploader is meant for bulk uploads of records: in the csv file, each row represents one submission, so a user can add as many records as needed.

Architecture

The CSVUploader is located under hd-connect/csvuploader. It uses both the hd-connect-csvuploader and hd-connect-proxy modules.

The CSVUploader overall architecture is explained in the sequence diagram below.

3rd party libraries & frameworks

  • Apache Camel : https://camel.apache.org/
  • Spring Boot :

Testing and functioning

  • The CSVUploader creates, at root level (SFTP for end users, or hd-all for developers), a folder that contains one subfolder per existing organization.
  • In each organization folder, a folder per DCD is created.
  • To test the CSVUploader, the tester has to put a csv test file in the folder that matches the organization and the DCD concerned.
  • The CSVUploader polls with a delay of 1 minute, processes the csv file, and then creates 3 folders:
    • ARCHIVE folder: contains the source csv file
    • RESULTS folder: contains the results of the csv file processing. The result file repeats the submitted data and gives the final status of the processing : Success or Error. If an error occurred, the error message is included. When several uploads are processed, each result is appended to the end of the result file.
    • ERROR folder : this folder is created if the csv test file could not be parsed because of an I/O error (file corrupted, not found, etc.). For now, only technical errors are caught, and the source csv file is then moved to this folder instead of the ARCHIVE folder. Eventually, this folder should contain every result that is an error, and the RESULTS folder should contain only results that end with a SUCCESS status.
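
The folder behaviour described above can be sketched as follows. This is only an illustrative Python sketch, not the actual CSVUploader route: the helper name `route_processed_file`, the `parsed_ok` flag and the result file name (`<source name>_result.csv`) are assumptions made for the example. Only the ARCHIVE, RESULTS and ERROR folder names come from the description above.

```python
import shutil
from pathlib import Path
from typing import Iterable


def route_processed_file(csv_file: Path, parsed_ok: bool, result_lines: Iterable[str]) -> Path:
    """Move a processed csv file to ARCHIVE or ERROR and append its results.

    Hypothetical helper that mirrors the described folder behaviour;
    it is not taken from the CSVUploader code base.
    """
    dcd_folder = csv_file.parent
    # A technical failure (I/O, parsing) sends the source file to ERROR instead of ARCHIVE.
    target_dir = dcd_folder / ("ARCHIVE" if parsed_ok else "ERROR")
    target_dir.mkdir(exist_ok=True)
    target = target_dir / csv_file.name
    shutil.move(str(csv_file), str(target))

    if parsed_ok:
        results_dir = dcd_folder / "RESULTS"
        results_dir.mkdir(exist_ok=True)
        # Assumed result file name; results of later uploads are appended at the end.
        result_file = results_dir / f"{csv_file.stem}_result.csv"
        with result_file.open("a", encoding="utf-8") as handle:
            for line in result_lines:
                handle.write(line + "\n")
    return target
```

Opening the result file in append mode is what keeps the results of earlier uploads in place when the same file name is processed again.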

Sample test files are available :

  • dwhTestDCD.csv
  • eHealthTestDCD_with_repeatables.csv
  • eHealthTestDCD.csv

A test file is defined with this structure :

  • first row : the columns of the DCD. Each column corresponds to a field in the DCD
  • row 2 -> N : one record per line. Each cell contains the record's value for the field named in the column header.
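
To illustrate this structure, here is a minimal Python sketch that reads such a file with the standard `csv` module. The column names and values are invented for the example and do not come from a real DCD, and the comma delimiter is an assumption, since the delimiter is not specified on this page.

```python
import csv
import io

# Hypothetical content: field names and values are made up for illustration.
SAMPLE = """patient_id,date_of_birth,is_smoker
P001,04/01/1980,true
P002,15/06/1975,false
"""


def read_records(text, delimiter=","):
    """Return one dict per record, keyed by the header row (the DCD field names)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


records = read_records(SAMPLE)
for record in records:
    # Each record maps a column header to the value of that cell.
    print(record)
```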

Formats

Some formats are specific :

  • Dates : should be dd/mm/yyyy
  • Boolean : true / false
  • Codes : the value of the code (not the translation)
  • Multi codes : there is only one column per field. So when a select box is set as multiple, values have to be separated by a "|". e.g. : 68452|68453|68454
  • Repeatable blocks : in some DCDs, a complete block of fields is repeatable. In that case, the values have to be separated by a ";".
    • e.g. : a block contains 3 fields : A (Lob), B (Type klep) and C (Aantal kleppen).

The block is repeated once by clicking the "Add another" button. In the CSV, there is still one column for each field.

If the values for the first block are :

  • A -> 68545 (=RLL)
  • B -> 13245 (=38101000053)
  • C -> 1

and the values for the second block are :

  • A -> 68548 (=LLL)
  • B -> 13245 (=38101000053)
  • C -> 1

In the CSV file it will result in the following :

  • Column A : 68545;68548
  • Column B : 13245;13245
  • Column C : 1;1
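
The example above can be reproduced with a short Python sketch. The helper name `blocks_to_columns` is hypothetical; it only shows how the values of repeated blocks are joined with ";" to give one cell per field.

```python
def blocks_to_columns(blocks):
    """Join the values of each field across the repeated blocks with ';'."""
    fields = list(blocks[0].keys())
    return {
        field: ";".join(str(block[field]) for block in blocks)
        for field in fields
    }


# The two blocks from the example above, with fields A, B and C.
blocks = [
    {"A": "68545", "B": "13245", "C": "1"},
    {"A": "68548", "B": "13245", "C": "1"},
]
columns = blocks_to_columns(blocks)
for name, cell in columns.items():
    print(f"Column {name} : {cell}")
```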

It is possible to mix multi-select values and repeatable blocks (when a multi-select box is inside a block component that can be repeated). For example, if the values for the first block are :

  • A -> 68545|68944|68946
  • B -> 1
  • C -> 2

and the values for the second block are :

  • A -> 78945|78950
  • B -> 3
  • C -> 4

In the CSV file it will result in the following :

  • Column A : 68545|68944|68946;78945|78950
  • Column B : 1;3
  • Column C : 2;4
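
Reading such a cell back works in the opposite order: split on ";" first to get one entry per block, then split each entry on "|" to get the selected codes. The sketch below is a minimal Python illustration; the function name `parse_cell` is hypothetical and is not part of the CSVUploader.

```python
def parse_cell(cell):
    """Split a cell into blocks (';'), then each block into its values ('|')."""
    return [block.split("|") for block in cell.split(";")]


# Column A from the mixed example above: two blocks, each with several codes.
parsed = parse_cell("68545|68944|68946;78945|78950")
for index, values in enumerate(parsed, start=1):
    print(f"block {index}: {values}")
```

A cell without any separator simply yields one block with one value, so the same function can be applied to every column.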