Count of unique study subjects

Last updated: 2023-08-22 16:44

Introduction

The purpose of the data table described below is to provide an up-to-date and end-to-end view on the operational dataflow of their project. The obtained metrics on the number of study records and unique study subjects consolidated on a daily base in the HD4RES2 application aims to fulfill this requirement.

Data table: V_HDM_REC_SUBJ_CNT

The researchers have access to a view on the number of study records and unique study subject related to their project. Those figures are available in the data table V_HDM_REC_SUBJ_CNT in HD4RES2 application.

In SAS Enterprise Guide, the data table is accessible via the ‘Servers’ window, located at the bottom left.

First open the following path Servers >> SASAppRES >> Libraries >> HD4RES2

Then, open the following table "V_HDM_REC_SUBJ_CNT"

This data table contains the following fields:

RequestedFieldNameDescription
HDBP_codeCD_PROJ_BUSHD internal business project code (HDBPxxxx)
HDBP_nameTX_PROJ_BUS_NAMEProject business name
HDBP_abbreviationTX_PROJ_BUS_ABBREVProject business abbreviation
HCO_NIDHICodeTX_DATA_PROV_VALNIDHI data provider code
HCO_nameTX_DATA_PROV_NAMENIDHI data provider name
HD4DP_MASTER_CHILD(not yet implemented)HD4DP user type (technically) of the Health Care Organisation:
'Master' = Data provider has his own instance of HD4DP2
'Child' = Data provider uses the HD4DP instance of a Master
HD4DP_MASTER_HCO_NIDHICode (not yet implemented) Master NIDHI data provider code
HD4DP_MASTER_HCO_name (not yet implemented) Master NIDHI data provider name
HD_ARCH(not yet implemented)
HD_ARCH_componentTX_ARCH_COMPONENTDatabase component/DWH layer
HD4DP_record_status (not yet implemented)
Count_DateD_COUNTDatepart of the metrics calculation made at the end of the day
Count (records)MS_RECSCount of the total number of study records
Count_New (records)MS_NEW_RECSNew number of study records as from previous date and time
Count (subjects)MS_UNQ_SUBJCount of unique study subject/patient
Count_New (subjects)MS_NEW_UNQ_SUBJNew number of study subject/patient as from previous date and time

Reference table: V_HDM_MAP_TBL_PROJ

As business projects/data collections often led to the creation of several data tables the figures are resulting from the consolidation of the most representative business project data tables. Consequently those data tables must be treated with respect to their own functional and structural properties.

To this end, researchers may consult an additional data table V_HDM_MAP_TBL_PROJ related to their project on the HD4RES2 application which describes the content and rules applied on each data table by business project.

In SAS Enterprise Guide, the data table is accessible via the ‘Servers’ window, located at the bottom left.

First open the following path Servers >> SASAppRES >> Libraries >> HD4RES2

Then, open the following table "V_HDM_MAP_TBL_PROJ"

FieldNameDescriptionUsed for dimension
TX_SCHEMA_NAMSchema location of the data table HD_ARCH_component
TX_TBL_NAMTable name of the data table
TX_REG_NAMHD internal DWH register nameHDBP_code
CD_PROJ_BUSHD Project business code (HDBPxxxx) HDBP_code
TX_PROJ_BUS_NAMEHD Project business code name
TX_PROJ_BUS_ABBREVHD Project business abbreviation
TX_SQL_DP_TPE Field or SQL statement used to identify the type of the data providerHCO_NIDHICode
TX_SQL_DP_VALField or SQL statement used to identify the data providerHCO_NIDHICode
TX_SQL_SUBJField or SQL statement used to identify the study subject
FL_INCL_METRIC_CNT_REC_PATFlag which indicates whether the table contributes toward the calculation of the metrics:
1, 2 => included in the figures calculation
0 => explicitly excluded from the calculation
null => implicitly excluded from the calculation
DT_VALID_FROMTimestamp as from when above properties are applicable
DT_VALID_TOTimestamp as to when above properties are discarded

Calculation of the study records:

One study record is defined as one record in a data table.

Therefore the delivered figures corresponds to the sum of number of records of each data tables as used as defined in the reference table V_HDM_MAP_TBL_PROJ. One must understand that the figures have no other purpose than to provide a synthetic technical count on the number of records available by provided dimensions.
The figures have by no means any scientific value and shouldn't be used in any context other then the metrics project.

Calculation example code:
A hypothetical Business project (HDBP000n) is made of three data tables, two stored in the STG layer and one in the CDW layer. The total study records are calculated as follow on the latest available data :

select 'HDBP000n' as HDBP_code,HD_ARCH_component,HCO_NIHDI_code,
sum(total_table_records) as total_study_records
from (
select 'STG' as HD_ARCH_component, assigned-rules as HCO_NIHDI_code,
count(*) as total_table_records
from STG.datatable1
group by HD_ARCH_component,HCO_NIHDI_code
union all
select 'STG' as HD_ARCH_component, assigned-rules HCO_NIHDI_code,
count(*) as total_table_records
from STG.datatable2
group by HD_ARCH_component,HCO_NIHDI_code
union all
select 'CDW' as HD_ARCH_component, assigned-rules as HCO_NIHDI_code,
count(*) as total_table_records
from CDW.datatable3
group by HD_ARCH_component,HCO_NIHDI_code
) as hdbp
group by HDBP_Code,HD_Arch_component,HCO_NIHDI_code

Calculation of the unique study subjects:

One study subject is defined as the entity for which data has been collected. Depending the data collection most of the time a study subject represent the patient.

Therefore the delivered figures corresponds to the sum of all uniquely identified subjects over the complete set of data tables by provided dimensions. One must understand that the figures have no other purpose than to provide a synthetic technical count on the number of subjects available at the level - and only at the level - of the provided dimensions. All summarization over less dimensions combination are erroneous and Healthdata.be neither DWH-team can be held responsible for any misuse of the obtained figures.
The figures have by no means any scientific value and shouldn't be used in any context other then the metrics project.

Calculation example code:
A hypothetical business project is made of three data tables, two in STG and one in CDW. In all three tables the patient field has been identified as IDC_PAT. The total unique study subject are calculated as follow on the latest available data :

select 'HDBP000n' as HDBP_code,HD_ARCH_component,HCO_NIHDI_code,
count(distinct study_subject) as total_unq_subject
from (
select distinct 'STG' as HD_ARCH_component, assigned-rules as HCO_NIHDI_code,
IDC_PAT as study_subject
from STG.datatable1
union
select distinct 'STG' as HD_ARCH_component, assigned-rules HCO_NIHDI_code,
IDC_PAT as study_subject
from STG.datatable2
union
select distinct 'CDW' as HD_ARCH_component, assigned-rules as HCO_NIHDI_code,
IDC_PAT as study_subject
from CDW.datatable3
) as hdbp
group by HDBP_Code,HD_Arch_component,HCO_NIHDI_code

This data table is currently in development by the healthdata.be team.

This documentation is being updated regularly. We try to provide as correct, complete and clear as possible information on these pages. Nevertheless, if you see anything in the documentation that is not correct, does not match your experience or requires further clarification, please create a request (type : request for information) via our portal (https://sciensano.service-now.com/sp) or send us an e-mail via support.healthdata@sciensano.be to report this documentation issue. Please, do not forget to mention the URL or web address of the page with the documentation issue. We will then adjust the documentation as soon as possible. Thank you!