Optimising case detection within UK electronic health records: use of multiple linked databases for detecting liver injury.

Wing, K.; Bhaskaran, K.; Smeeth, L.; van Staa, T.P.; Klungel, O.H.; Reynolds, R.F.; Douglas, I.
Optimising case detection within UK electronic health records: use of multiple linked databases for detecting liver injury.
BMJ Open, 2016; 6(9):e012102

Objectives: We aimed to create a ‘multidatabase’ algorithm for identifying cholestatic liver injury using multiple linked UK databases, and then (1) to assess the improvement in case ascertainment compared with using a single database and (2) to develop a new single-database case-definition algorithm, validated against the multidatabase algorithm.

Design: Method development for case ascertainment.

Setting: Three UK population-based electronic health record databases: the UK Clinical Practice Research Datalink (CPRD), the UK Hospital Episode Statistics (HES) database and the UK Office for National Statistics (ONS) mortality database.

Participants: 16 040 people aged over 18 years with linked CPRD-HES records indicating potential cholestatic liver injury between 1 January 2000 and 1 January 2013.

Main outcome measures: (1) The number of cases of cholestatic liver injury detected by the multidatabase algorithm. (2) The relative contribution of each data source to multidatabase case status. (3) The ability of the new single-database algorithm to discriminate multidatabase algorithm case status.

Results: Within the multidatabase case identification algorithm, 4033 of 16 040 potential cases (25%) were identified as definite cases based on CPRD data. HES data allowed possible cases to be discriminated from unlikely cases (947 of 16 040, 6%), but facilitated identification of only 1 definite case. ONS data did not contribute to case definition. The new single-database (CPRD-only) algorithm had a very good ability to discriminate multidatabase case status (area under the receiver operating characteristic curve 0.95).
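
As a rough illustration of the quantities reported above, the following Python sketch recomputes the two proportions from the stated counts and shows one standard way to obtain an area under the ROC curve: the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half. The function names and the toy scores and labels are hypothetical and are not taken from the study; the paper's own scoring algorithm is not reproduced here.

```python
def proportion_pct(count, total):
    """Return count / total as a percentage."""
    return 100.0 * count / total


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random case (label 1) has a higher
    score than a random non-case (label 0); ties contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Counts reported in the abstract.
definite_pct = proportion_pct(4033, 16040)  # definite cases from CPRD data
hes_pct = proportion_pct(947, 16040)        # group distinguished using HES data

# Hypothetical toy data: single-database scores against multidatabase case status.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)

print(f"{definite_pct:.1f}% definite, {hes_pct:.1f}% via HES, toy AUC {toy_auc:.3f}")
```

The pairwise form is quadratic in the number of records, which is fine for a sketch; for a cohort of this size a rank-based calculation or an established statistics library would be the more practical choice.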

Conclusions: CPRD-HES-ONS linkage confers minimal improvement in cholestatic liver injury case ascertainment compared with using CPRD data alone, and a multidatabase algorithm provides little additional information for validation of a CPRD-only algorithm. The availability of laboratory test results within CPRD but not HES means that algorithms based on CPRD-HES-linked data may not always be merited for studies of liver injury, or for other outcomes relying primarily on laboratory test results.