Peer Review Thresholds Risk Assessment in Research Trial

  • Research
  • Open Access
Evaluating the re-identification risk of a clinical study report anonymized under EMA Policy 0070 and Health Canada Regulations

Abstract

Background

Regulatory agencies, such as the European Medicines Agency and Health Canada, are requiring the public sharing of clinical trial reports that are used to make drug approval decisions. Both agencies have provided guidance for the quantitative anonymization of these clinical reports before they are shared. There is limited empirical information on the effectiveness of this approach in protecting patient privacy for clinical trial data.

Methods

In this paper we empirically test the hypothesis that when these guidelines are implemented in practice, they provide adequate privacy protection to patients. An anonymized clinical study report for a trial on a non-steroidal anti-inflammatory drug that is sold as a prescription eye drop was subjected to re-identification. The target was 500 patients in the USA. Only suspected matches to real identities were reported.

Results

Six suspected matches with low confidence scores were identified. Each suspected match took 24.2 h of effort. Social media and death records provided the most useful information for getting the suspected matches.

Conclusions

These results suggest that the anonymization guidance from these agencies can provide adequate privacy protection for patients, and the modes of attack can inform further refinements of the methodologies they recommend in their guidance for manufacturers.

Introduction

There is growing recognition within the research community that the re-analysis of clinical trial data can provide new insights compared to the original publications [1]. Evidence from voluntary data-sharing efforts that have been running over the last few years suggests that the validation of the primary endpoint is an uncommon objective of secondary analysis of clinical trial data, and that the most common purposes for secondary analyses are new analyses of the treatment effect and the disease state [2].

Clinical trial data can mean two different things. First, there are the structured individual-level participant data and, second, there are the clinical reports. Clinical reports would normally follow ICH guidance M4 for Common Technical Documents [3], and module 5 of these documents is the clinical study report (CSR), which would commonly follow ICH guidance E3 [4].

Regulators at the European Medicines Agency (EMA) have issued Policy 0070 requiring the release of clinical reports [5]. When a manufacturer applies for a centralized marketing authorization at the EMA, the Committee for Medicinal Products for Human Use (CHMP) provides the (positive or negative) recommendation to the European Commission (EC). The EC grants or refuses the marketing authorization in a centralized procedure. The anonymized clinical reports are then published online after the EC decision (Footnote 1), or after the CHMP decision if there is no EC decision. A future phase of Policy 0070 is expected to address the release of individual participant data, but at the time of writing no date has been set for this.

Similarly, Health Canada's Public Release of Clinical Information (PRCI) initiative [6], which went into effect in 2019, requires the release of anonymized clinical reports after a final regulatory decision is made. However, it also includes requests from the public for legacy clinical reports within its scope. The anonymized documents are published on the Health Canada portal (Footnote 2).

The secondary analysis of clinical reports has produced informative research results, including those on drug safety, evaluating bias, replication of studies, and meta-analysis [7].

Anonymization is necessary before clinical reports are released by these agencies on their portals because they can contain a substantial amount of personal health information. For example, there will be detailed information about participant medical histories, and narratives documenting adverse events as well as any relevant information needed to interpret these adverse events. These documents also contain summary tabular data (e.g., vital statistics and counts). It is known that personal information can be derived from tabular data and there is a body of work on assessing the re-identification risk and the anonymization of tabular data [8,9,10].

Appropriate anonymization protects participant privacy, and also limits the liability of the manufacturer and the regulatory agencies when this information is made broadly available. Furthermore, using information that has been released publicly but that may not have been anonymized adequately by the sponsor can also impose legal risks on the users of that information [11], such as researchers and journalists.

The EMA has published guidance for manufacturers to anonymize their documents before submitting them to the Agency under Policy 0070 [12]. Health Canada's anonymization guidance follows the same principles as the EMA's [6], and they will accept documents already anonymized according to the EMA's recommended methodology.

In this context, anonymization means ensuring that the probability of correctly assigning an identity to a participant described in the clinical reports is very small. This is also referred to as the probability of re-identification. Both the EMA and Health Canada have set an acceptable probability threshold at 0.09.

The EMA anonymization guidance recommends a risk-based approach to anonymization, and allows for two approaches: a quantitative approach and a qualitative approach. The former entails using statistical disclosure control techniques to estimate the actual probability of re-identification (e.g., see [8,9,10,13,14,15,16]). A qualitative approach, as it has been applied in practice, does not estimate probabilities, but uses qualifiers such as low/medium/high risk. The risk level is determined using criteria such as the number of participants, whether the trial is in a rare disease, subjective assessment of potential socioeconomic harm to patients if there is re-identification, and the perceived re-identification risk of certain pieces of information (whether they would be knowable by potential adversaries).

To date, 61% of dossiers published under Policy 0070 have followed a qualitative approach and 10% a quantitative approach [17] (the rest did not require anonymization, such as systematic reviews). There is no generally accepted methodology for qualitative anonymization of clinical trial data and therefore each of the manufacturers that has published dossiers on the EMA portal using a qualitative approach has developed their own methodology. In addition to questions about the validity of custom home-grown methodologies, there is the practical challenge of pooling data across trials if they are anonymized differently. On the other hand, there is a large body of literature on quantitative anonymization. Health Canada is emphasizing the need for quantitative methods for re-identification risk measurement and anonymization [6].

One concern that has arguably been contributing to the slow adoption by manufacturers of quantitative anonymization approaches described in the EMA and Health Canada guidance is uncertainty on whether they are sufficiently privacy protective [18]. The purpose of the current study was therefore to empirically test whether the quantitative anonymization approach for CSRs described in the EMA and Health Canada guidance is sufficiently privacy protective [18]. This study makes two contributions: it is the first empirical evaluation of re-identification risk for a CSR, and it is the first empirical test of the hypothesis that the EMA and Health Canada anonymization guidance provides adequate privacy protection.

Our empirical evaluation of re-identification risk follows a UK methodology described by the Information Commissioner's Office (ICO) [19], the Office for National Statistics (ONS) [20], and the UK Anonymisation Network [21]. Furthermore, in the context of deciding whether information is personal, a tribunal judge recently used the success of such an empirical re-identification evaluation as the primary criterion [22]. Our study is consistent with previous empirical re-identification risk evaluation studies in that it focuses on data subjects in a single dataset [23,24,25,26,27,28,29,30,31,32,33,34,35]. In Additional file 1, we review previous work in this area.

This paper is structured as follows. We first describe the specific trial that was the target of the empirical test and the methods that were used to re-identify data subjects, including the metrics collected about the success rate and effort. This is followed by the results, limitations, and conclusions.

Methods

Study design

The basic design of this study involves taking an anonymized CSR (following the approaches described by the EMA and Health Canada) and then subjecting it to re-identification by attempting to match the participants in the CSR with individuals in the real world. The group performing the anonymization was independent from the group performing the re-identification. This section describes the CSR, the matching, and how it was evaluated.

The clinical study report

The CSR that was the subject of the empirical re-identification test pertained to a clinical trial of nepafenac. Nepafenac is a non-steroidal anti-inflammatory drug that is sold as a prescription eye drop under two main trade names, which differ on the basis of drug concentration: the 0.3% suspension is marketed as Ilevro, while the 0.1% suspension is marketed as Nevanac. Nevanac received FDA approval in August 2005, while Ilevro was approved in October 2012.

The trial in question (C-12-067) was a randomized, double-masked, controlled study to assess the safety and efficacy of the nepafenac ophthalmic suspension (0.3%) for improvements in clinical outcomes among diabetic subjects following cataract surgery. The trial was sponsored by Alcon Research, Ltd, currently a division of Novartis Europharm Ltd.

The trial ran from 26 March 2013 to 13 May 2015 in 66 centers in the USA, Latin America, and the Caribbean. Subjects must have been 18 years of age or older, must have had a cataract, and must have been planning to undergo cataract extraction by phacoemulsification. Subjects must also have had a history of diabetes and diabetic retinopathy. There were 615 subjects randomized and 598 were included in the primary efficacy analysis. The distribution of subjects by country is presented in Table 1.

Table 1 Distribution by country of nepafenac trial participants

This trial was selected because it has a large number of participants in the USA. Most re-identification studies have been performed on US data subjects [23], arguably because there are more data available about them to make such attempts more likely to succeed. Furthermore, an anonymized version of this particular CSR had been submitted to the EMA under their Policy 0070. This meant that the anonymization team had experience working with it and were able to update the anonymization applied for the current study using recent methodological advances (e.g., by using active learning methods to improve information extraction for the detection of personally identifying information [36]).

Anonymization of the CSR

The CSR was anonymized following the EMA Policy 0070 guidelines [12], and the anonymized document was made available for our study. The anonymization performed is also consistent with the Health Canada PRCI guidelines [6]. The anonymization was performed by a team from Privacy Analytics Inc. in Canada. The general quantitative anonymization methodology has been described in detail elsewhere [13,37,38,39,40].

Specifically, a hybrid approach for information extraction consisting of a rule-based engine [41] and an active learning system [36] was used to extract subject identifiers and quasi-identifiers (e.g., dates, participant demographics, medical history, and serious adverse events) from the CSRs. All subject identifiers were pseudonymized. A sample of pages was also manually annotated by two independent annotators; evidence shows that the accuracy diminishes with more than two annotators [42]. The manual annotations were used to create a gold standard from which recall (the proportion of identifiers detected correctly) was computed. The probability of re-identification was measured using a k-anonymity estimator [43]. A risk model for unstructured text was then used to estimate the overall risk of re-identification, taking into account the recall and the k-anonymity results [44]. If the upper 95% confidence limit of the estimated risk was larger than the EMA and Health Canada recommended threshold of 0.09, then transformations were performed on the quasi-identifiers until the estimated upper risk limit was at or below 0.09. The transformations performed were generalization and suppression.
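The threshold check described above can be illustrated with a small sketch. It is not the estimator or the text risk model cited in [43, 44]: it assumes the quasi-identifiers have already been extracted into structured records, takes the maximum risk as one over the size of the smallest equivalence class, and ignores recall and confidence limits. The function names and the toy records are hypothetical.

```python
from collections import Counter

THRESHOLD = 0.09  # acceptable probability of re-identification cited in the text


def max_reidentification_risk(records, quasi_identifiers):
    """Return 1 / (size of the smallest equivalence class).

    An equivalence class is the set of records sharing the same values
    on all quasi-identifiers (a simplified k-anonymity view of risk).
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return 1.0 / min(class_sizes.values())


def generalize_age(record, width=10):
    """Generalization: replace an exact age with a band, e.g. 63 -> '60-69'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy data (hypothetical): 20 subjects, two per exact age from 60 to 69.
records = [{"age": 60 + i % 10, "sex": "F"} for i in range(20)]
qis = ["age", "sex"]

risk_before = max_reidentification_risk(records, qis)
generalized = [generalize_age(r) for r in records]
risk_after = max_reidentification_risk(generalized, qis)

print(f"risk before generalization: {risk_before:.2f}")
print(f"risk after generalization:  {risk_after:.2f}")
print("meets threshold:", risk_after <= THRESHOLD)
```

In this toy case, generalizing exact ages into ten-year bands merges the small classes into one class of 20 records, which brings the simplified risk measure below 0.09. A real pipeline would also apply suppression where generalization alone is not enough.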

Suspected matches vs. verified matches

During a re-identification attempt, there is first a suspected match with a real identity, which would then need to be verified to ensure that it is a correct match. An example of a verification in the current study could be if the pharmaceutical company (manufacturer) had the correct identity of the patient and was able to confirm whether a suspected match was correct. Counting only the suspected matches will give quite conservative results, in that the counts will overestimate the match rate. In practice, there is a sharp drop in the match rate between the suspected results and the verified results. The summary presented in Table 2 provides the success rates of verifications from previous studies.

Table 2 Rate of correct verification from suspected matches

In the context of clinical trials, manufacturers only get key coded data from the trial sites and not names and addresses of participants. Because the manufacturer does not know the identity of the participants, it is not possible for the manufacturer to directly verify whether a suspected match is correct or not.

Under such conditions there are three approaches that can be used to obtain or estimate a count of the verified matches:

  1. The manufacturer can go through each individual site and have them verify each suspected match many years after the trial has been completed.

  2. Assign a confidence score to assess the likelihood that the suspected match was correct. A commonly used confidence scale is a number from 1 to 5, with 5 indicating high confidence in the match, as illustrated in Table 3 [30]. The confidence percentages and qualitative meaning were based on the actual subjective scores and terms used by analysts performing re-identification in previous studies. Therefore, they are grounded in the manner in which analysts express themselves with respect to suspected matches. In the table we also interpret the meaning of each of the five levels into low/medium/high confidence in a suspected match. The confidence score has been found to be correlated with the correctness of the match after verification [30].

  3. Using the rates from literature presented in Table 2, compute the weighted mean of suspected matches that are verified matches and use that as an adjustment. This value is 23% (i.e., 23% of suspected matches are verified). However, it is not clear whether the suspected matches here were only the high confidence ones (i.e., verifications in these studies were only attempted on high confidence suspected matches), which makes this at best another ceiling estimate.
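The weighted-mean adjustment in the third approach can be sketched as follows. This is an illustration only: the per-study counts are hypothetical placeholders, not the values in Table 2, and the function names are assumptions. Weighting each study's verification rate by its number of suspected matches is equivalent to dividing the total verified matches by the total suspected matches.

```python
def weighted_verification_rate(studies):
    """Weighted mean verification rate across studies.

    `studies` is a list of (suspected_matches, verified_matches) pairs.
    Weighting each study's rate by its suspected count reduces to
    total verified / total suspected.
    """
    total_suspected = sum(suspected for suspected, _ in studies)
    total_verified = sum(verified for _, verified in studies)
    return total_verified / total_suspected


def adjusted_matches(suspected, verification_rate):
    """Expected number of verified matches given a count of suspected matches."""
    return suspected * verification_rate


# Hypothetical counts, chosen only to show the calculation.
placeholder_studies = [(40, 10), (10, 5)]
print(weighted_verification_rate(placeholder_studies))  # 15 / 50

# Applying the 23% figure quoted in the text to a count of suspected matches.
print(adjusted_matches(6, 0.23))
```

The current study did not use this adjustment; it is shown only to make the arithmetic behind the third approach concrete.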

Table 3 Interpretation of the confidence levels attached to candidate matches [30]

For the current study we used the second approach (confidence scores).

Third party

The re-identification study was performed by an independent third party who was not involved in any way in the anonymization of the data itself, namely a team from Good Research in the USA.

The third party performing the re-identification did not convey the suspected re-identifications back to the study sponsor nor to the group that performed the anonymization. Only the quantitative and summary results were communicated back; these are the same results as presented in this paper.

Target subjects

The re-identification study was performed only on the 500 participants based in the USA. There are three reasons for using the US patients as the target participants:

  1. As noted earlier, most known re-identification attacks have been performed in the USA (see [23]) because there is more public information available about the population, making such attacks easier. Arguably, then, the results from US participants would represent the ceiling success rate.

  2. For practical reasons, the study needed to be performed on patients living in an English-speaking country.

  3. The largest geography in this trial was the USA, providing a larger sample of target individuals.

Methods used

The terminology used in the UK to describe the commissioned re-identification attack on a dataset is a motivated intruder test [19, 20]. We will also use that terminology here to be consistent with the literature.

The ICO guidance notes that the motivated intruder should not have specialized knowledge [19]. However, there was a minimum amount of domain knowledge that our investigators needed to have to proceed with the study, for instance, where to start to look for public-facing clinical data and how and what kinds of Freedom of Information requests to try.

During risk measurement and anonymization there are two directions for a re-identification attack that need to be considered [13]. An attack can use information from an external source and match that with the information in the CSR (population-to-sample attack). In the context of a motivated intruder test, this could be a famous person or an acquaintance of the intruder. An attack can also start from the characteristics of a patient in the CSR and attempt to match them with person profiles in the external data sources or registries (sample-to-population attack). The registry may be pre-existing or may be created by conducting searches on the Web. Commercial databases would also be considered a kind of registry.

We defined several approaches to re-identify individuals in the dataset. In practice, the process was iterative, where partial information of any person known to have participated in the clinical study was gathered from multiple sources. The partial information was then combined to attempt to re-identify the individuals. Also note that these approaches were informed by discussions within the clinical trial disclosure community with respect to methods and sources that were believed to be useful for actual re-identification attacks.

The following were the approaches that were examined by the analysts performing the re-identification attempt:

  1. Clinical reports: identifying external clinical reports of adverse events in registries and released by regulatory agencies, matching external reports with more data to the anonymized events in the CSR.

  2. FDA and EMA Freedom of Information Act (FOIA) requests: requesting records from federal (US) and EU agencies.

  3. Death records: given that there were five deaths among the subjects, matching these to public death records could provide additional identity information.

  4. Hospital discharge records: by identifying some of the areas where the study was performed, we may obtain matches from these to the adverse events on the anonymized report.

  5. Re-contacting subjects: attempting to recruit the subjects from the same study again.

  6. Social media: although the subjects may have been told to not post information from the studies, some individuals may have posted information directly or indirectly related to it, leading us to partially identify some subjects.

  7. Voter registration records: as outlined by Benitez and Malin [46], voter registration records can provide a possible avenue for re-identifying medical records at scale.

  8. Other approaches.

Additional file 2 details the goals, external datasets used, and methods of attack for each of these approaches. According to the ONS guidance, the intruder should spend a few hours to re-identify a record [20].

The outcome and its interpretation

At the end of a motivated intruder test there are two summary numbers that need to be generated:

  1. the percentage of individuals in the dataset that have a suspected match

  2. the effort to find a suspected match

Each of these will be described further below.

Percentage of individuals with a suspected match

The denominator for this calculation is 500. We did not consider incorrect re-identification in the final calculation. Although in theory incorrect re-identification can cause the data subjects some harm, there is no way to actually protect against incorrect re-identification short of not sharing any information. An adversary can assign random names to records in a database and end up with many incorrect re-identifications. Therefore, we focus only on correct (suspected) re-identification.

Effort to re-identify an individual

With respect to the effort to re-identify an individual, this was calculated as an average across all individuals who were candidates to be re-identified. In this case, the total effort includes the effort spent on the failed matches too.

Results

Suspected matches

Six subjects were determined to be suspected match candidates, with confidence scores all within the "low confidence" group. The successful approaches are summarized in Tables 4 and 5. As can be seen, only the search through death records and social media searches identified suspected matches.

Table 4 Approaches used for each of the six suspected matches

Table 5 Summary of re-identification confidence scores

Using the approach of matching against death records, four potential matches were obtained, three of which had a confidence score of 1 and the other a confidence score of 2. The confidence score was determined by expert assessment based on the fit of the match, such as the obituaries and narratives with the records on the anonymized report or other known information based on the study (e.g., being diabetic and having had cataract surgery), age, gender, and cause of death.

Most of the initial hits for Facebook and Reddit keyword searches were discarded due to their low probability of being in the study (not known to be diabetic) and the lack of specific identifiers to single them out in the CSR. The search keywords used are detailed in Additional file 4. However, it was possible to identify two subjects for which there was some confidence of a match based on use of the drug, surgery, date of surgery, suspected or confirmed diabetic status, and additional information from other clinical/medical visits, although the confidence scores were only 1 and 2.

The ONS guidance [20] states that "[t]he aim is not to release a dataset with zero risk and so a good result would be if there were a small number of [re-identifications] with low confidence." Such a result indicates that the re-identification risk is low and that the perturbations applied to the anonymized data were not too extensive. Therefore, the results obtained here are consistent with this balance.

Given that all of the suspected matches had a low confidence score, the likelihood of an attempted verification would be low.

Re-identification effort

A total of 170 h was spent on the investigation and the subsequent report (this does not include the effort spent writing the current article). Approaches #3 (death records) and #6 (social media) resulted in potential matches, and overall took 49 and 75 h, respectively. Details of the time breakdown and process are described in Additional file 3.

The total estimated effort per subject was approximately 24 h. This was calculated by taking the entire effort (170 h) and excluding 25 h of project management tasks (e.g., writing the report, project meetings), for a total of 145 h. This number is divided by the six candidates: 145 h / 6 candidates = 24.2 h per candidate.
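The two summary numbers of the motivated intruder test can be reproduced from the figures reported in this section. The sketch below is only a restatement of that arithmetic; the variable and function names are illustrative, and the percentage is derived here from the six suspected matches and the denominator of 500 rather than quoted from the text.

```python
TARGET_SUBJECTS = 500          # US participants targeted
SUSPECTED_MATCHES = 6          # suspected matches, all low confidence
TOTAL_HOURS = 170              # investigation plus report
PROJECT_MANAGEMENT_HOURS = 25  # excluded from the effort estimate


def suspected_match_percentage(suspected, targets):
    """Percentage of target subjects with a suspected match."""
    return 100.0 * suspected / targets


def effort_per_candidate(total_hours, excluded_hours, candidates):
    """Hours of re-identification effort per suspected match candidate."""
    return (total_hours - excluded_hours) / candidates


pct = suspected_match_percentage(SUSPECTED_MATCHES, TARGET_SUBJECTS)
hours = effort_per_candidate(TOTAL_HOURS, PROJECT_MANAGEMENT_HOURS, SUSPECTED_MATCHES)

print(f"{pct:.1f}% of target subjects had a suspected match")
print(f"{hours:.1f} h of effort per suspected match")
```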

No commercial datasets were purchased nor were any other expenditures incurred for the purpose of the re-identification; therefore, no additional costs beyond labor costs were reported.

Discussion

Summary

There has been good progress recently in making the reports from clinical trials publicly available through the EMA and Health Canada. But there have also been concurrent concerns about the protection of participant privacy. Medical histories and narratives in clinical study reports can include very sensitive and detailed information about trial participants. While clinical trial participants are supportive of data sharing as long as adequate safeguards are in place [47], for specific diseases and conditions, patients worry about discrimination in employment, reduced access to insurance including health care, and inability to secure loan and credit advances if their sensitive information is identified [48]. In general, it is known that when patients have concerns about the privacy of their personal health information, they adopt privacy protective behaviors, such as not seeking care, hiding information, and visiting multiple providers [49]. If patients worry about how their data are used, there is the risk that this will affect their willingness to participate in trials. The approach that has been adopted to mitigate this risk and enable access to clinical reports is anonymization.

The purpose of this study was to empirically test the hypothesis that a clinical study report that was anonymized according to the quantitative methods described by the EMA and Health Canada for the public release of documents provided adequate privacy protection to participants. The drug in question was nepafenac, which is a non-steroidal anti-inflammatory drug that is sold as a prescription eye drop. The commissioned re-identification focused on the 500 patients who were recruited in the USA, and was performed by an independent third party not involved in the anonymization of the documents and that had no vested interest in the outcome.

Overall, there were six suspected re-identifications with low confidence scores. The scoring scheme has been correlated with the correctness of suspected re-identification in previous work. The interpretation of this result is that no patients could reasonably be re-identified using the re-identification methods described here.

This attack provides confidence that the quantitative anonymization approaches outlined in the EMA and Health Canada guidance can be a reasonable approach to protect patient privacy. However, it should be noted that many sponsors do not use a quantitative approach to anonymize their submissions to the EMA and therefore our conclusions on managing privacy risks do not extend to the case where qualitative or other approaches are used. The impact of this anonymization approach on the utility of the anonymized documents is not known as that was not the subject of this study, but should be examined in future research.

Limitations

An empirical re-identification risk assessment has its own limitations in that it cannot mimic exactly what an adversary will do when attacking a dataset. For example, an adversary with criminal intent may do things that we would not in a commissioned re-identification attempt, for instance, committing criminal acts or buying stolen data to match against the CSR. Also, every commissioned attack has budget and time limitations, and that imposes some boundaries on what can be achieved. It is plausible that an adversary will have more budget than that assumed in the current study. Therefore, there are legal, ethical, and practical limits to what can be achieved using such an empirical test.

This study was performed on subjects in the USA. Different results may have been obtained had the motivated intruder test been on subjects from a different country. Arguably, however, the match rates would be lower in other countries and therefore our numbers should be considered a ceiling.

It is not known at this point whether a motivated intruder test on subjects in a rare disease trial or a different therapeutic area would produce similar results. Therefore, we are cautious in generalizing the findings across therapeutic areas and studies investigating small and narrower populations. Furthermore, our analysis was performed on a single clinical trial and a single CSR. Caution should be exercised when generalizing these findings more broadly to other trials and CSRs. Our study nevertheless provides some initial evidence as well as a re-identification methodology specific to clinical trials that can be applied in future work.

The EMA and Health Canada guidance documents do not provide operational step-by-step directions on how to perform anonymization; they generally refer to the literature for these details. To the extent that the interpretations of the agency guidelines are heterogeneous, other similar studies may achieve different success rates with their re-identification attempts. In particular, the team that performed the anonymization on this CSR was quite knowledgeable of the field and methods of anonymization. Since no minimal expertise requirements are stipulated by the regulatory authorities, other teams performing the anonymization, even if following the same guidance, may achieve different results.

Conclusions

This study was the first to empirically test the anonymization methods that have been recommended by the EMA and Health Canada to facilitate the sharing of clinical reports more broadly. The results are encouraging in that they demonstrate the robustness of these anonymization methods. Additional empirical tests of re-identification risk on anonymized CSRs will accumulate evidence on the strengths and weaknesses of the quantitative approach to anonymization that is described by the two agencies.

It is also generally recommended that manufacturers regularly perform re-identification studies on their documents and data, especially when they release them in the public domain. Such empirical feedback will help improve the anonymization methods that are used, and can augment the statistical risk estimation models that are typically used to determine the level of perturbation and redaction that needs to be applied.

Availability of data and materials

A version of the nepafenac anonymized CSR is available on the EMA portal (registration required): https://clinicaldata.ema.europa.eu/web/cdp/home

References

  1. Ebrahim S, Sohani ZN, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA. 2014;312(10):1024–32. https://doi.org/10.1001/jama.2014.9646.

  2. Navar AM, Pencina MJ, Rymer JA, Louzao DM, Peterson ED. Use of open access platforms for clinical trial data. JAMA. 2016;315(12):1283. https://doi.org/10.1001/jama.2016.2374.

  3. International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use. Organisation of the common technical document for the registration of pharmaceuticals for human use: M4. Geneva: ICH; 2016.

  4. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Structure and content of clinical study reports: E3. Geneva: ICH; 1995.

  5. European Medicines Agency. European Medicines Agency policy on publication of information for medicinal products for human use: policy 0070. 2014.

  6. Health Canada. Guidance document on public release of clinical information. 2019. Available: https://www.canada.ca/en/health-canada/services/drug-health-product-review-approval/profile-public-release-clinical-information-guidance.html. Accessed 4 June 2019.

  7. Ferran J-M, Nevitt S. European Medicines Agency policy 0070: an exploratory review of data utility in clinical study reports for research. BMC Med Res Methodol. 2019;19(1):204.

  8. Hundepool A, et al. Statistical disclosure control. Chichester: Wiley; 2012.

  9. Willenborg L, de Waal T. Statistical disclosure control in practice. New York: Springer-Verlag; 1996.

  10. Willenborg L, de Waal T. Elements of statistical disclosure control. New York: Springer-Verlag; 2001.

  11. El Emam K, Hintze M. Are there risks of using public clinical trial data under GDPR? The Privacy Advisor (IAPP); 2018. Available: https://iapp.org/news/a/are-there-risks-of-using-public-clinical-trial-data-under-gdpr/. Accessed 7 Sept 2019.

  12. European Medicines Agency. External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use (v1.4). 2018.

  13. El Emam K. Guide to the de-identification of personal health information. Auerbach: CRC Press; 2013.

  14. Duncan G, Elliot M, Salazar G. Statistical confidentiality: principles and practice. Boca Raton: Springer; 2011.

  15. Templ M. Statistical disclosure control for microdata: methods and applications in R. Available: https://www.springer.com/us/book/9783319502700. Accessed 24 Aug 2018.

  16. Doyle P, Lane J, Theeuwes J, Zayatz L, editors. Confidentiality, disclosure and data access: theory and practical applications for statistical agencies. 1st ed. Amsterdam, New York: Elsevier Science; 2001.

  17. European Medicines Agency. Clinical data publication in numbers. In: EMA Technical Anonymization Group (TAG) meeting; 2018.

  18. Multi-Regional Clinical Trials Center and European Medicines Agency. Data anonymisation - a key enabler for clinical data sharing: workshop report. London: European Medicines Agency; 2018.

  19. Information Commissioner's Office. Anonymisation: managing data protection risk code of practice. Wilmslow: Information Commissioner's Office; 2012.

  20. https://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/disclosurecontrol/guidanceonintrudertesting. Accessed 7 Dec 2019.

  21. Elliot M, Mackey E, O'Hara K, Tudor C. Anonymisation decision-making framework. Manchester: UKAN Publications; 2016.

  22. Tribunal between John Peters and the Information Commissioner and the University of Bristol before Judge David Thomas and tribunal members Marion Saunders and Alison Lowton. First-tier Tribunal (General Regulatory Chamber), Information Rights, Appeal Reference: EA/2018/0142, 2019. https://www.casemine.com/judgement/uk/5ccbcb4e2c94e04229a76636. Accessed 7 Sept 2019.

  23. El Emam K, Jonker E, Arbuckle L, Malin B. A systematic review of re-identification attacks on health data. PLoS One. 2011;6(12):e28071.

  24. Elliot MJ, Purdam K. The evaluation of risk from identification attempts. Manchester: University of Manchester; 2003.

  25. Kwok P, Davern M, Hair E, Lafky D. Harder than you think: a case study of re-identification risk of HIPAA-compliant records. In: JSM proceedings, Miami Beach, FL; 2011.

  26. Lafky D. The safe harbor method of de-identification: an empirical test. In: Presented at the fourth national HIPAA Summit West, San Francisco, CA; 2009.

  27. Elliot M. Using targeted perturbation of microdata to protect against intelligent linkage. In: Proceedings of UNECE work session on statistical confidentiality, Manchester, UK; 2007.

  28. Elliot M. Report on the disclosure risk analysis of the supporting people datasets. Manchester: Administrative Data Liaison Service; 2011.

  29. Elliot M, Mackey E, O'Shea S, Tudor C, Spicer K. End user licence to open government data? A simulated penetration attack on two social survey datasets. J Off Stat. 2016;32(2):329–48. https://doi.org/10.1515/jos-2016-0019.

  30. Tudor C, Cornish G, Spicer K. Intruder testing on the 2011 UK Census: providing practical evidence for disclosure protection. J Privacy Confidentiality. 2013;5(2):111–32.

  31. Spicer K, Tudor C, Cornish G. Intruder testing: demonstrating practical evidence of disclosure protection in 2011 UK Census. In: Presented at the UNECE conference of European statisticians, Ottawa, ON; 2013.

  32. Gregory M. DECC's national energy efficiency data-framework: anonymised dataset; 2014.

  33. Ramachandran A, Singh L, Porter E, Nagle F. Exploring re-identification risks in public domains. In: Presented at the 2012 10th annual international conference on privacy, security and trust; 2012. p. 35–42. https://doi.org/10.1109/PST.2012.6297917.

  34. El Emam K, et al. De-identification methods for open health data: the case of the Heritage Health Prize Claims Dataset. J Med Internet Res. 2012;14(1):e33. https://doi.org/10.2196/jmir.2001.

  35. Narayanan A. An adversarial analysis of the reidentifiability of the Heritage Health Prize dataset; 2011.

  36. Li M, Scaiano M, El Emam K, Malin B. Efficient active learning for electronic medical record de-identification. AMIA Jt Summits Transl Sci Proc. 2019;2019:462–71.

  37. El Emam K, Arbuckle L. Anonymizing health data: case studies and methods to get you started. Sebastopol: O'Reilly; 2013.

  38. Dankar F, El Emam K, Neisa A, Roffey T. Estimating the re-identification risk of clinical data sets. BMC Med Inform Decis Mak. 2012;12:66.

  39. El Emam K, Paton D, Dankar F, Koru G. De-identifying a public use microdata file from the Canadian national discharge abstract database. BMC Med Inform Decis Mak. 2011;11:53.

  40. El Emam K, Dankar F. Protecting privacy using k-anonymity. J Am Med Inform Assoc. 2008;15:627–37.

  41. Cunningham H, Tablan V, Roberts A, Bontcheva K. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics. PLoS Comput Biol. 2013;9(2):e1002854. https://doi.org/10.1371/journal.pcbi.1002854.

  42. Carrell DS, Cronkite DJ, Malin BA, Aberdeen JS, Hirschman L. Is the juice worth the squeeze? Costs and benefits of multiple human annotators for clinical text de-identification. Methods Inf Med. 2016;55(4):356–64. https://doi.org/10.3414/ME15-01-0122.

  43. Sweeney L. k-anonymity: a model for protecting privacy. Int J Uncertain Fuzz Knowl Based Syst. 2002;10(5):557–70.

  44. Scaiano M, et al. A unified framework for evaluating the risk of re-identification of text de-identification tools. J Biomed Inform. 2016;63:174–83. https://doi.org/10.1016/j.jbi.2016.07.015.

  45. Sweeney L. Matching known patients to health records in Washington State data. Cambridge: Harvard University, Data Privacy Lab; 2013.

  46. Benitez K, Malin B. Evaluating re-identification risks with respect to the HIPAA privacy rule. J Am Med Inform Assoc. 2010;17(2):169–77. https://doi.org/10.1136/jamia.2009.000026.

  47. Mello MM, Lieou V, Goodman SN. Clinical trial participants' views of the risks and benefits of data sharing. N Engl J Med. 2018;378(23):2202–11. https://doi.org/10.1056/NEJMsa1713258.

  48. European Medicines Agency. Data anonymisation - a key enabler for clinical data sharing: workshop report. 2017.

  49. Malin BA, El Emam K, O'Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. J Am Med Inform Assoc. 2013;20(1):2–6. https://doi.org/10.1136/amiajnl-2012-001509.

Acknowledgements

The authors wish to thank Byron Jones, Bradley Malin, Frank Rockhold, and Rachel Li for reviewing earlier versions of this paper.

Funding

This study was funded by Novartis. The Novartis sponsor approved the design of the study but was not involved in the data analysis or interpretation of the results.

Author information

Contributions

JB conceived the study, approved the study design, and was involved in writing the paper. NG, J-WC, and WM performed the re-identification evaluation, produced the results, interpreted the results, and were involved in writing the paper. CP and KEE were responsible for anonymizing the CSR and were involved in writing the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Khaled El Emam.

Ethics declarations

Ethics approval and consent to participate

Ethics approval for this study was obtained from Veritas IRB (IRB Tracking Number: 16356–15:07:5818-04-2019).

Consent for publication

N/A.

Competing interests

The authors declare that they have no competing interests. By having a third party perform the re-identification evaluation independent of the party that performed the anonymization, we believe we have managed the potential competing interests in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Branson, J., Good, N., Chen, JW. et al. Evaluating the re-identification risk of a clinical study report anonymized under EMA Policy 0070 and Health Canada Regulations. Trials 21, 200 (2020). https://doi.org/10.1186/s13063-020-4120-y

  • DOI : https://doi.org/10.1186/s13063-020-4120-y

Source: https://trialsjournal.biomedcentral.com/articles/10.1186/s13063-020-4120-y
