Data Curation and its tenets

Authors: Lungani Ndwandwe, Tulio de Oliveira - 2013-08-15

Over the past few years, SATuRN has collated approximately 7000 HIV drug resistance genotypes linked to clinical and treatment information in the Southern African region.

SATuRN has developed a cross-national public health and virological research collaboration to implement innovative means of shared data collection, management, analysis, monitoring, and evaluation of outcomes in response to ART in resource-limited settings. This successful, patient-focused approach to data collection has been strengthened by the fact that SATuRN hosts the two best HIV drug resistance databases: the Stanford and Rega HIV drug resistance databases.

The datasets grow every day as more data is added to the database. The database contains longitudinal datasets consisting of paediatric and adult cohorts of patients who are failing their ART treatment. It has grown into a rich resource, with significant variables for mapping prevalence by demography, social context, and clinical information, to name just a few. With this data and these clinical cases, SATuRN has been able to train more than 2000 clinicians and nurses involved in treating patients failing ART. The data has also proved useful to researchers, and many publications have arisen from this rich dataset.

Harking back to the subject of this article, HIV drug resistance is a complex subject, and its complexities are better understood now rather than later. This, of course, calls for the data to be continuously cleaned and kept relevant. The SATuRN database is continuously curated. The Africa Centre, as a prime example, rigorously compares the information collected by its clinicians dealing with the patient against the central information stored in ARTemis, the database of the Hlabisa Hospital laboratory, comparing and adding new viral loads, CD4 counts and treatment history in order to keep the data accurate and clean.

Fortunately, mistakes and errors can also be picked up by the Africa Centre data curation staff, e.g. errors in the treatment regimen, errors in viral load measurements, or abnormal mutations inconsistent with the patient's treatment. After this process, the specialist physician who recommends treatment reviews whether the given information makes sense at a clinical level. Each result goes through three data curation points before it eventually comes back to the primary clinician managing the patient. The information in the Rega database is checked and updated every month to track the progress of patients who have received a genotype test.
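To make the comparison step concrete, here is a minimal sketch in Python of how a clinic record might be checked against a central laboratory record. It is an illustration only, not the code used by the Africa Centre, ARTemis or the Rega database: the PatientRecord structure, its field names, the find_discrepancies function and the sample values are all hypothetical. It flags only simple disagreements between the two sources; checking mutations against treatment requires clinical rules that are not attempted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PatientRecord:
    """Hypothetical record layout; the field names are assumptions."""
    patient_id: str
    regimen: str
    # Maps a test date (ISO string) to a viral load in copies/mL.
    viral_loads: Dict[str, int] = field(default_factory=dict)


def find_discrepancies(clinic: PatientRecord, central: PatientRecord) -> List[str]:
    """List simple disagreements between the clinic and central records."""
    issues = []

    # Regimens are compared ignoring case and surrounding whitespace.
    if clinic.regimen.strip().upper() != central.regimen.strip().upper():
        issues.append(
            f"regimen mismatch: clinic={clinic.regimen!r}, central={central.regimen!r}"
        )

    for date, value in sorted(central.viral_loads.items()):
        if value < 0:
            # A negative viral load cannot be a real measurement.
            issues.append(f"implausible viral load on {date}: {value}")
        elif date not in clinic.viral_loads:
            issues.append(f"viral load on {date} missing from clinic record")
        elif clinic.viral_loads[date] != value:
            issues.append(
                f"viral load on {date} differs: "
                f"clinic={clinic.viral_loads[date]}, central={value}"
            )

    return issues


# Made-up example data, for illustration only.
clinic = PatientRecord("P001", "TDF/3TC/EFV", {"2013-01-10": 1200})
central = PatientRecord(
    "P001", "AZT/3TC/LPV/r", {"2013-01-10": 1200, "2013-06-12": 54000}
)

for issue in find_discrepancies(clinic, central):
    print(issue)
```

In a sketch like this, every flagged item would still go to a person for review, in keeping with the curation points described above, rather than being corrected automatically.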

The question is: what should motivate one to keep high-quality, curated data? Or, really, what is the point of writing this article? The only answer must be this: what is collected is NOT just numbers, or 'just data for analysis'. The numbers equate to people: people who are failing treatment, individuals with families and loved ones. This is the backbone of the research and the driving factor behind the immediate plans to have our drug resistance research outputs influence policy.

Detailed information: SATuRN website at


KRISP has been created by the coordinated effort of the University of KwaZulu-Natal (UKZN), the Technology Innovation Agency (TIA) and the South African Medical Research Council (SAMRC).

Location: K-RITH Tower Building
Nelson R Mandela School of Medicine, UKZN
719 Umbilo Road, Durban, South Africa.
Director: Prof. Tulio de Oliveira