HES data handled in Google cloud
3 March 2014
The Health and Social Care Information Centre has spent another day fire-fighting concerns about care.data, following reports that one company had uploaded Hospital Episode Statistics to Google's cloud and another had made HES data publicly available.
The HSCIC issued a statement this afternoon confirming that its predecessor, the NHS Information Centre, had signed an agreement to share HES data with PA Consulting in 2011, and that the consultancy had used a tool called Google BigQuery to manipulate the datasets.
Around the same time, it emerged that a company called Earthware had made HES data publicly available via an online mapping tool that could be accessed from the HSCIC’s own website, although the link has now been replaced by a note saying the HSCIC is investigating.
PA Consulting had not only made no attempt to hide its use of HES data, but had proudly written up its work in a chapter of a white paper on ‘Healthcare Reform.’
This explains that it initially “bought” the data and installed it on its own servers, but it proved too time-consuming to run useful queries. As a result, it decided to “upload it to the cloud using tools such as Google Storage and use BigQuery to extract data from it.”
BigQuery is described by Google as a service that enables users to “run super-fast, SQL-like queries against terabytes of data in seconds” using the company’s storage infrastructure. PA Consulting’s paper says that by using it, it was able to produce “interactive maps directly from HES queries in seconds.”
The HSCIC statement confirms that it knew PA Consulting was going to use the tool, and that it imposed a number of safeguards.
It says it received “written confirmation from PA Consulting that no Google staff would be able to access the data” and that “data continued to be restricted to the individuals named in the [original] data sharing agreement.”
However, the distributed nature of the Google service means it seems extremely likely that the data would have been held outside the UK and EU.
At a Commons health select committee hearing into care.data last week, NHS England director of patients and information Tim Kelsey assured MPs that care.data information would not be sent to the US.
The PA Consulting revelations appear to indicate that HES data, which will form the core of the care.data service when combined with GP and other datasets, has already been sent to US and other servers.
In a further development, public health minister Jane Ellison was forced to apologise to Parliament for misleading MPs last week, when she said that only “publicly available, non-identifiable” information in “aggregate form” had been sold to an actuarial body that used it to re-calculate insurance premiums.
In fact, the data covered thousands of hospital attendances and was pseudonymised, meaning it could potentially have been re-identified.
In a statement, PA Consulting said: “PA purchased the commercially available Hospital Episode Statistics data set from the NHS Information Centre [now the Health and Social Care Information Centre].
“The data set does not contain information linked to specific individuals. The information is held securely in the cloud in accordance with conditions specified and approved by HSCIC.
“This new approach to analytics can help the NHS improve patient care. We have been able to identify where services are needed most and to understand previously unseen side effects of drugs and treatments.
“Our approach protects patient confidentiality and allows insights to be derived at significantly lower cost, and a hundred times faster, than any traditional approach.”
Read EHI editor Jon Hoeksma's analysis of last week's disastrous events for care.data, which saw it become the subject of a Downfall spoof.