theguardian.com
Hannah Devlin and Tom Burgis
Sat 14 Mar 2026 07.00 CET
Exclusive: Guardian investigation finds data from flagship medical research leaked dozens of times
Confidential health data has been exposed online on dozens of occasions, a Guardian investigation can reveal, raising questions about the safeguarding of patient records by one of the UK’s flagship medical research projects.
UK Biobank, which holds the medical records of 500,000 British volunteers, is one of the world’s most comprehensive stores of health information and is credited with driving breakthroughs in cancer, dementia and diabetes research. But scientists approved to access Biobank’s sensitive data appear to have sometimes been cavalier about its security.
The files, which seem to have been inadvertently posted online by researchers using the data, do not include names or addresses, but they may still pose privacy concerns. One dataset found by the Guardian contained millions of hospital diagnoses and associated dates for more than 400,000 participants.
With the consent of a Biobank volunteer, the Guardian was able to pinpoint what appeared to be extensive hospital diagnosis records for the volunteer, using only their month and year of birth and details of a major surgery they had undergone.
One data expert said the scale and persistence of the problem was “shocking” at a time when AI and social media were making it ever easier to cross-reference information online.
UK Biobank rejected the concerns, saying that no identifying data, such as names and addresses, were provided to researchers.
In a statement, Prof Sir Rory Collins, the chief executive of UK Biobank, said: “We have never seen any evidence of any UK Biobank participant being re-identified by others.”
‘They said they would hold our data securely’
Founded in 2003 by the Department of Health and medical research charities, UK Biobank holds genome sequences, scans, blood samples and lifestyle information of 500,000 volunteers. Last month, the government extended Biobank’s access to volunteers’ GP records.
Scientists at universities and private companies across the world apply for access and, until late 2024, were free to download data directly on to their own computer systems.
Before this point, data had been inadvertently published online and Biobank appears to still be grappling with the problem.
The issue emerged because journals and funders increasingly require researchers to publish the code they have used to analyse large datasets. When intending to upload code, some researchers have also accidentally published partial or entire Biobank datasets to GitHub, a popular online code-sharing platform. UK Biobank prohibits researchers from sharing data outside their systems and says it has introduced further training for all researchers.
In the past year, the data leaks appear to have become a more urgent concern to UK Biobank. Between July and December 2025, it issued 80 legal notices to GitHub, which has complied with requests to remove data from the internet. Yet much still remains available.
Some of the data files contain just patient IDs, or test results for small numbers; others are more extensive. One dataset found online by the Guardian in January contained hospital diagnoses and associated diagnosis dates for about 413,000 participants, along with their sex and month and year of birth.
A data expert who reviewed the file said: “It sent shivers down my spine to even open. I deleted the file immediately. It was very detailed and felt like a gross invasion of privacy even to glance at.”
To test the risk of re-identification, the Guardian approached several Biobank volunteers, two of whom had undergone medical procedures in the timeframe within the data and agreed to share these details with an external data scientist.
One volunteer, who provided treatment dates for a fracture and seizure, could not be located in the dataset. A second volunteer, a woman in her 70s, shared her month and year of birth and the month and year she had a hysterectomy. Only one person in the dataset matched these details. The apparent match was corroborated by five other diagnoses from the records that the volunteer had not initially disclosed.
“Effectively you were rehearsing the main parts of my medical history to me without me having given you any information at all. I didn’t expect that,” the volunteer said.
The woman said she was not too concerned about her own data being exposed and intended to remain a participant, saying that she viewed UK Biobank’s work as “extremely important”. But, she added: “I’m more concerned about whether Biobank has broken its agreement with people. They said they would hold our data securely … I just feel as though that has to come into the equation.”
UK Biobank said the re-identification scenario tested by the Guardian did not highlight a privacy risk because without additional information it would be impossible to identify individuals.
A Biobank spokesperson said: “As we have communicated to our participants, including on our website: ‘If a participant puts information that reveals something about their health and identity, such as genealogy data, on a public website, this could make it possible for their identity to be discovered by cross-referencing UK Biobank research data.’
“You have simply demonstrated why we tell participants not to do this.”
The spokesperson added that Biobank had taken extensive measures to protect participants’ privacy, including proactively searching GitHub, contacting researchers directly and issuing legal takedown notices, actions which they said had led to about 500 repositories being removed. Many of these, it said, contained only patient IDs, not health data.
‘There are tensions between driving research with data and protecting privacy’
Privacy experts said UK Biobank’s approach appeared at odds with the reality that many people, reasonably, shared some health information online and that in an age of AI this could readily be identified and cross-referenced.
“Are these people aware that the internet exists?” asked Prof Felix Ritchie, an economist at the University of the West of England. “The idea that they can rely on their volunteers never putting any other information out there about themselves is an entirely unreasonable thing to expect.”
Dr Luc Rocher, associate professor at the Oxford Internet Institute, who reviewed several Biobank datasets found online, said that removing identifiers often did not guarantee anonymity and that simply knowing a person’s birthday and, say, the date they broke a leg might be enough to pinpoint their record with high confidence.
“Once identified, that record could reveal sensitive information such as a psychiatric diagnosis, an HIV test result, or a history of drug abuse,” they said.
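The linkage risk Rocher describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are synthetic, and the field names (`record_id`, `birth_ym`, `code`, `event_ym`) and helper functions are hypothetical, not the layout of any UK Biobank file or the method used by the Guardian. It shows only how a couple of known facts can narrow a de-identified table to a single record, after which every other row for that record becomes readable.

```python
from typing import List, NamedTuple, Set


class Diagnosis(NamedTuple):
    record_id: str  # pseudonymous ID, no name or address (hypothetical field)
    birth_ym: str   # month and year of birth, "YYYY-MM"
    code: str       # diagnosis or procedure label (illustrative, not a real coding scheme)
    event_ym: str   # month and year of the event, "YYYY-MM"


# Synthetic rows invented for this sketch; they describe no real person.
RECORDS = [
    Diagnosis("A", "1950-03", "hysterectomy", "2012-06"),
    Diagnosis("A", "1950-03", "asthma", "2015-01"),
    Diagnosis("B", "1950-03", "fracture", "2012-06"),
    Diagnosis("C", "1950-03", "hysterectomy", "2013-09"),
    Diagnosis("D", "1961-11", "hysterectomy", "2012-06"),
]


def matching_ids(records: List[Diagnosis], birth_ym: str,
                 code: str, event_ym: str) -> Set[str]:
    """Return the pseudonymous IDs consistent with the known facts."""
    return {
        r.record_id
        for r in records
        if r.birth_ym == birth_ym and r.code == code and r.event_ym == event_ym
    }


def full_history(records: List[Diagnosis], record_id: str) -> List[Diagnosis]:
    """Return every row that shares the given pseudonymous ID."""
    return [r for r in records if r.record_id == record_id]


candidates = matching_ids(RECORDS, "1950-03", "hysterectomy", "2012-06")
if len(candidates) == 1:
    # A single candidate means the known facts were unique in the table,
    # so the rest of that record's history is exposed to whoever knew them.
    (rid,) = candidates
    history = full_history(RECORDS, rid)
```

In this toy table three people share the birth month and three rows share the event month, but only one record matches both plus the procedure, which is the uniqueness effect the experts warn about.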
Prof Niels Peek, professor of data science and healthcare improvement at the University of Cambridge, said the scale of the problem was “shocking”. “If it had happened once or 10 times I’d probably say: ‘It’s not great that it’s happened but at the same time zero risk is impossible,’” he said. “Hundreds. That’s a little bit too much.”
In Peek’s view, Biobank’s actions show it has taken the issue seriously and “done everything that one can reasonably expect”. But, he added: “The scale and persistence with which this has happened demonstrates that there are huge tensions between the ambition to drive health research with data at scale and the legal and ethical imperative to protect people’s privacy.”
Experts questioned whether Biobank will be able to fully regain control of the data released online. Despite researchers and GitHub having taken down most of the offending repositories in response to Biobank’s requests, many of the relevant files remained available on a code archive website until shortly before publication.
securityweek.com
By Ionut Arghire | February 28, 2026 (6:50 AM ET)
More than 38 million accounts were affected by an October 2025 data breach at Canadian retail giant Canadian Tire.
The incident was discovered on October 2 and involved unauthorized access to an e-commerce database, the company said.
“The database contained basic personal information for customers who have an e-commerce account with one or more of Canadian Tire, SportChek, Mark’s/L’Équipeur and Party City,” the retail giant announced in October.
Canadian Tire said at the time that the compromised information included names, email addresses, dates of birth, encrypted passwords, and, in some cases, incomplete credit card numbers.
Fewer than 150,000 accounts had date of birth details compromised, the company said.
Canadian Tire also underlined that the password and credit card information could not be used to access users’ accounts or to perform fraudulent transactions and purchases, and that no Canadian Tire Bank information or Triangle Rewards loyalty data was compromised in the incident.
This week, the data set associated with the incident was added to the data breach notification website Have I Been Pwned.
According to the website, roughly 42 million records were compromised in the attack, including 38.3 million email addresses. In addition to the details shared by Canadian Tire, the leaked data also includes addresses, phone numbers, and gender information.
“Passwords were stored as PBKDF2 hashes, and for a subset of records, dates of birth and partial credit card data were also included (card type, expiry, and masked card number),” Have I Been Pwned notes.
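For readers unfamiliar with the term, PBKDF2 is a key-derivation function that hashes a password together with a random salt over many iterations, which makes it a one-way hash and not reversible encryption. The following is a minimal sketch using Python's standard library. The salt length, iteration count, and SHA-256 digest are illustrative assumptions, and `hash_password` and `verify_password` are hypothetical helper names; none of this reflects Canadian Tire's actual configuration, which the reports do not describe.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor only; the breached system's real settings are unknown.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a PBKDF2-HMAC-SHA256 hash, generating a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash from the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
```

Because verification works by re-deriving the hash, an attacker holding leaked hashes still has to guess each password and pay the iteration cost per guess, so weak or reused passwords remain the most exposed.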
Canadian Tire has notified the affected individuals via email but has yet to publicly confirm the number of victims.
As Scale AI seeks to reassure customers that their data is secure following Meta's $14.3 billion investment, leaked files and the startup's own contractors indicate it has some serious security holes.
Scale AI routinely uses public Google Docs to track work for high-profile customers like Google, Meta, and xAI, leaving multiple AI training documents labeled "confidential" accessible to anyone with the link, Business Insider found.
Contractors told BI the company relies on public Google Docs to share internal files, a method that's efficient for its vast army of at least 240,000 contractors but presents clear cybersecurity and confidentiality risks.
Scale AI also left public Google Docs with sensitive details about thousands of its contractors, including their private email addresses and whether they were suspected of "cheating." Some of those documents can be viewed and also edited by anyone with the right URL.
The Real World, a learning platform from the controversial social media personality Andrew Tate, has leaked the data of nearly a million users and over 22 million messages.
Hundreds of thousands of exposed users, millions of messages, and session tokens – that’s the reality that The Real World finds itself in.
The Cybernews research team has uncovered an exposed MongoDB instance holding 88GB of data from one of The Real World’s servers.
The Boeing Company, a jetliner manufacturer and US defense contractor, had its data leaked by the LockBit ransomware gang. So far, around 50 gigabytes of compressed data has been uploaded to LockBit's dark web blog.
LockBit has allegedly started leaking data that the gang stole from Boeing in late October. The Cybernews research team noted there's around 50 GB of data supposedly belonging to Boeing. The bulk of the data appears to be various backups.