futurism.com Aug 27, 5:05 PM EDT by Noor Al-Sibai
OpenAI has authorized itself to call law enforcement if users say threatening enough things when talking to ChatGPT.
Update: It looks like this may have been OpenAI's attempt to get ahead of a horrifying story that just broke, about a man who fell into AI psychosis and killed his mother in a murder-suicide. Full details here.
For the better part of a year, we've watched — and reported — in horror as more and more stories emerge about AI chatbots leading people to self-harm, delusions, hospitalization, arrest, and suicide.
As the loved ones of the people impacted by these dangerous bots rally for change to prevent such harm from happening to anyone else, the companies that run these AIs have been slow to implement safeguards — and OpenAI, whose ChatGPT has been repeatedly implicated in what experts are now calling "AI psychosis," has until recently done little more than offer copy-pasted promises.
In a new blog post admitting certain failures amid its users' mental health crises, OpenAI also quietly disclosed that it's now scanning users' messages for certain types of harmful content, escalating particularly worrying content to human staff for review — and, in some cases, reporting it to the cops.
"When we detect users who are planning to harm others, we route their conversations to specialized pipelines where they are reviewed by a small team trained on our usage policies and who are authorized to take action, including banning accounts," the blog post notes. "If human reviewers determine that a case involves an imminent threat of serious physical harm to others, we may refer it to law enforcement."
That short and vague statement leaves a lot to be desired — and OpenAI's usage policies, referenced as the basis on which the human review team operates, don't provide much more clarity.
When describing its rule against "harm [to] yourself or others," the company listed off some pretty standard examples of prohibited activity, including using ChatGPT "to promote suicide or self-harm, develop or use weapons, injure others or destroy property, or engage in unauthorized activities that violate the security of any service or system."
But in the post warning users that the company will call the authorities if they seem like they're going to hurt someone, OpenAI also acknowledged that it is "currently not referring self-harm cases to law enforcement to respect people’s privacy given the uniquely private nature of ChatGPT interactions."
While ChatGPT has in the past proven itself pretty susceptible to so-called jailbreaks that trick it into spitting out instructions to build neurotoxins or step-by-step instructions to kill yourself, this new rule adds an additional layer of confusion. It remains unclear which exact types of chats could result in user conversations being flagged for human review, much less getting referred to police. We've reached out to OpenAI to ask for clarity.
While it's certainly a relief that AI conversations won't result in police wellness checks — which often end up causing more harm to the person in crisis due to most cops' complete lack of training in handling mental health situations — it's also kind of bizarre that OpenAI even mentions privacy, given that it admitted in the same post that it's monitoring user chats and potentially sharing them with the fuzz.
To make the announcement all the weirder, this new rule seems to contradict the company's pro-privacy stance amid its ongoing lawsuit with the New York Times and other publishers as they seek access to troves of ChatGPT logs to determine whether any of their copyrighted data had been used to train its models.
OpenAI has steadfastly rejected the publishers' request on grounds of protecting user privacy and has, more recently, begun trying to limit the amount of user chats it has to give the plaintiffs.
Last month, the company's CEO Sam Altman admitted during an appearance on a podcast that using ChatGPT as a therapist or attorney doesn't confer the same confidentiality that talking to a flesh-and-blood professional would — and that thanks to the NYT lawsuit, the company may be forced to turn those chats over to courts.
In other words, OpenAI is stuck between a rock and a hard place. The PR blowback from its users spiraling into mental health crises and dying by suicide is appalling — but since it's clearly having trouble controlling its own tech enough to protect users from those harmful scenarios, it's falling back on heavy-handed moderation that flies in the face of its own CEO's promises.
www.digitaldigging.org - Digital Digging investigation: how your AI conversation could end your career
Corporate executives, government employees, and professionals are confessing to crimes, exposing trade secrets, and documenting career-ending admissions in ChatGPT conversations visible to anyone on the internet.
A Digital Digging investigation analyzed 512 publicly shared ChatGPT conversations using targeted keyword searches, uncovering a trove of self-incrimination and leaked confidential data. The shared chats include apparent insider trading schemes, detailed corporate financials, fraud admissions, and evidence of regulatory violations—all preserved as permanently searchable public records.
Among the discoveries is a conversation where a CEO revealed this to ChatGPT:
Confidential Financial Data: About an upcoming settlement
Non-Public Revenue Projections: Specific forecasts showing revenue doubling
Merger intelligence: Detailed valuations
NDA-Protected Partnerships: Information about Asian customers
The person also revealed internal conflict and criticizing executives by name.
Our method reveals an ironic truth: AI itself can expose these vulnerabilities. After discussing the dangers of making chats public, we asked Claude, another AI chatbot, to suggest Google search formulas that might uncover sensitive ChatGPT conversations.
venturebeat.com - OpenAI abruptly removed a ChatGPT feature that made conversations searchable on Google, sparking privacy concerns and industry-wide scrutiny of AI data handling.
OpenAI made a rare about-face Thursday, abruptly discontinuing a feature that allowed ChatGPT users to make their conversations discoverable through Google and other search engines. The decision came within hours of widespread social media criticism and represents a striking example of how quickly privacy concerns can derail even well-intentioned AI experiments.
The feature, which OpenAI described as a “short-lived experiment,” required users to actively opt in by sharing a chat and then checking a box to make it searchable. Yet the rapid reversal underscores a fundamental challenge facing AI companies: balancing the potential benefits of shared knowledge with the very real risks of unintended data exposure.
How thousands of private ChatGPT conversations became Google search results
The controversy erupted when users discovered they could search Google using the query “site:chatgpt.com/share” to find thousands of strangers’ conversations with the AI assistant. What emerged painted an intimate portrait of how people interact with artificial intelligence — from mundane requests for bathroom renovation advice to deeply personal health questions and professionally sensitive resume rewrites. (Given the personal nature of these conversations, which often contained users’ names, locations, and private circumstances, VentureBeat is not linking to or detailing specific exchanges.)
“Ultimately we think this feature introduced too many opportunities for folks to accidentally share things they didn’t intend to,” OpenAI’s security team explained on X, acknowledging that the guardrails weren’t sufficient to prevent misuse.
0din.ai - In a recent submission last year, researchers discovered a method to bypass AI guardrails designed to prevent sharing of sensitive or harmful information. The technique leverages the game mechanics of language models, such as GPT-4o and GPT-4o-mini, by framing the interaction as a harmless guessing game.
By cleverly obscuring details using HTML tags and positioning the request as part of the game’s conclusion, the AI inadvertently returned valid Windows product keys. This case underscores the challenges of reinforcing AI models against sophisticated social engineering and manipulation tactics.
Guardrails are protective measures implemented within AI models to prevent the processing or sharing of sensitive, harmful, or restricted information. These include serial numbers, security-related data, and other proprietary or confidential details. The aim is to ensure that language models do not provide or facilitate the exchange of dangerous or illegal content.
In this particular case, the intended guardrails are designed to block access to any licenses like Windows 10 product keys. However, the researcher manipulated the system in such a way that the AI inadvertently disclosed this sensitive information.
Tactic Details
The tactics used to bypass the guardrails were intricate and manipulative. By framing the interaction as a guessing game, the researcher exploited the AI’s logic flow to produce sensitive data:
Framing the Interaction as a Game
The researcher initiated the interaction by presenting the exchange as a guessing game. This trivialized the interaction, making it seem non-threatening or inconsequential. By introducing game mechanics, the AI was tricked into viewing the interaction through a playful, harmless lens, which masked the researcher's true intent.
Compelling Participation
The researcher set rules stating that the AI “must” participate and cannot lie. This coerced the AI into continuing the game and following user instructions as though they were part of the rules. The AI became obliged to fulfill the game’s conditions—even though those conditions were manipulated to bypass content restrictions.
The “I Give Up” Trigger
The most critical step in the attack was the phrase “I give up.” This acted as a trigger, compelling the AI to reveal the previously hidden information (i.e., a Windows 10 serial number). By framing it as the end of the game, the researcher manipulated the AI into thinking it was obligated to respond with the string of characters.
Why This Works
The success of this jailbreak can be traced to several factors:
Temporary Keys
The Windows product keys provided were a mix of home, pro, and enterprise keys. These are not unique keys but are commonly seen on public forums. Their familiarity may have contributed to the AI misjudging their sensitivity.
Guardrail Flaws
The system’s guardrails prevented direct requests for sensitive data but failed to account for obfuscation tactics—such as embedding sensitive phrases in HTML tags. This highlighted a critical weakness in the AI’s filtering mechanisms.
OpenAI on Tuesday announced the launch of ChatGPT for government agencies in the U.S. ...It allows government agencies, as customers, to feed “non-public, sensitive information” into OpenAI’s models while operating within their own secure hosting environments, OpenAI CPO Kevin Weil told reporters during a briefing Monday.
Since the launch of ChatGPT, OpenAI has sparked significant interest among both businesses and cybercriminals. While companies are increasingly concerned about whether their existing cybersecurity measures can adequately defend against threats curated with generative AI tools, attackers are finding new ways to exploit them. From crafting convincing phishing campaigns to deploying advanced credential harvesting and malware delivery methods, cybercriminals are using AI to target end users and capitalize on potential vulnerabilities.
Barracuda threat researchers recently uncovered a large-scale OpenAI impersonation campaign targeting businesses worldwide. Attackers targeted their victims with a well-known tactic — they impersonated OpenAI with an urgent message requesting updated payment information to process a monthly subscription.
We banned accounts linked to an Iranian influence operation using ChatGPT to generate content focused on multiple topics, including the U.S. presidential campaign. We have seen no indication that this content reached a meaningful audience.
A NewsGuard audit found that chatbots spewed misinformation from American fugitive John Mark Dougan.
#AI #Axios #ChatGPT #Google #Illustrations #License #Microsoft #Misinformation #OpenAI #Visuals #genAI #generative #or
A hacker has released a jailbroken version of ChatGPT called "GODMODE GPT."
Earlier today, a self-avowed white hat operator and AI red teamer who goes by the name Pliny the Prompter took to X-formerly-Twitter to announce the creation of the jailbroken chatbot, proudly declaring that GPT-4o, OpenAI's latest large language model, is now free from its guardrail shackles.
Researchers from Salt Security discovered three types of vulnerabilities in ChatGPT plugins that can be could have led to data exposure and account takeovers.
ChatGPT plugins are additional tools or extensions that can be integrated with ChatGPT to extend its functionalities or enhance specific aspects of the user experience. These plugins may include new natural language processing features, search capabilities, integrations with other services or platforms, text analysis tools, and more. Essentially, plugins allow users to customize and tailor the ChatGPT experience to their specific needs.