“Those queries are stored and will almost certainly be used for developing the LLM service or model at some point. This could mean that the LLM provider (or its partners/contractors) are able to read queries and may incorporate them in some way into future versions,” it added. Another risk, which increases as more organizations produce and use LLMs, is that queries stored online may be hacked, leaked, or accidentally made publicly accessible, the NCSC wrote.
Ultimately, there is genuine cause for concern about sensitive business data being entered into and used by ChatGPT, although the risks are likely less widespread than some headlines make out.
Likely risks of inputting sensitive data to ChatGPT
LLMs exhibit an emergent behavior called in-context learning. During a session, as the model receives inputs, it can become conditioned to perform tasks based on the context contained in those inputs. “This is likely the phenomenon people are referring to when they worry about information leakage. However, it is not possible for information from one user’s session to leak to another’s,” Andy Patel, senior researcher at WithSecure, tells CSO. “Another concern is that prompts entered into the ChatGPT interface will be collected and used in future training data.”
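To make that distinction concrete, the sketch below shows what in-context learning looks like from the client side: the “context” is simply the message history that is resent with every request, which the model reads at inference time without updating its weights. This is a minimal illustration assuming the official openai Python package (version 1.x) and an API key in the environment; the model name and the revenue figure are invented for the example.

```python
# Minimal sketch: in-context "learning" is conditioning on the conversation
# history the client resends each turn, not a change to the model's weights.
# Assumes the openai Python package (>=1.0) and OPENAI_API_KEY set in the
# environment; the model name and prompt contents are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The session's context lives in this list on the client side. At inference
# time it is visible only to this session (plus whatever the provider logs).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Our Q3 revenue forecast is $4.2M. Summarize it."},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=messages,
)
reply = response.choices[0].message.content

# Appending the reply and the next question is what makes the model appear
# to "remember" earlier turns: it re-reads them on the next request.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Rewrite that summary for the board."})

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
print(response.choices[0].message.content)
```

Nothing in this exchange alters the underlying model, which is why session context does not carry over to other users; the separate risk is that the prompts themselves are stored by the provider and could later be used as training data.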
Although it’s valid to be concerned that chatbots will ingest and then regurgitate sensitive information, a new model would need to be trained in order to incorporate that data, Patel says. Training LLMs is an expensive and lengthy procedure, and he says he would be surprised if a model were trained on data collected by ChatGPT in the near future. “If a new model is eventually created that includes collected ChatGPT prompts, our fears turn to membership inference attacks. Such attacks have the potential to expose credit card numbers or personal information that were in the training data. However,