AI Safety

Using AI generated material

Before using information or data generated by an AI service, be aware:

  • Generative AI can hallucinate, which is another way of saying it makes things up. Check all information before using it. Read AI-generated summaries carefully in case they miss nuance or key information.
  • Always disclose when using content directly generated by an AI model. That's not just best practice; it's the law1.
  • Don't use it for anything critical (legal, medical, life-critical, …). It is not a substitute for a human.

Your data

Before uploading data to an AI service:

  • If it is someone else's data, including something they have written, get their permission first. Avoid uploading personally identifiable information (PII) or anything sensitive. If unsure, don't upload it.
  • Data you upload may be stored (harvested) and used to further train the provider's models. Some paid services offer opt-outs.

A short summary of the main services

Free

  • CHATGPT, CLAUDE and MICROSOFT COPILOT have limited privacy controls but no opt-out of harvesting.
  • For programming, AMAZON Q does not store or harvest your code.
  • LibreTranslate is open source and does not store or harvest your data.

Paid

  • CHATGPT, CLAUDE and MICROSOFT COPILOT offer robust privacy controls over harvesting of your data. However, the default is still to collect data, so you will have to turn it off in your account settings.
  • For programming, GitHub Copilot does not make a clear guarantee to personal users that it won't harvest your code.
  • DeepL Pro does not store or harvest your data.

Avoid

  • Google does not allow consumers to opt out of data harvesting by its AI services2.
  • DeepSeek doesn't publish clear information on privacy; assume it is harvesting just like the others.

Bottom line

If you're going to use AI for anything serious, consider paying for a service that gives you control over your data (Amazon Q for programming and LibreTranslate are free exceptions to that rule). That doesn't mean you can't use the free services to learn. Be aware of the risks and be careful what data you share.

Images and videos

This applies not just to AI but to sharing on the internet in general:

  • Images often contain metadata (EXIF), usually including the exact location and time they were taken. This can be used to identify you or your location. It can be removed with free, open-source software such as ExifCleaner.
  • Don't use services that offer to remove metadata when you upload your image to a website: who are they? Doesn't that defeat the purpose?
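Stripping metadata locally can also be done with a few lines of Python using the Pillow imaging library (a sketch; Pillow is a common third-party package, and the file paths are placeholders). Rebuilding the image from its raw pixels discards EXIF metadata such as GPS coordinates:

```python
from PIL import Image  # third-party: pip install Pillow

def strip_metadata(src_path, dst_path):
    """Re-create the image from raw pixels only, dropping EXIF metadata."""
    with Image.open(src_path) as img:
        pixels = list(img.getdata())
        clean = Image.new(img.mode, img.size)
        clean.putdata(pixels)
        clean.save(dst_path)

# Example: strip_metadata("holiday.jpg", "holiday_clean.jpg")
```

Note that re-encoding a lossy format such as JPEG will recompress the image, so expect a small quality loss; a dedicated tool like the ExifCleaner mentioned above avoids that.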

DON'T TRUST YOUR DATA TO JUST ANYONE. BE SURE YOU KNOW WHO THEY ARE AND WHAT THEY DO WITH IT. CHECK THE PRIVACY POLICY.

Deploying generated code

Deploying code that you don't understand, especially to the public cloud, comes with risks:

  • Security vulnerabilities: Code might introduce exploits (e.g. SQL injection, command injection, insecure deserialization) if it handles inputs unsafely.
  • Unintended behaviour / logic errors: The code may work differently than expected — especially if it was generated based on ambiguous or under-specified prompts.
  • Maintainability issues: If the code is opaque or overly complex, future debugging or updates become difficult, increasing long-term tech debt.
  • Compliance breaches: The code might violate licensing terms, data protection policies, or regulatory constraints (e.g. GDPR), especially if third-party libraries are embedded.
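The SQL injection risk in the first bullet is easy to demonstrate. This sketch uses Python's built-in sqlite3 module (the table and values are made up for illustration): the string-interpolated query can be subverted by crafted input, while the parameterized version treats the same input as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_unsafe(name):
    # UNSAFE: the input is spliced into the SQL text, so
    # name = "x' OR '1'='1" rewrites the query and returns every row.
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name):
    # SAFE: the ? placeholder passes the input as a bound value,
    # never as SQL, so the same attack string matches nothing.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()
```

If generated code builds queries with f-strings or concatenation, treat that as a red flag and rewrite it with placeholders before deploying.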

Specific Technical Risks

  • Hardcoded secrets or unsafe defaults: AI might generate code with secrets in plaintext or with insecure configurations (e.g. permissive CORS, open S3 buckets).
  • Resource mismanagement: Poor handling of memory, threads, or async calls might lead to crashes, race conditions, or degraded performance.
  • Dependency hell: Generated code might suggest niche or outdated libraries, potentially leading to version conflicts or insecure packages.
  • Silent failures: Lack of error handling or logging can mean critical issues go undetected, especially in batch jobs or background workers.
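For the hardcoded-secrets point, a common mitigation is to read credentials from the environment and fail loudly when they are missing, rather than shipping them in source. A minimal Python sketch (the variable name is a placeholder, not a real service's key):

```python
import os

def get_secret(name):
    """Read a required secret from the environment; fail loudly if absent."""
    value = os.environ.get(name)
    if value is None:
        # Failing at startup is better than a silent fallback to
        # an empty or hardcoded credential later on.
        raise RuntimeError(f"Missing required secret: {name}")
    return value

# Example: api_key = get_secret("MY_SERVICE_API_KEY")
```

A quick grep of generated code for string literals that look like keys or passwords is a cheap pre-deployment check.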

Cost

  • Unexpected costs: Generated code might lead to excessive resource consumption (e.g. infinite loops, excessive API calls), resulting in massive unexpected cloud bills.

Social & Organizational Risks

  • Erosion of team trust or skill: If a team routinely deploys code they don’t understand, it sets a precedent for poor engineering practices.
  • Increased onboarding time: New team members will struggle to decipher opaque, unreviewed code, slowing velocity.
  • Incident response delay: When something breaks, the team may lack the context or confidence to respond quickly, especially under pressure.

1

Other jurisdictions may have similar laws; I've mentioned the EU AI Act as that's what's relevant to me and (I think) most people reading this, but otherwise check your local laws.

2

They spin a good yarn on privacy controls, but they are usually talking about generic account data settings or enterprise agreements offered to large organizations. They applaud their own transparency about how data is used, but as a consumer, if you don't want your data harvested, I would avoid them.