Child safety org launches AI model trained on real child sex abuse images

Xatolos@reddthat.com · 5 days ago

Child safety org launches AI model trained on real child sex abuse images

db0@lemmy.dbzer0.com · edit-2 5 days ago

“It’s the earliest AI technology striving to expose unreported CSAM at scale.”

horde-safety has been out for a year now. Just saying… It’s not a trained AI model in this way, but it’s still using Neural Networks (i.e. “AI Technology”)

sexual_tomato@lemmy.dbzer0.com · edit-2 4 days ago

Jesus Christ. If someone ever got their hands on this model they could use it to generate new material. The grossest possible AI model to date

Todd Bonzalez@lemm.ee · 4 days ago

No. This is an inference model, not a generative model. You generally cannot train a model for both, unless you do it on purpose, and they certainly did not (especially since inference models are way easier to train than generative models).

sexual_tomato@lemmy.dbzer0.com · edit-2 4 days ago

A generative model uses the classifier as part of its training. If you generate a picture of pure random noise, then iteratively pick random noise that the classifier says “looks” more like csam, then you can effectively generate images that the classifier says it’s 100% certain is csam. Whether or not that looks anything like what a human would consider to be csam depends on other factors but it remains a possibility.

Todd Bonzalez@lemm.ee · 4 days ago

You are describing the way DeepDream works, not the way modern diffusion models work. It’s the difference between psychedelic dog faces and a highly adherent generative image of a German Shepherd.

I can’t imagine you’re going to get anything out of this model that actually looks like CSAM, unless there’s some sort of breakthrough in using these models for previously unrealized generative purposes.

Churbleyimyam@lemm.ee · 5 days ago

I think all CSAM should be destroyed out of respect for the victims, not proliferated. I don’t care who is hanging onto this material or for what purpose.

Ghostie21@lemmy.world · 5 days ago

How is this proliferating csam? Also, how do you expect them to find csam without having known images? It gives a really nice way to check based on hashes without having someone look at every picture on someone’s hard drive. With this AI it should greatly help with identifying new or unknown images while minimizing the number of actual people that have to see that stuff, and who get scarred from looking at such images. The only reason to be against this is if you are looking at CP and want it to be harder to find, or if you don’t understand how this technology is being used.

Churbleyimyam@lemm.ee · 3 days ago

“How is this proliferating csam?”

Sharing it with people and companies that it wasn’t being shared with before.

“Also, how do you expect them to find csam without having known images?”

The same way it is now: people reporting it and undercover police accounts. People recognise it.

“without having someone look at every picture on someone’s hard drive”

If it’s going to get used as evidence in court a human will have to review and confirm it. I don’t think “Because the AI said so” is going to convince juries.

“The only reason to be against this is if you are looking at CP”

Or if it’s you or someone you love who is in the CP. Having further copies of it on further hard drives, whether it’s so someone can bake it into their AI tool or any other purpose is wrong. That’s just my view though.

Ghostie21@lemmy.world · 3 days ago

Sorry I cannot post a longer response, but I’d suggest you look up how this type of forensic software is developed and used. There are a few good documentaries on it if you look; one I remember watching was on Google’s team for this stuff.

The images are not exactly shared in that very few people have access to them, and they treat it very much like classified information so that only select people can see them.

These models would be developed using normal images and then trained in closed systems with the real images where the accuracy is used and not the images. No need to scar the developers who just want to work.

Nothing about the reporting by people will change. The only difference is this will allow the FBI to have a list of suspected CP and a list of normal images from a computer, allowing them to spend a fraction of the time looking at this stuff to document it. This is very important when you have people who have literally terabytes of the stuff and probably even more normal images. In general we like to minimize the time spent looking at such stuff because it is so scarring.

As for showing the images in court, in the US hashes are acceptable evidence, again we don’t like to scar people by showing them this stuff. Additionally after you’ve been shown the 100th picture of a baby being abused and the FBI is telling you they have 1000000 more, you’ll just take their word for it.

Anyways, hope you have a good one

K̺͆e̺͆t̺͆a̺͆m̺͆i̺͆n̺͆e̺͆@sh.itjust.works · 5 days ago

At this point how does it differ w/ generating AI powered CP? morons

Railcar8095@lemm.ee · 5 days ago

It differs in basically being something completely different. This is a classification model; it doesn’t have generative capabilities. Even if you were to get the model and its weights, and you tried to reverse engineer an “input” that it would classify as CP, it would most likely look like pure noise to you.

Moron

K̺͆e̺͆t̺͆a̺͆m̺͆i̺͆n̺͆e̺͆@sh.itjust.works · 4 days ago

Generate porn, classify output, result very young looking models.

Moron

Railcar8095@lemm.ee · 4 days ago

So you need to have a model that generates CP to begin with. Flawless reasoning there.

Look, it’s clear you have no clue what you’re talking about. Stop demonstrating it, moron.

JackbyDev@programming.dev · 4 days ago

Alright, I found the name of what I was thinking of that sounds similar to what they’re suggesting: generative adversarial network (GAN).

The core idea of a GAN is based on the “indirect” training through the discriminator, another neural network that can tell how “realistic” the input seems, which itself is also being updated dynamically. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner.

Railcar8095@lemm.ee · 4 days ago

Applying a GAN won’t work. If used for filtering, it would result in outputs being skewed younger, but it won’t show the body of a 9-year-old unless the model could do that from the beginning.

If used to “tune” the original model, it will result in massive hallucinations and aberrations that can result in false positives.

In both cases, decent results will be rare and time consuming. Anybody with the dedication to attempt this already has pictures and can build their own model.

Source: I’m a data scientist

JackbyDev@programming.dev · 4 days ago

The model I use (I forget the name) popped out something pretty sus once. I wouldn’t describe it as CP, but it was definitely weird enough to really make me uncomfortable. It’s the only thing it ever made that I immediately deleted and removed from the recycling bin too lol.

The point I’m making is that this isn’t as far fetched as you believe.

Plus, you can merge models. Get a general purpose model that knows what children look like, a general purpose pornographic model, merge them, then start generating and selecting images based on Thorn’s classifier.

Railcar8095@lemm.ee · 4 days ago

You can’t merge a generative model and a classification model. You can run them in series to get a bunch of false positives/hallucinations, but you can’t make it generate something from the other model.

K̺͆e̺͆t̺͆a̺͆m̺͆i̺͆n̺͆e̺͆@sh.itjust.works · edit-2 4 days ago

Not CP, but normal porn and select on CP traits, moron

Railcar8095@lemm.ee · 4 days ago

https://en.m.wikipedia.org/wiki/False_positives_and_false_negatives

Not that I think you will understand. I’m posting this mostly for those moronic enough to read your comments and think “that seems reasonable”

xionzui@sh.itjust.works · 5 days ago

Uh, well this one tells you if an image looks like it or not. It doesn’t generate images

K̺͆e̺͆t̺͆a̺͆m̺͆i̺͆n̺͆e̺͆@sh.itjust.works · 5 days ago

If it knows if an image looks like it it can generate something like it, one step further

The Hobbyist@lemmy.zip · edit-2 5 days ago

Thorn, the company backed by Ashton Kutcher and which tried to get its way to monitor all messages in the EU via Chat Control. No thanks.

https://fortune.com/europe/2023/09/26/thorn-ashton-kutcher-ylva-johansson-csam-csa-regulation-european-commission-encryption-privacy-surveillance/

Erasmus@lemmy.world · 5 days ago

Just remember folks. Kutcher is a slimeball too.

The guy went from a D list star and hanging out with the likes of Danny Masterson and going to Diddy’s infamous parties - to suddenly overnight courting the US government and being the face of ‘helping’ children everywhere.

Yeah right……

chonglibloodsport@lemmy.world · 5 days ago

I’d be wary of calling him guilty by association. Maybe when he realized who he was really hanging out with he was so horrified and disgusted that he just had to get involved and do something to fight back?

Erasmus@lemmy.world · 5 days ago

It’s awfully coincidental that he seems to hang out with the ‘rapist’ crowd. Even going as far as writing a letter for Masterson about how nice a guy he is, to try to get him a lenient sentence.

Even Hollywood has ostracized him and his wife - news sites recently reported they were looking to leave the country and let things cool off for a while.

I’m sure everyone is right though that keep posting here, that he is a swell guy who was just in the wrong place at the wrong time, multiple times. Several years worth of multiple times with wrong people. Just a coincidence.

BassTurd@lemmy.world · 5 days ago

The difference between us giving him a benefit of the doubt and claiming innocence and your take, is that you are labeling him a pedophile without proof. That’s a significant claim if false, and imo takes an assumption too far. Maybe he’s bad and it should be looked into, but saying he did something because he was on a show with and good friends with a guy that happened to be a rapist is wrong.

sunzu2@thebrainbin.org · 5 days ago

I am a bit confused how it is legal for them to have the training data here?

Like is there anything a corpo can’t do?

Like why can’t subway Jared and Catholic church “train the AI”

Only half way joking, what’s the catch here?

MentalEdge@sopuli.xyz · 5 days ago

There are laws around it. Law enforcement doesn’t just delete any digital CSAM they seize.

Known CSAM is archived and analyzed rather than destroyed, and used to recognize additional instances of the same files in the wild. Wherever file scanning is possible.

Institutions and corporations can request licenses to access the database, or just the metadata that allows software to tell if a given file might be a copy of known CSAM.

This is the first time an attempt is being made at using the database to create software able to recognize CSAM that isn’t already known.

I’m personally quite sceptical of the merit. It may well be useful for scanning the public internet, but I’m guessing the plan is to push for it to be somehow implemented for private communication, no matter how badly that compromises the integrity of encryption.

melroy@kbin.melroy.org · 5 days ago

So doesn’t that mean law enforcement has the biggest CP collection of anybody? This sounds kinda dangerous…

MentalEdge@sopuli.xyz · edit-2 5 days ago

It does. Kinda.

The police are seldom allowed to be in possession of CSAM, except for in terms of grabbing the hardware which contains it in an arrest. The database used in modern detection tools is maintained by NCMEC which has special permission to do so.

And of course there are risks, but it’s just digital data. Unless you are creating more, you’re not actively harming anyone. And law enforcement absolutely needs that data to take some of the most obvious steps to prevent it being spread further.

Obviously, someone has access, but to get to the actual media files wouldn’t be simple. What typically happens, is that anyone wanting to detect CSAM, is given a hashed version of the database. They can then scan their systems for CSAM by hashing any media they are hosting, and seeing whether there are any matches.
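As a rough illustration of that matching step, here is a minimal sketch. It assumes plain SHA-256 over raw file bytes, which only catches byte-identical copies; real deployments typically use perceptual hashes so that resized or re-encoded copies still match. The names `sha256_hex`, `find_matches`, `known_hashes` and `hosted` are made up for the example and do not come from any actual tool.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of some file content."""
    return hashlib.sha256(data).hexdigest()


def find_matches(files: dict[str, bytes], known_hashes: set[str]) -> list[str]:
    """Return the names of files whose digest appears in the known-hash set.

    Only digests are compared, so whoever runs the scan never needs
    the original media, just the list of hashes.
    """
    return sorted(
        name for name, data in files.items() if sha256_hex(data) in known_hashes
    )


# Hypothetical stand-in data: the scanner is handed digests, not files.
known_hashes = {sha256_hex(b"flagged-file-bytes")}
hosted = {
    "a.jpg": b"harmless bytes",
    "b.jpg": b"flagged-file-bytes",
}

matches = find_matches(hosted, known_hashes)
print(matches)
```

The point of the design is that the hash list can be handed out widely while the media itself stays in one place.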

Whenever possible, people aren’t handling the actual media. But for any detection to be possible to begin with, the database of the actual media does need to be maintained somewhere.

AI is a touchier subject, as you can’t train a model to recognize CSAM not already in the database using hashes, so in those cases you have to work with actual real media. This is only recently becoming a thing.

It also leaves open the possibility for false positives. An oft cited example is parents taking pictures of their own children for innocent reasons, or doctors and parents handling images for valid medical reasons. In a system that flagged such content, it would mean someone else would be seeing that “private” content because it was flagged.
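To put rough numbers on the false-positive concern, here is a back-of-the-envelope sketch. All three rates below are invented for illustration and are not figures for any real classifier; the function name `flag_precision` is likewise hypothetical. It just applies Bayes' rule to ask what share of flagged images would actually be true hits.

```python
def flag_precision(prevalence: float, true_positive_rate: float,
                   false_positive_rate: float) -> float:
    """Share of flagged items that are real hits (Bayes' rule)."""
    true_hits = prevalence * true_positive_rate
    false_hits = (1 - prevalence) * false_positive_rate
    return true_hits / (true_hits + false_hits)


# Assumed numbers, purely illustrative:
# 1 in 10,000 scanned images is actually illegal,
# the classifier catches 99% of those,
# and wrongly flags 1% of innocent images.
p = flag_precision(prevalence=0.0001,
                   true_positive_rate=0.99,
                   false_positive_rate=0.01)
print(f"{p:.4f}")
```

Under those assumptions only about 1 in 100 flags would be a real hit, which is why the rarer the material is in whatever gets scanned, the more the false-positive rate matters.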

Kyrgizion@lemmy.world · 5 days ago

Not a single peep about false positives.

I’m sure it won’t be abused though. And if anyone does complain, just get their electronics seized and checked, because they must be hiding something!

oldfart@lemm.ee · 5 days ago

Reminds me of the A cup breasts porn ban in Australia a few years ago, because only pedos would watch that

baldingpudenda@lemmy.world · 5 days ago

There was a porn studio that was prosecuted for creating CSAM. Brazil, I believe. Prosecutors claimed that the petite, A-cup woman was clearly underage. Their star witness was a doctor who testified that such underdeveloped breasts and hips clearly meant she was still going through puberty and couldn’t possibly be 18 or older. The porn star showed up to testify that she was in fact over 18 when they shot the film and brought all her identification, including her birth certificate and passport. She also said something to the effect of women come in all shapes and sizes and a doctor should know better.

I can’t find an article. All I’m getting is GOP trump pedo nominees and brazil laws on porn.

Clinicallydepressedpoochie@lemmy.world · 5 days ago

Aw man, I love all titties. Variety is the spice of life.

Scratch@sh.itjust.works · 5 days ago

Not to mention the self image impact such things would have on women with smaller breasts, who (as I understand it) generally already struggle with poor self image due to breast size.

sunzu2@thebrainbin.org · 5 days ago

Clearly the state gives zero fucks about these women, or anyone else or even “the children”

Catholic Church is still around for a reason

floofloof@lemmy.ca · 5 days ago

This seems like a potential actual good use of AI. Can’t have been much fun to train it though.

And is there any risk of people turning these kinds of models around and using them to generate images?

Jimbabwe@lemmy.world · 5 days ago

If AI was reliable, maybe. MAYBE. But guess what? It turns out that “advanced autocomplete” does a shitty job of most things, and I bet false positives will be numerous.

Chozo@fedia.io · 5 days ago

This is not that kind of AI.
