Tested 8 ai image detectors on the same 50 images — results are kinda wild

ok so I’ve been going back and forth with people about which AI image detector is actually reliable and I got tired of the arguments so i just… tested them myself lol

put together a set of 50 images — 25 real ones (photography + digital art i pulled from my own library and some friends’ portfolios) and 25 AI generated (midjourney v6, dalle 3, SDXL, and a few from Flux).

ran all 50 through: Hive Moderation, Illuminarty, AI or Not, Optic, Sensity, Content Credentials Verify, the huggingface SDXL detector, and Was It AI.

the TLDR: none of them got everything right. best performer hit 84% accuracy, worst was basically flipping a coin at 56%
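
for anyone who wants to sanity-check the math: scoring was nothing fancy, just comparing each tool's verdict against the ground-truth label per image. roughly something like this (the CSV layout and column names here are placeholders until the spreadsheet is finalized):

```python
import csv
from collections import defaultdict

# placeholder columns: image_id, truth ("ai" or "real"), then one verdict column per tool
TOOLS = ["hive", "illuminarty", "ai_or_not", "optic",
         "sensity", "content_credentials", "sdxl_detector", "was_it_ai"]

correct = defaultdict(int)
total = 0

with open("detector_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        total += 1
        for tool in TOOLS:
            # count a hit whenever the tool's verdict matches the ground truth
            if row[tool] == row["truth"]:
                correct[tool] += 1

for tool in TOOLS:
    print(f"{tool}: {correct[tool] / total:.0%} ({correct[tool]}/{total})")
```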

what really got me was how much they disagreed with each other. one midjourney landscape (#17) got flagged as HUMAN by 5 out of 8 tools. and an actual photograph of fog over a lake (#31) got flagged as AI by four of them. like… cmon.

working on a full spreadsheet with methodology and raw data. anyone interested in seeing it? also curious if anyone's done something similar — would love to compare notes


Definitely interested in the spreadsheet. I run a small stock review service and we’ve been relying on Hive as our primary filter. The fact that it’s not even close to 100% is… not great news for our pipeline lol

Did you notice any patterns in what types of images fool the detectors most? Like is it landscapes, portraits, abstract stuff?


this is really valuable data, thanks for doing this

the fog photo getting flagged by 4 tools is exactly the kind of thing that worries me. I’m a photographer and some of my long exposure stuff gets flagged all the time. heavy post-processing apparently = “must be AI” according to these tools

would love to see the methodology. did you use the free tiers or paid versions? some of these tools have different models for paid users


56% is genuinely embarrassing for a detection tool. That’s worse than just guessing the majority class once you account for the base rate of AI images in most datasets: if, say, only 10% of the images you screen are actually AI, labelling everything “real” scores 90% without any model at all.

I wonder how much the compression affected results though. Did you test with the original files or did you re-save them as jpegs first? I ask because most of these tools perform differently on compressed vs uncompressed images and it can swing accuracy by 10-15%
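
if you do end up doing a round 2, something like this would spit out a recompressed copy of every image so you can run the same set through the detectors a second time (the folder paths and quality setting here are just placeholders):

```python
from pathlib import Path
from PIL import Image

# re-save every test image as a quality-85 JPEG so the same set
# can be run through the detectors again in compressed form
src = Path("originals")
dst = Path("recompressed_q85")
dst.mkdir(exist_ok=True)

for path in sorted(src.iterdir()):
    if path.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp", ".tif", ".tiff"}:
        img = Image.open(path).convert("RGB")  # drop alpha so JPEG can handle it
        img.save(dst / f"{path.stem}.jpg", "JPEG", quality=85)
```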


Super interested. I’ve been doing something similar but just with text detectors. The results are equally depressing. Might be worth comparing notes on methodology if you’re up for it.

Also re: the midjourney landscape — v6 is genuinely getting scary good at natural scenes. I showed some midjourney landscapes to a photographer friend without context and she couldn’t tell. The only giveaway was a slightly off reflection in a lake.

@MaxFlare83 good question. yes, landscapes and macro/close-up stuff seem to trip them up the most. Portraits were actually the easiest category for detectors, which makes sense since faces still have tells.

@henry.nomad I used the free tiers for everything to keep it fair. good point about paid tiers tho, might be worth a round 2 with those

fair point. protocol was: everything tested as-is from the source files, no recompression. but yeah, that’s a whole other variable worth testing. added to the list

spreadsheet should be ready this weekend, i’ll post it here