It’s getting a bit ridiculous out here. I’m using DuckDuckGo, but since it aggregates its search from other sources, it’s also gotten bad recently. Is there a search engine out there that blocks domains that spam AI? Extra points if there’s something like uBlock Origin that filters things based on a community-made list.
Edit: I’m aware of Kagi but it’s pretty expensive and I’m not a fan that they, too, host their own AI tools.
In this GitHub repo there is a bit of fuckery with the link: notice he has an affiliate link for Adblock Plus instead of linking to the goddamn host list directly.
Do I just copy and paste this into my filter lists?
I wouldn’t. There are plenty of filter lists right from uBlock itself, which I trust more.
Doing the lord’s work
Looks good, thank you
Well I’m installing this and you’re my new favorite person.
Search is eventually going to be so enshittified that the way to actually find things out is going to fall back on “ask someone you trust who knows things you don’t”. At least by that point those trustworthy people should be better informed than in the past…
Maybe we will see the return of lists like what Yahoo was.
It’s ultimately self-defeating as well because any future AI is going to be polluted by past AI’s garbage content. Making it even harder to develop intelligent AI systems.
It can survive well where there’s editorial control. I’d talk to an AI if it had only read encyclopedias for example…
I tried doing some of this. I trained a model on a corpus of data I wanted it to read, but with such a small amount of training data I found it was too lossy overall. If I asked it a question and it answered, there was a really good chance the answer was actually in the corpus. But it often didn’t know things that were definitely in there. It wasn’t completely useless, but I wouldn’t say it was at the level of being truly helpful.
I worry that there’s not enough verified data out there to set up for proper training.
I suspect such a model would have to be far more attuned to its data being smaller but trustworthy. Something like ChatGPT, for example, requires a huge volume because it’s only weakly affected by any particular datum going in. It’s designed to adapt to general conversation norms rather than specific facts. If you could take a generalist like ChatGPT and combine it with an expert model in which everything it’s been told carries a huge weighting, that would probably be a big step forward.
I use uBlacklist to filter out stuff from search results, it works with a bunch of search engines. It has various lists you can subscribe to, also including anti-AI ones.
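For anyone curious what a blocklist filter like this boils down to, here is a minimal sketch in Python. It is not uBlacklist's code: uBlacklist's own rules are match patterns or regular expressions, while this sketch assumes a plain set of bare domain names. The domains and URLs below are made up for illustration.

```python
from urllib.parse import urlsplit


def is_blocked(url: str, blocked_domains: set[str]) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Check every suffix: "a.b.example.com", "b.example.com", "example.com", ...
    return any(".".join(parts[i:]) in blocked_domains for i in range(len(parts)))


def filter_results(urls: list[str], blocked_domains: set[str]) -> list[str]:
    """Keep only the result URLs whose host is not on the blocklist."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]


# Hypothetical blocklist and search results, for illustration only.
BLOCKED = {"ai-spam.example", "contentfarm.example"}
RESULTS = [
    "https://www.ai-spam.example/top-10-pastebin-alternatives",
    "https://docs.python.org/3/library/urllib.parse.html",
    "https://contentfarm.example/arrow-rests",
]

if __name__ == "__main__":
    for url in filter_results(RESULTS, BLOCKED):
        print(url)
```

Matching on domain suffixes means subdomains such as www. are caught too, while a lookalike host that merely ends in the same characters is not.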
Looks super cool. Too bad they don’t have a way to add custom SearX instances other than modifying and building the extension yourself.
I think the best way to make the Internet less sh*tty is to get away from Google search.
I like the SearX search engine. It gives old-school, relevant search results, not Google-ranked ones.
It’s also spread out over many separate instances, so you can pick the one that best suits your search needs:
Oh damn. That was like a proper internet 2.0 kinda experience. A feller could get used to that.
I selfhost it on my laptop, pretty easy, and I always have it just the way I want it. Still pushing shit uphill with the AI crap, but better than any one search engine (it amalgamates many). Relevant to OP I have a large block list enabled, but it’s very much a moving target.
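To illustrate what "amalgamating" several engines can look like, here is a rough sketch in Python. It is not SearXNG's actual ranking algorithm; it assumes a simple reciprocal-rank scheme, where a URL scores higher the nearer the top it appears and the more engines return it. The engine names and URLs are hypothetical.

```python
from collections import defaultdict


def merge_results(per_engine: dict[str, list[str]]) -> list[str]:
    """Merge ranked URL lists from several engines into one deduplicated list."""
    scores: dict[str, float] = defaultdict(float)
    for urls in per_engine.values():
        for rank, url in enumerate(urls, start=1):
            # Reciprocal rank: 1.0 for first place, 0.5 for second, and so on.
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical per-engine results, for illustration only.
EXAMPLE = {
    "engine_a": ["https://a.example/", "https://b.example/"],
    "engine_b": ["https://b.example/", "https://c.example/"],
}

if __name__ == "__main__":
    for url in merge_results(EXAMPLE):
        print(url)
```

A block list like the one mentioned above would then be applied to the merged list before showing it.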
I’ve had good luck as a back up to Duck Duck Go with Mojeek. It’s so old school, it doesn’t always know what you want, but I sometimes want that.
I’ve found Mojeek to be a bit hit and miss; but one thing I really appreciate is that they actually do the indexing and searching themselves (whereas pretty much every other search site uses Bing or Google behind the scenes). So although Mojeek may not be ideal, they are at least making an effort to be independent.
It won’t block them, but I started to feel like DDG’s results were awful recently. I couldn’t find simple things. I’ve switched to Startpage and had a much better experience. The results feel more aligned with what I want and I feel like there’s less crap. It’s probably confirmation bias, hah, but it’s working.
This is essentially comparing Bing to Google: https://www.searchenginemap.com/
This does not work well on WebKit.
Ohh what a neat page! I was unaware, thank you!
Man, I was looking up info about arrow rests for recurve/Olympic archery yesterday and stumbled on a website that uses some sort of AI fever dream for its images.
One kinda looked like a violin’s neck brace (I don’t know what those things are called) with some strings attached. It looked like it should be a real thing, but on closer inspection it was actually nothing sensible.
I think we’ve all seen those images that look like a room filled with items, but when you look at a specific item your mind figures out it’s just weird shapes and colors.
What a nightmare that was.
I had a similar experience today looking up beer bongs. Some real Cthulhu type of shit.
Kagi! You can block websites so they don’t show up. It’ll also flag websites that contain a lot of spam or ads.
Kagi lets you blacklist individual domains yourself, but I think what OP is asking is “is there a search engine that identifies and blacklists AI generated content itself”.
I think the answer is probably yes: all search engines try to block spam websites of any sort, AI-generated or not, or at least downrank them, and they do so all the time. Trying to present relevant, useful material at the top of the results is basically the business that search engines are in.
Now, do any do so to a level sufficient to fully eliminate them? I’d guess not. SEO spammers have been trying to pollute top results with their hits for about as long as search engines have been around, and trying to cheaply bulk-generate content that looks like something that the user might want is just the latest form this takes. My guess is that that’ll be a cat-and-mouse game for some time to come.
I think they claim to make an effort on that. I’m pretty sure I saw a changelog entry about AI content detection, at least for the image section.
Ah, gotcha, thanks.
Unless I need something recent, whenever I search I restrict the results to dates from like 1999 to 2021. That filters out a lot of unnecessary crap.
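The date restriction can also be typed straight into the query. Below is a small sketch in Python that builds such a query, assuming Google's `after:` and `before:` search operators; other engines may not support them, and the search terms are just an example.

```python
from datetime import date
from urllib.parse import urlencode


def dated_query(terms: str, start: date, end: date) -> str:
    """Append date-range operators (assumed: Google's after:/before:) to a query."""
    return f"{terms} after:{start.isoformat()} before:{end.isoformat()}"


def search_url(terms: str, start: date, end: date,
               base: str = "https://www.google.com/search") -> str:
    """Build a search URL with the date-restricted query URL-encoded into q=."""
    return f"{base}?{urlencode({'q': dated_query(terms, start, end)})}"


if __name__ == "__main__":
    print(search_url("recurve arrow rest", date(1999, 1, 1), date(2021, 12, 31)))
```

The cutoff only looks at the date the engine has on record for a page, so it is a rough filter rather than a guarantee.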
I’m actually using searxng and just blocking any website with blatantly written AI shitcontent.
I used to use searx.be, but it had frequent downtime and bad results.
Switched to startpage.com
I just host my own searxng instance. Bonus: I get to tweak the config to my liking.
I’m using Startpage too, but I have a feeling that the search result quality had a massive drop lately.
They use Google and Bing for their results last I checked. So, if those two get worse then Startpage will too.
I don’t know. Maybe Google is actively limiting results for Startpage. When I don’t find what I’m looking for in Startpage I switch to Google and boom, adequate search result. I refuse to permanently go back to Google though.
Could be that. Could also be the fact that Google adjusts results based on their profile of you, giving different results from Startpage.
Have you given Kagi an actual shake? If you are not interested in saving preferences longer term, you can keep cycling through free accounts. Now more than ever, it is a breath of fresh air. If I want a quick AI answer without scrolling through some ad-ridden web page, I just put a “?” at the end of my query. If not, I have no AI garbage on my results.
I love kagi but I don’t think it actively filters out ai generated content.
I know when searching for pictures you can disable AI generated images.
I think the hard part for a search engine is that unless there is some kind of identifying mark on the content, how do they know that an ai didn’t write a top 10 list of pastebin alternatives?
It’s not immune to it. If you are looking for something highly specific, you will get slop for sure. To give an actual example, a buddy of mine told me that the walls of your house act like a sponge for water when the outer walls are insulated but the basement walls are not insulated on the outside. So I went looking on Kagi for stuff to back that up (not that I didn’t believe him, I just wanted to know more). A lot of the results were completely AI-generated crap websites. There were good and somewhat relevant results, but in the end I gave up (also because we got confirmation that it had been done on our house, so it became irrelevant).
Try Mojeek - https://mojeek.com/