Ask a guy who archives everything anything?
Pic probably related
It's a web service using SQL, I don't have it in the form of an executable unfortunately. It also has proprietary image detection to identify bodies to remove any explicit images that get archived.
I assumed 2 and 3 character words would get cut out of the search for performance reasons. That's how I'd do it to get faster results anyway (tip: count the number of them in my post).
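A minimal sketch of that kind of filter, assuming the search simply drops any word of 3 characters or fewer before hitting the index. The function names and the cutoff are made up for illustration; this is a guess at the idea, not how the archive actually does it.

```python
import re

# Assumption: words of 3 characters or fewer are dropped from the query.
MIN_TERM_LENGTH = 4


def search_terms(query: str, min_length: int = MIN_TERM_LENGTH) -> list[str]:
    """Split a query into lowercase words and keep only the long ones."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    return [w for w in words if len(w) >= min_length]


def dropped_terms(query: str, min_length: int = MIN_TERM_LENGTH) -> list[str]:
    """Return the words that the filter above would throw away."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    return [w for w in words if len(w) < min_length]
```

Calling `search_terms("Ask a guy who archives everything")` keeps only `archives` and `everything`; the four short words never reach the search.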
This is my income, no.
The number of threads that people believe are OC but aren't is upsettingly high. Also, the general frequency of new posts does not change much between summer and any other season.
I see threads all the time where guys are trying to shoot bigger loads. If someone had developed a pill that let you shoot massive wads every time, I feel like marketing it towards /b/tards would be a great tool.
You'd be surprised.
>"The stories and information posted here are artistic works of fiction and falsehood."
Heaven you never cease to amaze me.
It's getting a bit warm in here.
Yup, they're mostly porn though. They actually cause quite a bit of trouble since it's hard to gauge whether they are explicit or not, because it's not as easy to identify content as it is in a static image.
I will be waiting for you then. Meanwhile, do you think there is a pattern in making subjects or replying? I always thought that /b/ is always about the same things again and again.
Was this thread removed by a moderator? I'm finding references to it but the thread has been removed from the DB. If there was a substantial amount of explicit stuff or a moderator deleted the thread, the likelihood is the system purged the thread from the DB to avoid storing CP and other stuff we could get in trouble for.
If you have the title of the thread I can have another look.
i like to expand on this, as i have quite a collection of pictures that i mostly saved off 4chan.
do you know hydrus network client? it's the tagging program that i use and it makes finding relevant pictures much easier.
how do you find shit in your program? do you 'tag' roll threads as such? what would you need to do to get a list of all spaghetti greentext stories from the last month?
do you organise pictures for private use?
and another thing: could your system be used to reliably remove recurring threads like (for example) gay furry threads? they almost always have another OP text + pic, so what's the criteria here? dont want to set up a filter that hides meta-discussion about those threads.
There was someone who used to post under the name "Jack Ryan" or "House" with pictures of Ben Affleck and Gregory House on /b/ when you used to be able to post with usernames... can you give me any information on this person or the last time they were around and where?
Images still maintain a relationship with the parent thread. Because of this, images can be pulled up if you search by a string that appears in the parent thread. It can also find duplicate images and give statistics on how often an image has been reposted. This is great for determining whether something is actually OC, but it can take a while.
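A rough sketch of the repost counting, assuming duplicates are found by hashing the raw file bytes. That only catches byte-for-byte reposts, and the real system may well use something smarter; every name here is hypothetical.

```python
import hashlib
from collections import defaultdict


def image_digest(data: bytes) -> str:
    """Fingerprint an image by its exact bytes (assumed approach)."""
    return hashlib.sha256(data).hexdigest()


def repost_stats(posts) -> dict[str, list[int]]:
    """Map each image digest to the thread ids it was posted in.

    `posts` is an iterable of (thread_id, image_bytes) pairs, so every
    image keeps its link back to the parent thread.
    """
    seen = defaultdict(list)
    for thread_id, data in posts:
        seen[image_digest(data)].append(thread_id)
    return dict(seen)


def looks_like_oc(data: bytes, stats: dict[str, list[int]]) -> bool:
    """Treat an image as possible OC if it shows up at most once."""
    return len(stats.get(image_digest(data), [])) <= 1
```

An image that appears in two different threads fails `looks_like_oc`, while one that has only been seen once passes. Hashing every stored image is the part that would take a while on a big archive.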
To get threads with greentext stories would be quite difficult to do quickly, but it is generally as simple as making a SQL query to fetch posts that have greentext (the DB maintains the html formatting/spans) and then pull up all the parent threads.
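Something along these lines, as a sketch only. The table and column names (`threads`, `posts`, `body_html`, `posted_at`) are invented, the demo rows are made up, and it assumes greentext is stored as a `<span class="quote">` in the saved HTML.

```python
import sqlite3

# Assumption: greentext lines are stored with this span in the post HTML.
GREENTEXT_MARKER = '<span class="quote">'


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory DB with a hypothetical schema and fake rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            thread_id INTEGER REFERENCES threads(id),
            posted_at TEXT,
            body_html TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO threads VALUES (?, ?)",
        [(1, "spaghetti stories"), (2, "wallpaper thread"), (3, "old stories")],
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?, ?)",
        [
            (10, 1, "2014-06-20", '<span class="quote">&gt;be me</span>'),
            (11, 1, "2014-06-20", '<span class="quote">&gt;spill spaghetti</span>'),
            (12, 2, "2014-06-21", "moar 1920x1080"),
            (13, 3, "2014-01-05", '<span class="quote">&gt;be me</span>'),
        ],
    )
    return conn


def threads_with_greentext(conn: sqlite3.Connection, since: str):
    """Find posts containing greentext, then pull up their parent threads."""
    return conn.execute(
        """
        SELECT DISTINCT t.id, t.title
        FROM threads AS t
        JOIN posts AS p ON p.thread_id = t.id
        WHERE p.body_html LIKE '%' || ? || '%'
          AND p.posted_at >= ?
        ORDER BY t.id
        """,
        (GREENTEXT_MARKER, since),
    ).fetchall()
```

With the demo rows, `threads_with_greentext(make_demo_db(), "2014-06-01")` returns only the first thread. A `LIKE` with a leading wildcard has to scan every post, which is why this is slow on a big table.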
I do organize some pictures for private use, it's quite fun to flip through them all.
About removing furry threads, I'm not too sure, you could perhaps detect images which have been reposted from threads known to be furry threads but otherwise it's risky to do this automatically as you may end up losing an interesting thread with just one furry picture posted at random.
how does image detection work? you've mentioned that you can filter nude pictures. is it failsafe? what about things like this? does it use nipple detection?
I lol'd. I'm not the one responsible for programming most of the image detection, but I have a vague understanding of how it works. As far as I know it attempts various methods of detection, such as forming a basic skeleton and detecting skin colours and faces. If any one of these flags up anything even slightly then it will remove the image. It's extremely sensitive, so there are lots of times where it will delete something not even remotely sexual; this can be seen in wallpaper threads. But it is a fine price to pay for essentially not saving CP and having to worry about that.
I can't find the image you posted in the DB so I should imagine it detected it.
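The "any one detector is enough" rule can be sketched like this. It is a toy, not the proprietary detection: the only detector included is a crude RGB skin-tone rule, and the threshold and all names are assumptions.

```python
from typing import Callable, Sequence

Pixel = tuple[int, int, int]
Detector = Callable[[Sequence[Pixel]], float]


def skin_ratio(pixels: Sequence[Pixel]) -> float:
    """Fraction of pixels matching a crude RGB skin-tone rule (toy heuristic)."""
    if not pixels:
        return 0.0

    def is_skin(p: Pixel) -> bool:
        r, g, b = p
        return (
            r > 95 and g > 40 and b > 20
            and max(p) - min(p) > 15
            and abs(r - g) > 15
            and r > g and r > b
        )

    return sum(is_skin(p) for p in pixels) / len(pixels)


def should_remove(
    pixels: Sequence[Pixel],
    detectors: Sequence[Detector],
    threshold: float = 0.05,  # assumed: deliberately low, so it over-deletes
) -> bool:
    """Remove the image as soon as any single detector scores above threshold."""
    return any(detector(pixels) > threshold for detector in detectors)
```

A flat blue sky like `[(30, 60, 200)] * 10` passes, but a sandy desert wallpaper like `[(220, 170, 140)] * 10` gets removed, which is the same kind of false positive that shows up in wallpaper threads.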
do you have any traces of mod works? do you know when a thread is being forcibly removed instead of just 'expiring'?
as a lurker, you don't really see much being removed and if you do then it's pure luck. could your tools be useful for mod work?
also didnt there used to be an official 4chan archive? i know that i used the filenames there to gauge the date that i started lurking.
Try 300 mil "Get!" - Pic related
Sometimes a thread disappears seconds after a post was made, which suggests it can't have 404'd. Usually this is because of mod intervention. I think I have a few examples actually where a user uploads a picture of someone's credit card, then the thread gets removed promptly. I will try and find it.
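As a sketch, that check is just a comparison of two timestamps. The two-minute window and the function name are assumptions, not values taken from the archive.

```python
from datetime import datetime, timedelta

# Assumption: a thread that vanishes this soon after its last post
# was probably removed rather than expired.
REMOVAL_WINDOW = timedelta(minutes=2)


def likely_mod_removed(
    last_post_at: datetime,
    gone_at: datetime,
    window: timedelta = REMOVAL_WINDOW,
) -> bool:
    """Guess whether a thread was removed instead of expiring naturally."""
    return gone_at - last_post_at <= window
```

A thread whose last post was 20 seconds before it vanished gets flagged, while one that sat untouched for an hour before disappearing is treated as a normal 404.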