Author Topic: Geekhack Thread Image Downloader  (Read 3081 times)

Offline cooldiscretion

  • Thread Starter
  • Posts: 747
  • Location: Seattle, WA
  • Convicted guilty of being totally rad.
Geekhack Thread Image Downloader
« on: Sun, 06 December 2015, 13:07:48 »
I'm not sure if this is the right category to post this, but if it's not maybe someone can move it.


Over the last few weeks I wrote a Python script that will download all images in a particular thread on Geekhack and generate
a report file with URL links to where the image was first encountered in the thread. It started because I wanted to use all the
images from the 'post your clacks' thread to be randomly displayed for my screensaver at work. Then, as I was looking at the
images that were downloaded, I wanted to know who posted them and where so I modified the script to create a report file.
The script will do its best to detect duplicate images and avoid downloading them more than once.
It doesn't download gifs, but it currently cannot distinguish between uploaded meme jpegs and genuine pictures.
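
For illustration, one common way to skip duplicate images is to hash the downloaded bytes and keep only the first copy of each hash. This is only a sketch and is not taken from the actual script, whose duplicate detection may work differently; the function name and the (url, data) pair format are made up for the example.

```python
import hashlib


def dedupe_by_content(images):
    """Keep the first (url, data) pair for each distinct image content.

    `images` is an iterable of (url, data) pairs, where `data` is the raw
    bytes of the image. This pair format is an assumption for the sketch.
    """
    seen = set()
    unique = []
    for url, data in images:
        # Identical bytes always produce the same SHA-256 digest.
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # same content already kept under an earlier URL
        seen.add(digest)
        unique.append((url, data))
    return unique
```

Note that this only catches byte-for-byte copies. The same picture re-encoded or resized would hash differently and still be downloaded.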


I am in no way an amazing Python developer, as I normally write C in my day-to-day job, but I've created a GitHub page with
instructions on how to use the script here:


https://github.com/stevegcarpenter/geekhack_image_downloader


Any feedback on how it might be improved or where the instructions aren't clear enough would be much appreciated. And of course,
I hope it serves the purpose for others that it did for me.

Offline Computer-Lab in Basement

  • The needs of the many outweigh the needs of the few.
  • * Elevated Elder
  • Posts: 3025
  • Location: NCC-1701, USS Enterprise
  • Live long and prosper
Re: Geekhack Thread Image Downloader
« Reply #1 on: Sun, 06 December 2015, 13:47:14 »
will this download all of the gifs?
tp thread is tp thread
Sometimes it's like he accidentally makes a thread instead of a google search.

IBM Model M SSK | IBM Model F XT | IBM Model F 122 | IBM Model M 122 | Ducky YOTD 2012 w/ blue switches | Poker II w/ Blue switches | Royal Kludge RK61 w/ Blue switches

Offline retrochick

  • Posts: 600
  • goodbye my wallet
Re: Geekhack Thread Image Downloader
« Reply #2 on: Sun, 06 December 2015, 13:53:56 »
don't think so.


Cherry is love. Topre is life. ~raymogi

Offline cooldiscretion

  • Thread Starter
  • Posts: 747
  • Location: Seattle, WA
  • Convicted guilty of being totally rad.
Re: Geekhack Thread Image Downloader
« Reply #3 on: Sun, 06 December 2015, 14:24:05 »
Quote from: Computer-Lab in Basement
will this download all of the gifs?

It won't download gifs, but I suppose the script could be modified to download only gifs, or even to let you specify which file types to download.
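
As a rough idea of what that modification might look like, the sketch below checks an image URL's file extension against a configurable set. It is hypothetical and not part of the script as posted; the function name and the default extension list are assumptions.

```python
import os
from urllib.parse import urlparse


def has_allowed_extension(url, allowed=(".jpg", ".jpeg", ".png")):
    """Return True if the URL's path ends in one of the allowed extensions.

    The default tuple leaves out ".gif" to mirror the behaviour described
    in this thread; pass allowed=(".gif",) to match only gifs.
    """
    path = urlparse(url).path  # drops any query string or fragment
    ext = os.path.splitext(path)[1].lower()
    return ext in allowed
```

For example, has_allowed_extension(url, allowed=(".gif",)) would let through only gifs.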

Offline azhdar

  • Praise the AZERTY god
  • Posts: 2435
  • Location: France
  • 65% Enlightened
Re: Geekhack Thread Image Downloader
« Reply #4 on: Sun, 06 December 2015, 14:28:29 »
would it work to download from the posts of a user, for example : https://geekhack.org/index.php?action=profile;area=showposts;u=36817
Azerty Propagandiste

Offline cooldiscretion

  • Thread Starter
  • Posts: 747
  • Location: Seattle, WA
  • Convicted guilty of being totally rad.
Re: Geekhack Thread Image Downloader
« Reply #5 on: Sun, 06 December 2015, 15:54:24 »
Quote from: azhdar
would it work to download from the posts of a user, for example : https://geekhack.org/index.php?action=profile;area=showposts;u=36817

Unfortunately, it won't currently. A thread URL really only contains two important numbers, which I extract from the URL.

The first number comes after topic= in the URL. That is the number which identifies the thread itself. The second number follows a
period after the topic number and identifies the page. However, the page numbers go in increments of 50, which is why page 1 of a thread ends
in 0 and page 2 ends in 50. I'm guessing support could be added to download images from a user's posts, although others might
object to that on moral grounds, since it would harvest all data from a particular user.


EDIT: For example, this thread has the topic number 77652 and since this is page 1 it is followed by page number 0.


Hence, the URL would be:
https://geekhack.org/index.php?topic=77652.0
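
To make the numbering concrete, here is a small sketch of pulling those two numbers out of a thread URL. It is written for this explanation rather than copied from the script, so the function name and the regular expression are assumptions, and it only handles the simple topic=NUMBER.OFFSET form shown above.

```python
import re

POSTS_PER_PAGE = 50  # page offsets go up in steps of 50, as described above


def parse_thread_url(url):
    """Return (topic_number, page_number) for a thread URL.

    Page numbers are 1-based: offset 0 is page 1, offset 50 is page 2.
    """
    match = re.search(r"topic=(\d+)(?:\.(\d+))?", url)
    if match is None:
        raise ValueError("no topic number found in URL: " + url)
    topic = int(match.group(1))
    # A missing offset is treated as the first page.
    offset = int(match.group(2) or 0)
    page = offset // POSTS_PER_PAGE + 1
    return topic, page
```

With the URL of this thread, parse_thread_url returns topic 77652 and page 1.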


The same strategy could be used for a user's posts though. The only unique data in the URL is the user profile number (mine appears to be 36817)
followed by the page number of that user's posts, which starts at 0 and increments by 50 each page. If you look at page 2 of my posts, you can see this:
https://geekhack.org/index.php?action=profile;u=36817;area=showposts;start=50


This same format can be used to access all pages, including page 1. Just modify the URL to be:
https://geekhack.org/index.php?action=profile;u=36817;area=showposts;start=0
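
A sketch of building those URLs from a user number and a page number follows. This is not a feature of the script, just an illustration of the URL format above; the function name is made up.

```python
def user_posts_url(user_id, page):
    """Build the URL for a 1-based page of a user's posts.

    The start value is 0 for page 1 and grows by 50 per page,
    matching the example URLs in this post.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * 50
    return (
        "https://geekhack.org/index.php"
        f"?action=profile;u={user_id};area=showposts;start={start}"
    )
```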
« Last Edit: Sun, 06 December 2015, 16:03:07 by cooldiscretion »

Offline trenzafeeds

  • * Exquisite Elder
  • Posts: 1352
  • Location: vt
  • **** off
Re: Geekhack Thread Image Downloader
« Reply #6 on: Sun, 06 December 2015, 17:57:16 »
This should probably be in suggestions/improvements, or whatever that subforum is called.
demik will never leave.

Unless he gets banned.

Offline trizkut

  • * Global Moderator
  • Posts: 1207
  • Location: MA
Re: Geekhack Thread Image Downloader
« Reply #7 on: Sun, 06 December 2015, 18:09:47 »
Quote from: trenzafeeds
This should probably be in suggestions/improvements, or whatever that subforum is called.
Why? This has nothing to do with modifying GH in any way


Offline trenzafeeds

  • * Exquisite Elder
  • Posts: 1352
  • Location: vt
  • **** off
Re: Geekhack Thread Image Downloader
« Reply #8 on: Sun, 06 December 2015, 18:10:55 »
Quote from: trizkut
Quote from: trenzafeeds
This should probably be in suggestions/improvements, or whatever that subforum is called.
Why? This has nothing to do with modifying GH in any way

I know... it just seems out of place here. Doesn't really matter at all though!  :thumb: Quite a cool script.

Actually now that I think about it, it could go in MST. Again, totally not an issue though, I don't actually care lol.
demik will never leave.

Unless he gets banned.

Offline cooldiscretion

  • Thread Starter
  • Posts: 747
  • Location: Seattle, WA
  • Convicted guilty of being totally rad.
Re: Geekhack Thread Image Downloader
« Reply #9 on: Thu, 24 December 2015, 18:10:12 »
I just updated the script to accept a start and end page for downloading images. This means people won't be forced to download all images from a particular thread
when they are just interested in the last 20 pages or so.
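
Roughly speaking, a start and end page translate into a list of thread URLs like the sketch below. This is an illustration only: the function name and arguments are assumptions, and the script's actual options are documented on the GitHub page.

```python
def thread_page_urls(topic, start_page, end_page):
    """Return the URLs for pages start_page..end_page (1-based, inclusive).

    Each page's URL ends in an offset that grows by 50 per page,
    so page 1 ends in .0 and page 2 ends in .50.
    """
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    return [
        f"https://geekhack.org/index.php?topic={topic}.{(page - 1) * 50}"
        for page in range(start_page, end_page + 1)
    ]
```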


To all you lurkers and active members alike, you should use this over the holidays and tell me about any problems or improvements you'd like. If people really like the idea of downloading
images from a particular user's posts, I can add that feature too.
« Last Edit: Thu, 24 December 2015, 18:16:46 by cooldiscretion »