geekhack

geekhack Community => Off Topic => Topic started by: Dihedral on Tue, 23 August 2016, 05:29:36

Title: [Meta] Geekhack Classifieds Scraping - looking for opinions
Post by: Dihedral on Tue, 23 August 2016, 05:29:36
Hey guys,

I've been working on a little system to improve the geekhack classifieds and would like your feedback.

This system scrapes the classifieds and compiles the information it finds into a formatted catalogue of WTS, WTB and WTT items.

It looks for little tags like these which can be added to classifieds OPs and uses them to form its catalogue:

Code: [Select]
[color=transparent]<<<[["WTS", "Dell QuietKey", "$20"], ["WTB", "NMB RT101+", "$80"], ["WTT", "Matias Quiet Click x200", "Cream Damped Alps x50"]]>>>[/color]
This line is a JSON list. Each item in the list is itself a list representing one listing, with three values: the category (WTS, WTB or WTT), the item, and the price. Price is a kinda vague concept: for WTB it means the user's budget, and for WTT it means the items wanted in exchange.

Only one of these lines can be present in any single OP. The tags <<< and >>> signify to the program the location of the JSON and the color tags are not necessary but are simply there to stop the JSONs from cluttering an OP.
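
To make the tag format concrete, here is a minimal sketch of how one of these lines could be pulled out of an OP and parsed. It is not the actual scraper code; the names extract_listings, TAG_PATTERN and VALID_CATEGORIES are made up for illustration, and it assumes the post body is already available as plain text with ordinary quote characters.

```python
import json
import re

# Hypothetical helper names; not taken from the real scraper.
# Matches the first <<< ... >>> span in a post body (non-greedy, across newlines).
TAG_PATTERN = re.compile(r"<<<(.*?)>>>", re.DOTALL)
VALID_CATEGORIES = {"WTS", "WTB", "WTT"}


def extract_listings(post_body):
    """Return a list of (category, item, price) tuples found in post_body.

    Only the first tagged line is read, since each OP may contain one.
    Malformed JSON or malformed entries are skipped rather than raising.
    """
    match = TAG_PATTERN.search(post_body)
    if match is None:
        return []
    try:
        raw = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []

    listings = []
    for entry in raw:
        # Each listing must be a three-element list starting with a known category.
        if (
            isinstance(entry, list)
            and len(entry) == 3
            and isinstance(entry[0], str)
            and entry[0] in VALID_CATEGORIES
        ):
            category, item, price = entry
            listings.append((category, str(item), str(price)))
    return listings
```

Skipping bad entries instead of failing is just one possible choice here; a real deployment might prefer to flag the OP so its author can fix the tag.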

The program automatically reads all the posts in the classifieds and turns these JSON lines into a formatted set of tables like below:


WTT
ITEM | PRICE | USER | TOPIC
Matias Quiet Click x200 | Cream Damped Alps x50 | Dihedral | Topic Link (http://geekhack.org?topic=84256.0)

WTS
ITEM | PRICE | USER | TOPIC
Dell QuietKey | $20 | Dihedral | Topic Link (http://geekhack.org?topic=84256.0)

WTB
ITEM | PRICE | USER | TOPIC
NMB RT101+ | $80 | Dihedral | Topic Link (http://geekhack.org?topic=84256.0)
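
The grouping step that produces these tables could look roughly like the sketch below. Again this is an illustration and not the real implementation: build_catalogue, render_table and the post dictionary keys ("user", "url", "listings") are all assumed names, and the output is a simple pipe-separated text table instead of forum markup.

```python
from collections import defaultdict


def build_catalogue(posts):
    """Group listings by category.

    posts is an iterable of dicts with the hypothetical keys
    "user", "url" and "listings", where "listings" holds
    (category, item, price) tuples.
    """
    catalogue = defaultdict(list)
    for post in posts:
        for category, item, price in post["listings"]:
            catalogue[category].append((item, price, post["user"], post["url"]))
    return dict(catalogue)


def render_table(category, rows):
    """Render one category as a plain-text table, one listing per line."""
    header = ("ITEM", "PRICE", "USER", "TOPIC")
    lines = [category, " | ".join(header)]
    lines.extend(" | ".join(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    example_posts = [
        {
            "user": "Dihedral",
            "url": "http://geekhack.org?topic=84256.0",
            "listings": [
                ("WTS", "Dell QuietKey", "$20"),
                ("WTB", "NMB RT101+", "$80"),
            ],
        }
    ]
    for category, rows in build_catalogue(example_posts).items():
        print(render_table(category, rows))
        print()
```

Keeping the grouping separate from the rendering means the same catalogue could later be written out as BBCode tables or any other format without touching the scraping side.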




What are your thoughts? Is this a worthwhile system that should be deployed? What improvements could be made to it? If you want to see the code, I am happy to put it on GitHub; just ask.
Title: Re: Geekhack Classifieds Scraping
Post by: Dihedral on Wed, 24 August 2016, 05:54:16
 :blank:
Title: Re: [Meta] Geekhack Classifieds Scraping - looking for opinions
Post by: Dihedral on Wed, 24 August 2016, 09:14:30
 :blank: Off Topic posts get buried quickly
Title: Re: [Meta] Geekhack Classifieds Scraping - looking for opinions
Post by: Bromono on Wed, 24 August 2016, 09:18:00
What language are you using?
Title: Re: [Meta] Geekhack Classifieds Scraping - looking for opinions
Post by: Dihedral on Wed, 24 August 2016, 09:23:34
Quote from: Bromono on Wed, 24 August 2016, 09:18:00
What language are you using?

It's all implemented in Python. A large chunk of the code is a library I wrote to make it easy to write Python scripts that interface with Geekhack, similar to PRAW for Reddit. The rest of it handles the actual cataloguing of data. Hopefully I will be able to reuse the library for future projects.