geekhack

Site Announcements and Feedback => Announcements/Feedback/Suggestions => Topic started by: 1391401 on Sat, 21 July 2012, 21:35:15

Title: "Reconnecting" with the rest of the web
Post by: 1391401 on Sat, 21 July 2012, 21:35:15
I'm not sure if this has been said, but would mod_rewrite (or whatever is necessary) work for reconnecting our content to the rest of the net?

Looks like you guys did a great job making sure the old and new thread numbers match up:

From Google: http://geekhack.org/showthread.php?32274-Recommend-me-a-Trackball-please
Thread now: http://geekhack.org/index.php?topic=32274.0

From Google: http://geekhack.org/showthread.php?28324-Adopt-a-Keycap-Legend-with-Cherry-Replica-Font!-(Cherry-font-not-some-other-font)
Thread now: http://geekhack.org/index.php?topic=28324

Thanks for all the time, effort, and headache put into this! :)
Title: Re: "Reconnecting" with the rest of the web
Post by: Soarer on Sat, 21 July 2012, 21:46:31
The post numbers match up as well.

old: http://geekhack.org/showwiki.php?title=Island:17458&p=362823&viewfull=1#post362823
new: http://geekhack.org/index.php?topic=17458.msg362823#msg362823

old: http://geekhack.org/showthread.php?10629-What-did-you-get-in-the-mail-today&p=247450&viewfull=1#post247450
new: http://geekhack.org/index.php?topic=10629.msg247450#msg247450
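Since both the thread and post numbers carried over unchanged, building the new link is plain string formatting. A minimal sketch, assuming the http://geekhack.org/index.php base seen in the links above; `new_post_url` is a hypothetical helper name:

```python
BASE = "http://geekhack.org/index.php"  # assumed base, taken from the example links above


def new_post_url(topic_id: int, post_id: int) -> str:
    """Build the SMF link for a post, reusing the old thread and post numbers as-is."""
    return f"{BASE}?topic={topic_id}.msg{post_id}#msg{post_id}"


print(new_post_url(17458, 362823))
# http://geekhack.org/index.php?topic=17458.msg362823#msg362823
```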
(I don't think that one works; the thread is gone.)
Title: Re: "Reconnecting" with the rest of the web
Post by: mkawa on Wed, 25 July 2012, 04:08:42
hmm.. i'm not crazy about url rewriting, i think that's a rathole we don't want to get into. that said, i noticed that the previous urls had topic names in them, which is an seo technique for increasing relevancy, and could speed up re-indexing (because frankly that's how most people find content these days). SMF has a similar toggle that can gussy up URLs; i'll try flipping it on now and seeing if that helps things a bit.

edit: technical complications prevent this. will have to give this issue more thought.
Title: Re: "Reconnecting" with the rest of the web
Post by: Soarer on Wed, 25 July 2012, 05:46:19
Quote from: mkawa on Wed, 25 July 2012, 04:08:42
hmm.. i'm not crazy about url rewriting, i think that's a rathole we don't want to get into.
What worries you? SMF doesn't seem to use /showthread.php or /showwiki.php, which would of course be a problem if it did. (edit: actually, I'm not sure even that would cause a problem, since there's also topic vs thread and msg vs post to distinguish them.)

Quote from: mkawa on Wed, 25 July 2012, 04:08:42
that said, i noticed that the previous urls had topic names in them, which is an seo technique for increasing relevancy, and could speed up re-indexing (because frankly that's how most people find content these days). SMF has a similar toggle that can gussy up URLs; i'll try flipping it on now and seeing if that helps things a bit.
Would URLs without topic names in them still work as well?

Quote from: mkawa on Wed, 25 July 2012, 04:08:42
edit: technical complications prevent this. will have to give this issue more thought.
Prevent which? Adding topic names, or redirecting URLs? :)

Redirecting URLs wouldn't just connect us back to the web, it would also fix links between threads. And I think there are a remarkable number of those!

I think I'll try writing a Greasemonkey script to patch the URLs. Obviously it would be 100x better to do this on the server! But Greasemonkey will be a good place to test out the regular expressions required.
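A userscript would be JavaScript; for illustration here is the same regular-expression idea in Python, rewriting every old showthread.php link inside a block of post text. It is a sketch, assuming links use the exact `http://geekhack.org/showthread.php?NNNN-slug` form seen above (no `www.`, no other hosts); the pattern names and `patch_links` are hypothetical:

```python
import re

# Hypothetical pattern: an old thread link, consumed up to the next whitespace, quote or angle bracket.
OLD_LINK = re.compile(r"http://geekhack\.org/showthread\.php\?[^\s\"'<>]+")
THREAD_ID = re.compile(r"showthread\.php\?(\d+)")  # thread number right after the '?'
POST_ID = re.compile(r"[&?]p=(\d+)")                # optional p= post number


def _rewrite(match: re.Match) -> str:
    url = match.group(0)
    thread = THREAD_ID.search(url)
    if thread is None:
        return url  # no leading thread number, so leave the link untouched
    topic = thread.group(1)
    post = POST_ID.search(url)
    if post is not None:
        p = post.group(1)
        return f"http://geekhack.org/index.php?topic={topic}.msg{p}#msg{p}"
    return f"http://geekhack.org/index.php?topic={topic}.0"


def patch_links(text: str) -> str:
    """Rewrite every old showthread.php link found in a block of post text."""
    return OLD_LINK.sub(_rewrite, text)
```

Note that the match runs to the next whitespace, quote or angle bracket, so punctuation stuck to the end of a bare link is swallowed along with the slug.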
Title: Re: "Reconnecting" with the rest of the web
Post by: 1391401 on Wed, 25 July 2012, 20:35:01
Maybe it doesn't have to be mod_rewrite or some other mapping; maybe it could be a custom 404 error page that, as a last-ditch effort, tries to resolve the URL with a preg_match?

edit: one downside might be that the 4xx status code sent back to the browser and Google may not have the intended result...

edit2: what stops us from building a showthread.php page that parses the '32274-Recommend-me-a-Trackball-please' portion of the URL and sends users to index.php?topic=32274.0?
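The parsing step is small: the thread number is the run of digits at the start of the query string. A sketch in Python, assuming the slug is always the first query component (as in every example above); `showthread_target` is a hypothetical name:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

SLUG_ID = re.compile(r"^(\d+)")  # leading digits of e.g. '32274-Recommend-me-a-Trackball-please'


def showthread_target(old_url: str) -> Optional[str]:
    """Return the SMF topic URL for an old showthread.php link, or None if no thread number."""
    query = urlsplit(old_url).query   # everything between '?' and '#'
    first = query.split("&", 1)[0]     # assumed: the slug is the first query component
    match = SLUG_ID.match(first)
    if match is None:
        return None
    return f"http://geekhack.org/index.php?topic={match.group(1)}.0"
```

The slug text after the number is ignored entirely, so renamed threads still resolve.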
Title: Re: "Reconnecting" with the rest of the web
Post by: fartq on Wed, 25 July 2012, 22:03:19
Quote from: 1391401 on Wed, 25 July 2012, 20:35:01
what stops us from building a showthread.php page that parses the '32274-Recommend-me-a-Trackball-please' portion of the URL and sends users to index.php?topic=32274.0?

nothing. this is the way to go. can you throw that into feature requests?
Title: Re: "Reconnecting" with the rest of the web
Post by: Soarer on Wed, 25 July 2012, 22:41:56
Heh. That's what I said, somewhere. Trouble is, even I can't remember where now!

The same trick works for showwiki.php as well (it wasn't just the wiki; it was mods/reviews/etc. too).

Any others?
Title: Re: "Reconnecting" with the rest of the web
Post by: mkawa on Wed, 25 July 2012, 22:51:42
well regardless someone please throw a link to this thread into the feature requests thread and explain to future me what needs to be done :P
Title: Re: "Reconnecting" with the rest of the web
Post by: 1391401 on Thu, 26 July 2012, 15:33:01
Yeah, not 100% sure, but it looks like the wiki and reviews were stored like this:

http://geekhack.org/showwiki.php?title=Island:31686
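Unlike showthread, these links use a normal `title=` parameter, so a standard query-string parser does the work. A sketch, assuming only the `Island:NNNN` form seen in these examples maps to a topic (other titles return None) and that an optional `p=` carries the post number; `showwiki_target` is a hypothetical name:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def showwiki_target(old_url: str) -> Optional[str]:
    """Map an old showwiki.php?title=Island:NNNN link to its SMF topic, if possible."""
    params = parse_qs(urlsplit(old_url).query)
    title = params.get("title", [""])[0]        # e.g. 'Island:31686'
    prefix, _, number = title.partition(":")
    if prefix != "Island" or not number.isdigit():
        return None                              # assumed: other titles have no topic
    post = params.get("p", [""])[0]
    if post.isdigit():
        return f"http://geekhack.org/index.php?topic={number}.msg{post}#msg{post}"
    return f"http://geekhack.org/index.php?topic={number}.0"
```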
Title: Re: "Reconnecting" with the rest of the web
Post by: Soarer on Thu, 26 July 2012, 16:19:25
Shall we start building a list of all the variations to test?

Some links contain a page number as well. As long as there is a post number, it can just be ignored...

old: http://geekhack.org/showwiki.php?title=Island:17458&viewfull=1&page=43&do=comments#post609833
new: http://geekhack.org/index.php?topic=17458.msg609833#msg609833

I can't remember if there were URLs which had a page number but not a post number. There probably weren't many of them saved anyway, so we could translate them using the 'start' number that appears after the thread number in SMF...

old: http://geekhack.org/showwiki.php?title=Island:17458&viewfull=1&page=43&do=comments
new: http://geekhack.org/index.php?topic=17458.420

... assuming 10 posts per page, and since SMF's 'start' counts from 0 (page 1 is .0), (43 - 1) * 10 = 420. I think wiki was 10 posts per page, and normal threads were 15.
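A sketch of the page-to-start translation, preferring the post number whenever one is available. It assumes old page numbers start at 1 and SMF's start offset at 0 (page 1 being `topic=NNNN.0`, as in the links above), so the offset is (page - 1) * posts per page; for page 43 at 10 per page that is 420. The default of 10 per page is the recollection above, not a verified figure, and `smf_start`/`comment_page_target` are hypothetical names:

```python
from typing import Optional

BASE = "http://geekhack.org/index.php"  # assumed base URL, as in the links above


def smf_start(page: int, posts_per_page: int) -> int:
    """SMF 'start' offset of the first post on an old 1-based page."""
    if page < 1:
        raise ValueError("old page numbers start at 1")
    return (page - 1) * posts_per_page


def comment_page_target(topic: int, page: Optional[int], post: Optional[int],
                        posts_per_page: int = 10) -> str:
    """Prefer the post number; fall back to the page number; else the first page."""
    if post is not None:
        return f"{BASE}?topic={topic}.msg{post}#msg{post}"
    if page is not None:
        return f"{BASE}?topic={topic}.{smf_start(page, posts_per_page)}"
    return f"{BASE}?topic={topic}.0"
```

Whether the article body itself shifts the comment numbering by one is untested; checking one real thread would settle it.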
Title: Re: "Reconnecting" with the rest of the web
Post by: iMav on Sat, 28 July 2012, 04:23:05
I've typically never worried about Google. Eventually, Google sorts things out and indexes the new content.

We can certainly explore doing this... but, IMHO, it is a waste of effort. If we do absolutely nothing, time will solve the issue.
Title: Re: "Reconnecting" with the rest of the web
Post by: Soarer on Sat, 28 July 2012, 06:25:21
It's not just google though, it's people's saved bookmarks as well.

And most usefully, it would fix links in posts to other posts/threads.

edit: here's a fine example (http://geekhack.org/index.php?topic=32817.msg620502#msg620502)  ;D
Title: Re: "Reconnecting" with the rest of the web
Post by: 1391401 on Sat, 28 July 2012, 17:23:42
I agree that Google matters less, since it is already indexing threads here, but what about other pages on the net? One thing I noticed during the downtime is that a LOT of content lives here, and lots of other sites link to it. That knowledge is now lost, in a way.
Title: Re: "Reconnecting" with the rest of the web
Post by: rknize on Tue, 31 July 2012, 15:21:14
OK, I added showthread.php glue for starters.  You'll get a 301, which should help the search engines along.
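The glue on the server is presumably PHP; this is not that code, just a language-neutral sketch of the same behaviour written as a Python WSGI app (the standard interface from PEP 3333). It answers an old showthread.php query with a redirect whose Location header points at the SMF topic. The `PERMANENT` toggle is a hypothetical addition: 302 Found is the standard temporary redirect, which search engines should not treat as a permanent move:

```python
import re
from typing import Callable, Iterable, List, Tuple
from urllib.parse import parse_qs

PERMANENT = True  # hypothetical toggle: False sends 302 while the mapping is still being checked
SLUG_ID = re.compile(r"^(\d+)")  # thread number at the start of the old query string
BASE = "http://geekhack.org/index.php"


def showthread_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Hypothetical WSGI stand-in for the showthread.php glue: redirect to the SMF topic."""
    query = environ.get("QUERY_STRING", "")
    match = SLUG_ID.match(query.split("&", 1)[0])
    if match is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"No thread number in URL\n"]
    topic = match.group(1)
    post = parse_qs(query).get("p", [""])[0]  # the bare slug key is dropped by parse_qs
    if post.isdigit():
        target = f"{BASE}?topic={topic}.msg{post}#msg{post}"
    else:
        target = f"{BASE}?topic={topic}.0"
    status = "301 Moved Permanently" if PERMANENT else "302 Found"
    headers: List[Tuple[str, str]] = [("Location", target), ("Content-Type", "text/plain")]
    start_response(status, headers)
    return [b""]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, showthread_app).serve_forever()` serves it on port 8000.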

Edit: now have showwiki.php too.  Obviously, it only works for articles that were not part of the actual old wiki (reviews, etc).
Title: Re: "Reconnecting" with the rest of the web
Post by: Soarer on Tue, 31 July 2012, 17:15:02
Lovely! Thanks for spending the time!

I agree, 301 is the right way to do it. I'm not sure if it matters, but is there another code we could use until it's mostly finished?
(It would be undesirable for the search engines to update their links incorrectly! The .msgNNN#msgNNN part isn't working for showwiki, but it is for showthread :) )
Title: Re: "Reconnecting" with the rest of the web
Post by: rknize on Tue, 31 July 2012, 18:14:52
OK... I see. It's the page number. I've disabled the 301 until I can fix it later tonight.
Title: Re: "Reconnecting" with the rest of the web
Post by: rknize on Tue, 31 July 2012, 21:12:23
OK, I think it works now. It's not possible to get the exact post when p= is not set, because the part of the URL after the # is never sent to the server.
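The fragment (everything after #) is resolved by the browser and is not included in the HTTP request. Splitting one of the old links with Python's `urllib.parse.urlsplit` shows what the server actually receives; `request_target` is a hypothetical helper name:

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """The path and query a browser sends on the HTTP request line; the fragment stays client-side."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


old = "http://geekhack.org/showwiki.php?title=Island:17458&viewfull=1&page=43&do=comments#post609833"
print(request_target(old))     # /showwiki.php?title=Island:17458&viewfull=1&page=43&do=comments
print(urlsplit(old).fragment)  # post609833  (only the browser ever sees this)
```

For links like this one, the server can only fall back to the page number; a client-side script, which does see the fragment, could still recover the exact post.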
Title: Re: "Reconnecting" with the rest of the web
Post by: fartq on Tue, 31 July 2012, 21:38:25
thanks rknize!