As a foreword, I'll mention that I only got invested in this, like, last week. I knew the wiki was about to be shut down, but I assumed that people had been working on bringing the mirror up to date. This does not seem to have been the case - I'm not exactly sure when the efforts stopped. Apparently, Miraheze themselves are supposed to transfer database files, yet for some reason the website is very incomplete, leading me to believe that either Miraheze stopped updating, or that the people who undertook the original task did so manually and gave up halfway.
This isn't exactly made easier by the fact that no one really seems to know who actually created the Miraheze thing in the first place; I'm in contact with one of the (Fandom) Mental Block admins, but they don't seem to have administrative permissions on Miraheze. Furthermore, they told me that someone had apparently been harvesting images from Fandom, but they don't know who it is.
To summarize, it's kind of a clusterfuck, and I'm a newcomer looking to make sense of all this basically one week before it shuts down, with no prior knowledge of database transfer or wiki handling. Hell, the wiki was shut down; it's only because I sent an email to support and they agreed to give us one more week that it's still up.
You're all free to go check out <<www.mcforum.net/yabbse/index.php?topic=42529.0|the thread on MC Forum>> (requires an account) if you're curious, though.
anonlv000 said:
Right, I forgot that this was the point of this thread. Is everyone currently helping this effort doing this by hand, or has someone already scraped the wiki partially with a script, or...?
I guess what I'm trying to say is: could you clarify what exactly needs to be saved? Can you bring us up to speed on the problem and existing approaches to solutions?
So, we already have complete database files (.XML), but the issue is that the Miraheze site has already been worked on a little, which according to both them and Fandom is likely to cause errors in page names and histories should the database be uploaded on top of the existing new pages. Hence the manual effort.
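If someone wants to check how big the overlap actually is before deciding between import and manual work, here's a minimal Python sketch. It assumes the .XML files are standard MediaWiki export dumps (a `<page>` element with a `<title>` child per page), which I haven't verified for these particular files. The function names and the sample dump are made up for illustration, and the list of titles already on Miraheze has to come from somewhere else (copied by hand, for instance). It also compares titles as plain strings, so it won't catch differences in capitalization or underscores.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # Export files put tags in a versioned XML namespace, e.g.
    # "{http://www.mediawiki.org/xml/export-0.10/}title"; keep the local part.
    return tag.rsplit("}", 1)[-1]


def titles_in_dump(xml_text):
    """Return the set of page titles found in a MediaWiki XML export string."""
    root = ET.fromstring(xml_text)
    titles = set()
    for page in root.iter():
        if _local(page.tag) != "page":
            continue
        for child in page:
            if _local(child.tag) == "title" and child.text:
                titles.add(child.text.strip())
    return titles


def split_by_conflict(dump_titles, existing_titles):
    """Split dump titles into (not on the target wiki yet, already on it)."""
    dump, existing = set(dump_titles), set(existing_titles)
    return sorted(dump - existing), sorted(dump & existing)


# Tiny hypothetical dump, only to show the expected shape of the input.
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><sitename>Example</sitename></siteinfo>
  <page><title>Page A</title><revision><text>a</text></revision></page>
  <page><title>Page B</title><revision><text>b</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    safe, clashing = split_by_conflict(titles_in_dump(SAMPLE_DUMP), ["Page B"])
    print("missing from target:", safe)
    print("already on target:", clashing)
```

Pages in the first list could in principle be imported without touching anything that exists; pages in the second list are the ones where the history conflicts would show up. For a very large dump, reading the whole file into memory like this may not be practical.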
anonlv000 said:
Is everybody currently working on this doing this by hand?
Well, "everybody" seems to be a limited number of people right now. I haven't seen anyone work on this apart from me - in the <<mentalblock.miraheze.org/...hanced=1&urlversion=2|Miraheze recent changes page>>, I'm one of, like, two accounts currently reported as having made any changes in the past couple of days.
anonlv000 said:
I know that there are a few among the MC community who can do at least some software. Assuming that they've already looked at the problem, how come trawling the website doesn't work? If it doesn't work, what doesn't it catch?
Both Miraheze and Fandom have a pretty easy way (.xml databases) to transfer page/history content between each other, but I already explained why I think the database method is probably a bad idea. That said, if someone that's a bit savvier than me wants to have a go at that, I don't mind. I have a Drive link to the most recent database files <<drive.google.com/drive/fo...Iv5tqvSuxJ6Zm?usp=sharing|here.>>
anonlv000 said:
If I were to write a script to download the wiki, what would I need to scrape? The first post from the wiki indicates that images need to be downloaded separately, but then TheMadPrince implies that the images are already downloaded, and it's actually the text pages that need to be downloaded. Is that because the images are all backed up already?
Images aren't downloaded (or at least, most of them aren't). I'm currently trying to use Wget to do it, but if anyone has a better idea I don't mind handing it over to them. I'm trying to say that if we manage to download the images, we can then upload the ones missing separately, even after the Fandom website shuts down. They still need to be downloaded, though.
As for what "needs to be scraped", it's mostly just the images and the pages that haven't been put in yet.
Again, the Miraheze site actually has more content than I thought, but there's still quite a bit missing.
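As an alternative to Wget for the image part, here's a rough Python sketch that asks the wiki's MediaWiki API for the full file list (`action=query` with `list=allimages`) and works out which files the other wiki doesn't have yet. This is not what I'm currently running. It assumes both wikis expose `api.php` publicly; the API URLs below are placeholders, not the real addresses, and the helper names are made up.

```python
import json
import urllib.parse
import urllib.request


def http_get_json(api_url, params):
    """Fetch one API response as a dict (plain stdlib, no retries)."""
    url = api_url + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(
        url, headers={"User-Agent": "wiki-image-backup-sketch/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_image_urls(api_url, fetch_json=http_get_json):
    """Return {file name: direct URL} for every file the wiki reports."""
    params = {
        "action": "query",
        "list": "allimages",
        "aiprop": "url",
        "ailimit": "max",
        "format": "json",
    }
    images = {}
    while True:
        data = fetch_json(api_url, params)
        for img in data.get("query", {}).get("allimages", []):
            images[img["name"]] = img["url"]
        cont = data.get("continue")
        if not cont:
            return images
        # The API hands back the parameters needed to request the next batch.
        params = {**params, **cont}


def missing_images(source, target):
    """File names present on the source wiki but not on the target."""
    return sorted(set(source) - set(target))


if __name__ == "__main__":
    # Placeholder URLs: replace with the real api.php locations.
    fandom = list_image_urls("https://example.fandom.com/api.php")
    miraheze = list_image_urls("https://example.miraheze.org/w/api.php")
    for name in missing_images(fandom, miraheze):
        print(name, fandom[name])
```

The printed URLs could then be fed to Wget or any other downloader, so the files are saved locally before Fandom goes down and uploaded afterwards.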
anonlv000 said:
Are comments any kind of priority? If someone already has a partial solution in place, then where's the code and examples of any pages that said solution missed?
I haven't seen any way to comment on Miraheze, so I don't know if it's even worth trying to save comments. No code right now (or at least, none that I have access to); as mentioned, just database files and manual labor.