(source) C. Scott Ananian: Ideas/On Resilience

These are notes from a teleconference discussion the parsing team had on "Resilience", which is one of the theme topic areas for the 2018 WMF Technical Conference. Apologies for the loose structure.

The broader topic area is "Scale": identify where each wiki is in the wiki lifecycle, and try to adapt to that:

Mako's research re: lifecycles, and the confirming research on Wikia, etc.

Further thoughts:

"What is the greatest potential threat...?"

"Resilience vs resistance to change?"

"Served in a data-efficient, useful, and reliable manner"

Language translation technology is one approach to bridging gaps between mature and less-mature wikis. The low-content wiki can get translations of general-interest articles while contributing back articles on its particular area of the world/culture/etc.

(From Arlo): Think of resilience like "health": an organism can only sustain so much stress before it starts to decline.

Useful framing question: resilience against what?

It can be useful just to enumerate these threats, to determine whether some common strategies might combat many of them at once, instead of dealing with each individually.

We need to hold "centralized" and "decentralized" in balance. Decentralization increases resilience but harms scale, and vice-versa.

Concretely:

1. Global templates

2. Article templates

3. Translation, including machine translation, as a means of increasing scale

Anti-censorship / distribution tools

I've already written about offline editing queues as a mechanism to enhance access from challenging areas:

Edit Conflicts, Offline Contributions, and Tor: Oh my!

This would be a useful area in which to deploy prototypes and pursue active research, for example on privacy-preserving reputation systems. This could be done over the subset of Tor-using Wikimedians, so that "failed experiments" don't adversely impact the social processes of our larger community. Permission to fail!

Note that there are three fundamental conflicts in play:

  1. Immutable signed/attested content -vs- "encyclopedia anyone can edit"
    • and don't forget the "right to be forgotten", libel laws, biographies of living persons, DMCA takedowns, vandalism, etc
    • distributing content also means distributing the liability for content deemed outré in your particular legal regime
  2. Strong reputation system -vs- protecting identity of editors
    • say, in repressive regimes, or against hate/bias/harassment
  3. "Every wiki is its own community" -vs- centralization and "scalability"
    • e.g. global templates, sharing workflows, etc
    • centralization also means agreeing on a single legal regime (but there may not be one single best regime)

Various "cryptocurrency"-themes proposals should be treated as high-risk proposals, on par with the way that (say) the idea that Wikipedia should use a github-like fork-and-merge model has been treated.

Regarding content distribution:

Regarding IPFS: https://twitter.com/cscottnet/status/1044241859676131330

How to protect privacy

[These are my notes from a conversation with a Wikimedian -- I think Greg Maxwell -- at Grendel's in Harvard Square on Jan 15, 2017. They seem to be naturally related to the other ideas on this page.]


Complete offline copies completely protect anonymity and article history

Tor editing: a token scheme, mapping IP -> blind token. Abuse is limited to a fixed-factor increase.
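Here is a minimal sketch of how such a token scheme could work, using a textbook RSA blind signature with toy parameters (not production crypto; the flow and all names here are my assumption, not a settled design). The server rate-limits token issuance per IP, but signs only a blinded value, so a redeemed token can't be linked back to the IP that requested it:

```python
# Sketch of a blind-token scheme for Tor editing (textbook RSA blind
# signature; toy parameters, NOT production crypto).  The server
# rate-limits token issuance per requesting IP, but signs only a
# blinded value, so a redeemed token can't be linked back to its IP.

import hashlib
import secrets

P, Q = 1000003, 1000033            # toy primes, far too small for real use
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def h(msg: bytes) -> int:
    """Hash a message into Z_N."""
    return int.from_bytes(hashlib.sha256(msg).digest(), 'big') % N

# --- Client (identified by IP at issuance time) ---
token = secrets.token_bytes(16)           # client's secret token
r = secrets.randbelow(N - 2) + 2          # blinding factor; gcd(r, N) == 1 assumed
blinded = (h(token) * pow(r, E, N)) % N   # server never sees `token` itself

# --- Server: rate-limit per IP, then sign the blinded value ---
blind_sig = pow(blinded, D, N)

# --- Client: unblind, yielding an ordinary signature on h(token) ---
sig = (blind_sig * pow(r, -1, N)) % N

# --- Later, over Tor: client redeems (token, sig); the server can
# --- verify the signature but cannot tell which IP it was issued to.
assert pow(sig, E, N) == h(token)
print("token verifies; edit allowed")
```

The "fixed factor" falls out of the per-IP issuance quota: an abusive IP can obtain at most its quota of tokens before it is cut off.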

99% of attention goes to the trolls who target admins.

Compromised tools: by looking at read history, you can associate users.

Pseudonyms are not effective if you have IPs.

Library checkout records and the Patriot Act.

The old assumption was that readers didn't need any privacy.

An increase in vandalism comes along with an increase in good edits.

What kinds of threats are there? What can be revealed by what you're reading?

Jimmy knew that Wikipedia was being captured in 2005, based on Juniper docs.

Rants on wikimedia-l about privacy.

Detect interception: deliberately poll from various places out on the internet and check that (hashes of) session keys are the same, to detect MITM attacks. This will cause state actors not to attack, because they don't want to be detected. Solicit volunteers to be part of the "Wikipedia security and privacy project".
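As a sketch of what a volunteer probe might look like: the note above says to compare (hashes of) session keys, but session keys differ per connection, so this sketch compares leaf-certificate fingerprints instead, which catches a certificate-substituting MITM just as well. The collector/consensus piece is hypothetical:

```python
# Sketch of a volunteer probe for the "Wikipedia security and privacy
# project" idea: record a fingerprint of the TLS certificate that
# wikipedia.org presents from this vantage point, for comparison with
# fingerprints gathered elsewhere.  A divergence suggests interception.

import hashlib
import socket
import ssl

def cert_fingerprint(host: str, port: int = 443) -> str:
    """SHA-256 of the DER-encoded leaf certificate `host` serves us."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()

if __name__ == '__main__':
    local = cert_fingerprint('wikipedia.org')
    print('our view:', local)
    # In a real deployment the probe would submit `local` to some
    # collector and alert on disagreement between vantage points:
    #   if local != consensus_fingerprint():  # hypothetical
    #       alert_possible_mitm()             # hypothetical
```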

State actors can easily bypass what checkusers can see.

We discourage checkusers from going on fishing expeditions, which would turn up this stuff

Are there multiple editors who edited this article from the same IP? Bulk tools are in some sense more private: you don't reveal as much per user.
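A minimal sketch of the aggregate form of that query, over a hypothetical (editor, IP) revision log: the caller learns a single bit about the whole group instead of seeing any editor's IP:

```python
# Sketch of the aggregate query: "did any two editors of this article
# share an IP?"  The caller gets back one bit about the whole group;
# no per-editor IP is ever shown.  The revision log is hypothetical.

from collections import defaultdict

def editors_share_ip(revisions: list[tuple[str, str]]) -> bool:
    """revisions: (editor, ip) pairs for one article's history."""
    editors_by_ip: defaultdict[str, set[str]] = defaultdict(set)
    for editor, ip in revisions:
        editors_by_ip[ip].add(editor)
    return any(len(editors) > 1 for editors in editors_by_ip.values())

print(editors_share_ip([('Alice', '198.51.100.7'),
                        ('Bob',   '198.51.100.7'),
                        ('Carol', '203.0.113.9')]))   # -> True
```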

The site can be attacked by biasing articles on networks you control. Parse the article and embed a hash in a comment, then run a browser Greasemonkey-style script to check. But there are false positives from ad injection.
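A sketch of that check, with both halves in Python for brevity (the client half would really run in the browser as a Greasemonkey-style userscript); the comment format and the whitespace normalization are assumptions:

```python
# Sketch of the embedded-hash check.  The server side appends a hash
# of the article body as an HTML comment; the client side (really a
# browser userscript; written as Python here) recomputes it to detect
# in-path tampering.  Comment format and normalization are assumptions.

import hashlib
import re

def body_hash(article_html: str) -> str:
    # Collapse whitespace so harmless re-serialization doesn't trip the
    # check; injected ads still change the hash (the false-positive
    # problem mentioned above).
    normalized = re.sub(r'\s+', ' ', article_html).strip()
    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()

def embed(article_html: str) -> str:
    """Server: append the body hash as a trailing HTML comment."""
    return f"{article_html}<!-- content-hash:{body_hash(article_html)} -->"

def verify(page: str) -> bool:
    """Client: recompute the hash over everything before the comment."""
    m = re.search(r'<!-- content-hash:([0-9a-f]{64}) -->\s*$', page)
    return bool(m) and body_hash(page[:m.start()]) == m.group(1)

page = embed('<p>Article text.</p>')
assert verify(page)                              # untampered page passes
assert not verify(page.replace('text', 'TEXT'))  # modified body detected
```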

Wikimedia-l should be pushing for public policy, e.g. against public propaganda directed at a government's own citizens. No law prevents a government from targeting WP with edits. We (the US) do use propaganda outside the US.

A troll army tries first to bias, then to destroy: make editors give up. Editors' identities are more or less public. Harass anyone who edits any article, from so many different identities that it doesn't look like a single person. And any supporters are attacked too.

We avoid this right now only because people voluntarily decide to stay away from Israel/Palestine. Success gets measured by bad metrics, e.g. just whether the page is blanked often.

Only ten edit patrollers. Only a thousand editors. Very vulnerable to targeted harassment.

You can't drive off the paid trolls; they are not emotionally invested. They can get good editors banned from the site by pushing their emotional buttons. You just have to increase your capacity.

Automation.

Ancient/long thread on edit access for Tor users: https://lists.wikimedia.org/pipermail/wikitech-l/2013-December/073764.html

C. Scott Ananian [[User:cscott]]