Whenever I talk about crowd-sourced transcription–actually, whenever I talk about crowdsourcing at all–the first question people ask is about accuracy. Nobody trusts the public to add to an institution's data or metadata, nor especially to correct it. However, quality control over data entry is a well-explored problem, and while I'm not familiar with the industry literature on commercial approaches, I'd like to offer the systems I've seen implemented in the kinds of volunteer transcription projects I follow.
(Note: the terminology is my own, and may be non-standard.)
A. Single-track methods (mainly employed with large, prosy text that is difficult to compare against independent transcriptions of the same text). In these methods, all changes and corrections are made to a single transcription, which originates with a volunteer and is modified thereafter. There is no parallel or alternate transcription to compare against.
1. Open-ended community revision:
This is the method that Wikipedia uses, and it's the strategy I've followed in FromThePage. In this method, users may continue to change the text of a transcription forever. Because all changes are logged–with a pointer of some sort to the user who made them–vandalism or edits which are not in good faith may easily be reverted to a known-good state. This is in keeping with the digital humanities principle of "no final version." In my own projects, I've seen edits made to a transcription two decades after the initial version, and those changes were indeed correct. (Who knew that "drugget" was a coarse fabric used for covering tobacco plant-beds?) Furthermore, I believe that there is no reason, other than the cost of implementation, why any of the methods below which operate from the "final version" mind-set should not allow error reports against their "published" form.
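The log-every-change, revert-to-known-good mechanism can be sketched in a few lines. This is a hypothetical illustration, not FromThePage's or Wikipedia's actual implementation (real wiki engines store diffs and far more metadata):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class Revision:
    text: str
    user: str  # a pointer of some sort to the user who made the change
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class Transcription:
    """Open-ended community revision: every edit is logged and revertible."""

    def __init__(self, initial_text: str, user: str):
        self.history: List[Revision] = [Revision(initial_text, user)]

    @property
    def current(self) -> str:
        return self.history[-1].text

    def edit(self, new_text: str, user: str) -> None:
        self.history.append(Revision(new_text, user))

    def revert_to(self, index: int, user: str) -> None:
        # Reverting appends the known-good text as a new revision,
        # so the bad-faith edit itself stays in the audit log.
        self.edit(self.history[index].text, user)
```

Because nothing is ever deleted, there is no "final version": a correction arriving decades later simply becomes the newest revision.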
2. Fixed-term community revision:
Early versions of both Transcribe Bentham and Scripto followed this model, and while I'm not sure if either of them still does, it does seem to appeal to traditional documentary editing projects that are incorporating crowdsourcing as a valuable initial input to a project while wishing to retain ultimate control over the "final version". In this model, wiki-like systems are used to gather the initial data, with periodic review by experts. Once a transcription reaches an acceptable status (deemed so by the experts), it is locked against further community edits and the transcription is "published" to a more traditional medium like a CMS or a print edition.
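The fixed-term workflow amounts to a wiki page with a lock bit; a minimal sketch, with class and method names of my own invention:

```python
class FixedTermTranscription:
    """Community-editable until experts approve, then locked and published."""

    def __init__(self, text: str = ""):
        self.text = text
        self.locked = False

    def edit(self, new_text: str) -> None:
        if self.locked:
            raise PermissionError("transcription is published; no further community edits")
        self.text = new_text

    def approve(self) -> str:
        # An expert deems the transcription acceptable: lock it and hand
        # the final text off to a CMS or a print edition.
        self.locked = True
        return self.text
```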
any other project I'm aware of.) If a transcription is rejected, it may be either returned to the submitter for correction or corrected by the expert and published in corrected form.

B. Multi-track methods (mainly employed with easily-isolated, structured records like census entries or ships' log books). In all of these cases, the same image is presented to different users to be transcribed from scratch. The data thus collected is compared programmatically, on the assumption that two correct transcriptions will agree with each other and may be assumed to be valid. If the two transcriptions disagree with each other, however, one of them must be in error, so some kind of programmatic or human expert intervention is needed. It should be noted that all of these methodologies are technically "blind" n-way keying, as the volunteers are unaware of each other's contributions and do not know whether they are interpreting the data for the first time or contributing a duplicate entry.
6. Triple-keying with voting:
This is the method that the Zooniverse Old Weather team uses. Originally the Old Weather team collected the same information in ten different independent tracks, entered by users who were unaware of each other's contributions: blind, ten-way keying. The assumption was that the majority reading would be the correct one, so essentially this is a voting system. After some analysis, it was determined that the quality of three-way keying was indistinguishable from that of ten-way keying, so the system was modified to a less-skeptical algorithm, saving volunteer effort. If I understand correctly, the same kind of voting methodology is used by reCAPTCHA for its OCR correction, which allowed its exploitation by 4chan.
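The voting step itself is just a majority count over the blind keyings. A minimal sketch, assuming entries are compared by simple string equality:

```python
from collections import Counter

def majority_reading(keyings, min_agreement=2):
    """Return the majority reading among blind n-way keyings, or None if
    no reading reaches the agreement threshold (in which case the entry
    needs another keying or human attention)."""
    if not keyings:
        return None
    reading, count = Counter(keyings).most_common(1)[0]
    return reading if count >= min_agreement else None
```

With three-way keying, two matching readings out of three suffice to accept an entry.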
7. Double-keying with expert reconciliation:
In this system, the same entry is shown to two different volunteers, and if their submissions do not agree, it is passed to an expert for reconciliation. This requires a second level of correction software capable of displaying the original image along with both submitted transcriptions. If I recall my fellow panelist David Klevan's WebWise presentation correctly, this system is used by the Holocaust Museum for one of their crowdsourcing projects.
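In outline (a sketch of the routing logic, not the Holocaust Museum's actual software):

```python
def double_key(image_id, entry_a, entry_b, expert_queue):
    """Accept matching blind keyings outright; route disagreements to an
    expert, who will see the original image alongside both readings."""
    if entry_a == entry_b:
        return entry_a
    expert_queue.append((image_id, entry_a, entry_b))
    return None  # pending expert reconciliation
```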
8. Double-keying with emergent community-expert reconciliation:
This method is almost identical to the previous one, with one important exception: the experts who reconcile divergent transcriptions are themselves volunteers–volunteers who have been promoted from transcribers to reconcilers through an algorithm. If a user has submitted a certain (large) number of transcriptions, and if those transcriptions have either 1) matched their counterparts' submissions, or 2) been deemed correct by the reconciler when they were in conflict with their counterparts' transcriptions, then the user is automatically promoted. After promotion, they are able to choose their volunteer activity from either the queue of images to be transcribed or the queue of conflicting transcriptions to be reconciled. This is the system used by FamilySearch Indexing, and its emergent nature makes it a particularly scalable solution for quality control.
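A sketch of such a promotion rule; the thresholds and the accuracy formula here are my guesses, not FamilySearch's actual algorithm:

```python
def eligible_for_promotion(matched: int, upheld: int, overruled: int,
                           min_entries: int = 1000,
                           min_accuracy: float = 0.98) -> bool:
    """Promote a transcriber to reconciler once they have submitted enough
    entries and their track record is strong enough.

    matched   -- entries that agreed with the counterpart's submission
    upheld    -- entries that disagreed, but the reconciler sided with this user
    overruled -- entries that disagreed, and the reconciler sided against them
    """
    total = matched + upheld + overruled
    if total < min_entries:
        return False
    return (matched + upheld) / total >= min_accuracy
```

Because the rule needs only per-user counters, promotion happens automatically as the project grows, with no staff intervention.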
9. Double-keying with N-keyed run-off votes:
Nobody actually does this that I'm aware of, but I think it might be cost-effective. If the initial set of two volunteer submissions don't agree, rather than submit the disagreement to an expert, re-queue the transcription to new volunteers. I'm not sure what the right number is here–perhaps only a single tie-breaker vote, but perhaps three new volunteers to provide an overwhelming consensus against the original readings. If this is indecisive, why not re-submit the transcription again to an even larger group? Obviously this requires some limits, or else the whole thing could spiral into an infinite loop in which your entire pool of volunteers is arguing with each other about the reading of a single entry that is truly indecipherable. However, I think it has some promise, as it may have the same scalability benefits as the previous method without needing the complex promotion algorithm or the reconciliation UI.
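A sketch of the run-off loop, with made-up batch sizes and a round limit to prevent the infinite-argument scenario:

```python
from collections import Counter

def runoff_keying(next_keying, initial_batch=2, runoff_batch=3, max_rounds=3):
    """Collect two blind keyings; while no reading has an absolute majority,
    re-queue the entry to a fresh batch of volunteers, up to a round limit.
    next_keying() fetches one more volunteer's reading of the entry."""
    ballots = [next_keying() for _ in range(initial_batch)]
    for round_no in range(max_rounds + 1):
        reading, count = Counter(ballots).most_common(1)[0]
        if count >= 2 and count > len(ballots) / 2:
            return reading
        if round_no == max_rounds:
            break
        ballots += [next_keying() for _ in range(runoff_batch)]
    return None  # truly indecipherable: flag for an expert, or mark illegible
```

A single tie-breaker would be `runoff_batch=1`; the trade-off is volunteer effort against how often a human expert must still be summoned.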
Some things are simply not knowable. It is hard to evaluate the effectiveness of quality control seriously without taking into account the possibility that volunteer contributors may be correct and experts may be wrong, or, more importantly, that some images are simply illegible regardless of the paleographic expertise of the transcriber. The Zooniverse team is now exploring ways for volunteers to correct errors made not by transcribers, but rather by the midshipmen of the watch who recorded the original entries a century ago. They realize that a mistaken "E" for "W" in a longitude record may be more amenable to correction than a truly illegible entry. Not all errors are made by the "crowd", after all.