Secondary goal: gather support from external sources (chapters, Foundation) to make WS reach the potential it has.
Asaf: "Why now? Well, better late than never."
Wikisource was not created from a discussion. What created Wikisource was pressure from Wikipedia: people were posting complete texts onto Wikipedia, but the community decided that those complete texts did not belong on Wikipedia. Creating Wikisource was a good solution for the Wikipedia problem, but its purpose is not sufficiently articulated.
Wikisource has worked for years without an articulated vision. It's working great (because it's a wiki and it attracts people) and there is no major danger ahead.
Current status: WMF keeps it running and, rarely, supports volunteers in doing something for it (like this conference).
This can continue. But if we want more institutional support (WMF), we need a strategic conversation.
What are we trying to achieve?
How does our work mesh with other digital libraries? Duplication (do we trust other websites to still exist in the future?)
Multiple editions of a work.
What do we offer other than the text?
The text is in TEXT format, not an image scan, and it has been proofread. The text can therefore be put into speech software for the visually impaired.
Some texts have accompanying audio versions linked. (stored on Commons)
Wikisource is co-linking with Wikidata, so that bibliographic searches can be done and coordinated with both national and international databases like VIAF, GND, BNF, and LOC.
We include Portals on the English Wikisource that gather topically related resources from books, book chapters, poetry, periodicals, and encyclopedia articles into a single organized locality.
Unlike other text file resources (e.g. Gutenberg), the original text can be viewed immediately for any page.
Translations of the same text in several languages are cross-linked between projects, making the Classics available in multiple languages.
Some original translations exist on Wikisource for works, including both old works for which there may not be modern translations, as well as news-worthy works such as government publications.
Some translations (e.g. Three Kingdoms) have parallel text with the original linked to entries on Wiktionary. This has led to additional Wiktionary entries.
Wikisource is being used for source citations on Wiktionary, with direct links from the Wiktionary entry for a word to the Wikisource document.
Likewise, Wikisource sometimes includes links in its texts to Wikipedia or Wiktionary.
Annotated editions?
That would put the project at risk of scattering, rather than staying focused on source transcription as faithful to the original as possible. But the idea sure is interesting and would be worth launching a dedicated project for. Within Wikimedia we already have a culture of dedicating projects to specific goals that emerge from current activities, just like Wikisource emerged from a need felt in Wikipedia. See also the suggestion below to use Wikiversity.
Discussions about the text itself, a venue to discuss?
Should we have that ?
I think the best Wikimedia venue for book discussions would be Wikiversity, but then it'd help if we had more integrated systems.
Is it something we want to offer?
All of these are the reasons why we are having this conversation today.
Florian Semle (https://twitter.com/floriansemle) will be the facilitator today. He will help to structure the debate, but it's an open process.
Different stages of Wikisource in countries:
Prestart: Doesn't exist right now or maybe it doesn't work right, but maybe should exist (field to explore)
Start: A few people have some pretty cool projects, but maybe not so much organization/not so many people. Some goals and focus already developed.
Working level: Have organization, community, working with outside organizations, gets along
Effective (formerly Professional): All the opportunities are taken and enough people work on it.
Then we have levels:
Organization
Community
Focus
Standards/Quality
It would be necessary to develop a transition from Prestart to Start, as well as from Working Level to Effective. There is also a goal: create a vision/identity for all stages of organization.
Rough/tentative agenda:
Analysis of key challenges
Analysis of alternate sources
Where are conflict lines?
Analysis of ourselves with these results
Future opportunities
Identity and needs (what should we be, where should we go?)
Public perception of Wikisource: What is the feedback to your work?
Charles motivated the usage of Wikisource by Wikipedians, who "did not think about using Wikisource".
Lack: awareness?
Vigneron, on the other hand, points out that maybe the product of Wikisource is slow but it has quality. There is (or may be) a lack of quantity. And personally, Vigneron likes this slow spirit.
Eduardo: they don't see a practical benefit of Wikisource. It doesn't shine. Hard to explain how it's useful.
Lack: how to present the data. We're used to seeing data in one way, and mobile view / mobile extension and pdf book downloader are both broken/ugly. Could be fixed, could upgrade user experience of accessing existing content. [compare: desktop view, mobile view, print view, pdf book]
Lack of statistical data on what is of public interest
Anika: If you're reading books, don't want to sit in front of computer (book lovers)
Lack: Doesn't work on mobile devices very well. Can't grab it.
List of sources related to a given topic - dewiki has something like this. But also don't want to complicate a text with too many templates/categories. (People come here to vacation from Wikipedia...)
Lack: Topical curation at the book level, not text-level annotations
Andrea: has heard "ws is boring", also difficult to navigate/contribute to; we do not produce new knowledge, only making available old knowledge. Transcribing old books is not sexy to many. And for many others, it's not even important to transcribe them. It has no impact. Wikipedia is MUCH more important.
Jim: There are GLAM institutions where transcription is something that they do internally, and while we are pushing for openness, in the end they have internal issues about this subject.
Brian: The teachers to whom I've shown it become excited. It eliminates the need for purchasing extra copies for students, and allows text searches for analysis.
Tobias
Idea: Test how useful Wikisource is: hackathon for maps. Scan old-school map books, write square x locality. Strong link to Wikidata. Work on data-dense sources; data-dense sources are more useful as a source (maybe a bit like Geonames http://www.geonames.org/ ).
Lack: Embedding facts to Wikidata; contribution to (Wikimedia?) ecosystem
But: why should this be on WS not commons?
Distinguish between Wikisource goals and ecosystem-goals
Or census reports (they don't have names, but usually just the geoname and the number of people)
⇒ pick this up when we talk about ideas/tools
Asaf: people stumble across wikisource content, much like early days of Wikipedia. No name recognition, no brand recognition. People stumble upon Wikisource, don't look for it.
Initially, was a mirror site to project gutenberg, so low down on Google pagerank
Want: learn from existing practices to generate international community of practice.
Anika: Some people ask where they can find digitizations; they are not looking for a collection of the texts.
Liam: A better analogy might be wikibooks. Both have a steep usefulness curve: the document is not useful until it is finished. Affects public perception, because it is easier to find something not finished than finished. Probably affects editor engagement; you can't make something useful until it is ready. One section of a transcription is not useful.
Economics of size!
Lack: Differentiation with other projects, maybe we can compare Wikibooks to Wikipedia, as the issue between Wikisource and Wikimedia Commons.
Lack: satisfying contributions that make a tangible difference. Wikipedia is "easier" that way: you change a comma, and it's useful. In WS you have to proofread several pages...
Adding some captcha program within the Wikimedia projects using Wikisource documents may lower the bar for contribution
Andrea: Difference between public perception on contributors/readers.
*Facilitator*
Gaps:
target group that has a real interest
social gap: wiki commons is a work in progress, can't do the same thing with wikisource.
Solution? other social bridges than the document. Not a different logo, but building personal relationships to show the deep knowledge existing on project.
Gutenberg Project (and National equivalents: Swedish, Hebrew and Italian)
Asaf has volunteered w/ PG and runs a national equivalent
Johan suspects that the Swedish national Project Runeberg is the reason that sv.ws is pretty much dead - not that many people in Sweden interested, is a competitor for people at least (though not necessarily worse for the world)
Andrea: PG is the only real competitor, because we do the same thing: transcribing books. "I'm not saying they are better than us, of course not" (laughter); "the big difference is that we are the bazaar, they are the cathedral" (open source analogy)
Armenia - digital libraries upload no matter the copyright
Arabic - only one book in Gutenberg. Community is in WS.
Question: Why do few people use ar-wikisource? Maybe because a lot of pages have little content and were created by a bot? Many author-pages that are not connected to Wikidata? (Tobias)
Reply: the bot-created articles are mainly poetry. I'm working on interwikis, especially for authors.
Thanks: I think connecting the items will also show redundancy between Wikipedia and Wikisource, for example for Author pages. (Compare redundancy problem between WikiSpecies and Wikipedia)
No WS in Punjabi, only a few books in Google books, Internet Archive doesn't have any/many out of bare copyright
India - Digital Library of India. They have a lot of scanned books in different Indian languages and also other languages; we should work with them. There are other public digital libraries in different parts of India, like the West Bengal Public Library network etc.
Eduardo: No Spanish bookshelf in PG; not part of the ecosystem, PG doesn't matter so much. Might find things on IA or GB.
"Where does one go when one wants classical Spanish texts?" "The real library."
The National Library of Chile has digitized "public domain" as well as copyrighted texts and has uploaded them to Memoria Chilena, a website created by the National Library
Internet Archive
Andrea: friend, not competitor. Use OCR as part of work flow, purely technological (some discussion of this)
They do digital preservation very well
There is no significant volunteer community around the Internet Archive's content
Brian: We need more interaction with IA. We need smooth, obvious lines of communication for volunteers.
Library Projects
What do we want for our projects that other projects already offer?
What does Gutenberg have that we want?
Asaf: Gutenberg has a customised workflow: each volunteer is assigned a page to proofread, and so on
Asaf: Project Gutenberg has a system that gives you little bits of work and is very simple to use. People can get into a zen mode of producing many pages. They have the technology that they wanted; they have something that fits how they produce e-texts. We have MediaWiki, which is optimal for Wikipedia, but not for what we do. Project Gutenberg doesn't have books in Arabic or Indic languages because they didn't have good-quality OCR; even a few years ago, some of them lacked Unicode support (2010!!!).
Andrea: customized software eases the flow: you can divide it into small steps and you can have a sort of gamification. WS should have software designed for easing workflow, though not in such an industrial way; it should let the user do multiple things: proofread, discuss, add links, many things.
Gamification: Context is important. Single sentences are hard to put into context, or to find typographical errors.
--> a Wikicaptcha could be good for 25% or something: we would still have other workflows in place... Andrea: But keep in mind that Internet Archive has very good OCR (for many languages) and they continue improving, a lot. Anyway, asking for support for "postprocessing" tools could be a very good idea.
Google: has large corpus of books. If one book is unreadable, no big deal. They work on OCR/reCaptcha and improve it/algorithms.
Some people say: wikisource is for quality, you are flooding it with OCRed bad stuff!
Different project cultures
Question: How is OCR for non-latin scripts?
Brian: Terrible. I have yet to see OCR work for Greek.
Suggestion: do identity discussion, *then* start in on specific operational details.
Statement: "Wikisourcers curate open texts."
Proofreading?
Curating? Annotate texts with wikidata of quoted authors?
Publishers of new editions of old books? ( the case on he.ws )
he.ws: New commentary on book on separate page. Community effort to provide modern value, one step further than editors notes.
Translating historical words to modern so that non-historians can read the text. We should comment as much as we need to remain understandable.
Original translations are made as well, especially for new laws, proclamations, government announcements.
Strengths and weaknesses:
Google Books : in a nutshell : quantity and not quality
+ money
+ massive scan
- bad metadata
- no community
- no improvement
- commercial
- non-free (not known to be free)
- impermanent (books disappear, this is why TPB bot uploads google books to IA)
- volunteer empowerment (find a niche and do it ; like it or leave it)
- lack of language diversity
Internet archive
+ nonprofit
+ massive scans, massive OCR
~varying quality by language
- missing many copies of key texts; many texts contain badly scanned pages
+ free (yes) and durable (?) (i.e. Wayback Machine)
It needs huge amount of money to sustain the project
But: Brewster Kahle (founder) is the only person to have successfully contested a secret NSA letter (with EFF legal support; sued the US gov't, resisted letter, never gave info, and can talk about it!)
? maybe they face some legal copyright problems (US copyright trolls?)
- no community and failure to build one (but unlike google books they have open platform, https://openlibrary.org/ , can edit metadata but pretty much no one does)
+ tech quality
+ API
- hard to find texts (iffy metadata, though editable)
+ friendly! San Francisco!
ok on accessibility
pretty good for mobile
Library projects - GLAM projects
+ access to print materials - they have the collection
+/- most have no community
except NLA, NARA, (they like us!) Trove [volunteers encouraged to tag, "folksonomy"?] http://trove.nla.gov.au/
+ audience (users / readers, if not volunteers but a high potential), reach
+/- funding
massive projects, massive funding, budget ends, :((( . Software rots. May not have money for maintenance. No volunteers, no ability to do work regardless
- many don't want to release control over content
e.g. partnerships with corporations, etc.
British Library and MS silverlight as a way to *get it done*
- copyfraud (in France, all libraries do more or less copyfraud - even the BnF do copyfraud ; only one exception : http://tablettes-rennaises.fr/ where Vigneron worked :P )
+ generally have accessibility because of government funding
Wikisource ('+': 13, '-': 4)
+ community
improve volunteers
conferences with free coffee
+ volunteer empowerment, do what sounds fun!
- inflexible platform - have to contort around platform to work
+ flexible platform : wiki is flexible. Out-there solutions are proof of flexibility
space for social innovation
+ topical curation
+ integration: wp, commons, wd (wiki-ecosystem)
- access
+ annotations (possibility)
+ hypertext: we can link books to books, authors to authors, make an internet of books
+/- searchable/discoverable
good through google, bad through wiki!
+ wikipedia zero
+ nice URLs (human-readable URLs)
+ language diversity <3
+ interwiki
+ mediawiki has some underlying accessibility (~70% (of what?)), though user styling may not be friendly
these are offering scanned books, but violating copyright - this is an important fact in the environment in which ukwikisource is evolving
Opportunity:
Zvi: Simplify editing process and customisation
Mohsen: Right to Left Proofreading support
India: OCR improvements and integration into Wikisource
Wikisource as OCR training corpus for Indic languages
Offer an international book library (for end-users, very user friendly, mobile accessible)
"Netflix" for books (See: https://inventaire.io/welcome which is built using Wikidata. We just have to connect the Wikisource-text with the interface)
Global repository: open access journals, not just books. We could provide a gateway to make them freely available over WP Zero, and easily citable through Wikipedia (click-and-go)
Translation of Wikisource texts
Translation project under Wikisource or can be under a new project?
Different questions: handle existing translations (e.g. a Spanish and Italian one), vs. create translations like Content Translation (Wikipedia)
Partial Transcription
Develop or Improve Upload tool for different Digital Libraries of different parts of the world
Search for texts that interest you; import search engine to find texts that interest potential contributors (not having to go through ~3 websites to find something and import it)
Natively digital files
With open access articles on Wikisource one could link to the exact paragraph and thus offering the best way to reference sources in Wikipedia
Diversity of experiences across languages; how can the individual language projects learn and share issues and evolve broader strategy
Discoverability - article-level metadata, topical curation, "aboutness" statements. See http://aboutness.org
>> facilitation note: discuss opportunities, don't drill down into problems or implementation details <<
Opportunities:
*Technology
Simplicity/Customization
RTL/LTR support
OCR
Global repository
Translation TMB
Analytics tools, like being able to track a word and set of word frequency and distribution over time within the wikisource corpus and subcorpus
Schoolbook
Also bilingual dictionaries: Working together with Wiktionary
Proofread innovation
Translation (TBD)
Upload tools
Import search engine
Good navigation system
Open access publishing
Digital native publication
Open reading app (improved reader experience)
Discover sources you couldn't find otherwise (improved navigation)
Topical curation
???, original compa??
Tagging, types
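A rough sketch of the analytics idea listed above (tracking the frequency of a word or set of words over time within a corpus). This is only an illustration under assumptions: the tiny `corpus` and the function name `word_frequency_by_year` are made up for the example, and a real tool would read transcribed texts and their publication dates from Wikisource rather than from an in-memory list.

```python
from collections import Counter, defaultdict
import re

# Hypothetical mini-corpus: (publication year, text) pairs.
# In practice these would come from proofread Wikisource texts.
corpus = [
    (1850, "The library was open. The library was quiet."),
    (1850, "A quiet street."),
    (1900, "The library closed early."),
]

def word_frequency_by_year(docs, words):
    """Return {year: {word: relative frequency}} for the tracked words."""
    counts = defaultdict(Counter)  # per-year token counts
    totals = Counter()             # per-year total number of tokens
    for year, text in docs:
        tokens = re.findall(r"\w+", text.lower())
        counts[year].update(tokens)
        totals[year] += len(tokens)
    return {
        year: {w: counts[year][w] / totals[year] for w in words}
        for year in sorted(counts)
    }

freq = word_frequency_by_year(corpus, ["library", "quiet"])
for year, per_word in freq.items():
    print(year, per_word)
```

Restricting `docs` to one author, genre, or language would give the "subcorpus" view mentioned in the list.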
*Community
Social solutions:
Onboarding - ambassador core
Teahouse
Mentorship
Documentation
Workflows
In person collaborations
Contests, gamification and fun can motivate people, some promising results have shown up
Welcome motivation tools/package
Andrea: The present state of technology for Wikisource is not good, so something has to be done about it.
Concepts:
Design, needed for effective/usable engineering
Incentives
Contests--motivation (competition? working with/at the same time? social capital/barnstars? money?)
Learn typing by contributing to Wikisource
*Quality / Quantity
Andrea : better navigation, better simplicity, help users be part of community. Community as prerequisite/enabler of quality and quantity.
What is Wikisource? A library, or a place to transcribe text sources?
Enjoy crafting books, but don't care if book is useful?
We need more collaboration between projects & communities (Exchange work packages, create automatic maintenance lists)
Educate more people (Close the educational gap)
Change copyright law :-)
Make digitization of world literature easier, to appeal to libraries: interoperability with GLAMS, which could use Internet Archive and Wikisource for digitization. They though need to apply very hi
Standardize Metadata
More accessible, particularly on mobile; a Wikisource application for browsing/downloading ebooks
Sharing knowledge between communities, learning from each other. Lack of coordination.
Focus on Translation
Make books accessible that are written in older scripts
Digitization is not focused on smaller language communities (No commercial interest in e.g. Old-Persian script books). WMF should advertise Wikisource, support it.
Need for more enthusiasm to welcome more people
Make template translations and reuse easier and cut down on maintenance. Small communities suffer the most from maintenance work, template coding.
« A worldwide library, similar to oasis, to expand humankind’s mind by art and knowledge »
« Make Wikisource the sum of all primary source material freely available »
« Open up, share and make available great collections of libraries »
« Provide a digital library and archive, integrated multilingual and reusable »
« A crucial part of the Wikimedia ecosystem − the primary source place fully integrated with the others »
« The Library of Babel » − « A free infinite library curated by volunteers »
« WS = Stuff that other people wrote » − (Similar to Wikiquote, or quotes on Wiktionary) ; WS = works != content (like Commons) → « Community curated works »
Easier integration of Wikisource onto Wikipedia. Giving it more exposure.
« A digital library of the Wikimedia movement built by volunteers to be used by everybody »
« The sum of all literary works easily available to everyone » (Present literature in a modern way)
Should be part of the Wikimedia ecosystem as something that can be used
« Bridge between the projects of the WMF »
« WS also has its tools as part of mission - Provide open and easily usable tools to digitize literature »
« Curation of open texts to further the aims of the Wikimedia movement »
« Free repository that people can use ; and to preserve local languages »
Summary Roundtable
Curation
WS integrated as part of ecosystem
WS as tool for Wikipedia and Wiktionary
Library aspect (Librarians) for reading or research
Educational aspect
Coordination across languages
Works, Transcribe
Language
Diverse international Community working on diverse languages
Language preservation
=== Xmas shopping list ! ===
« I need X because Y »
« I need connections to Internet Archive, because of damaged or missing scans; I can provide the books » // What is meant here by « connections » ? personal contacts
« I need a stable Index namespace because updates seem to break them often, halting progress »
Simpler way to bootstrap Wikisource on other languages
bootstrap = setting up, installing, making ready for usage
Standardizing the templates, gadgets and infrastructure
« We need a TPT cloneS so we can get stuff done. »
Improve documentation for technical aspects
Improve search tools
Exporting complex languages (PDF, Epub, etc.)
Standardize character insertion tools across Wikisources, with the ability / information to note missing characters from the set
« I need templates that render consistently under all export/layout tools » (Adobe lingo: Liquid layout)
« I need a common CSS to export Wikisource texts »
General layout should work on different size devices and printing (page breaks, indentation, paragraphs)
Setting this up should be easy and reusable
GLAM
I need analytics/statistics about how many books, how many pages, how many downloads..., so I can show the partners that they benefited from the collaboration.
Also needed for credibility: Why should GLAMS provide rooms and access to material if they don't know the metrics?
I need easy import and easy export tools. Need to be usable by GLAMS, as feedback, to return a final product
Attribution, Diffs, Showing work done by WS
I need Metadata in order to work with partnering GLAMS
« Communicate with libraries in their language »
Example: VIAF already has Wikidata-IDs for authors (belett : isn't it the other way round : Wikidata had a VIAF property for authors ? Tpt: both ways are true). The same needs to be done for works and with different GLAMS.
The Library of Congress has data for the works it holds. These can be added. But it is a slow and tedious process to find them.
We need clear and consistent standards on Wikidata to make this possible: standards on data items for translations, editions, series, and works
I need more help from the regional chapters to make a more professional appearance
Example: Libraries want an official partner that can sign documents.
I need more meetups with GLAMS to better communicate WS's goals and get them on board for collaboration.
« Make connections, share understanding, work towards agreements »
I need OPDS so e-reader devices can connect to our content (so people can read our content and can discover it) -> wsexport already has a basic OPDS file for fr.wikisource
I need a low-bandwidth or offline-first version of WS in order to distribute content to people with sporadic or low-bandwidth internet connections.
Example: CDs, preinstalled on cellphone SD cards -> the Kiwix people seem to plan to do with Wikisource what they have done with Project Gutenberg
Navigation
A catalog, an easy way to access the contents, and easier way to find them.
English Wikisource uses Portals arranged according to the LoC cataloging system.
I need better design for better navigation and reduced maintenance
Example: Automatic list of books, for each author.
Example2: Easier to navigate for contributors and people searching for books.
"Understand where I am, and where I am going, as a user"
I need a dictionary extension to translate words dynamically
Example: A popup on double clicking the word.
It is possible to use <tooltip> to do this now, but not dynamically.
Perhaps use a Wiktionary searching tool?
I need better integration for multi-lingual books
Example: Schoolbooks that have original text on left page and translation on right page.
Classical texts do this as well. It is a huge problem to add these.
Translating dictionaries are even worse.
We need help setting up interwiki links for translations and editions through Wikidata.
Right now, each different edition is a different data item, so there will be no interwiki links between copies of the same work in different languages.
For example, the Italian translation of "Oedipus Rex" has a different data item from the French translation, and the English translation, and any other translation.
And none of these is the same as the "main" data item for the original Greek work. Each is a separate edition with its own data and data item.
We will need a special tool to generate interwiki links on Wikisource for all the different editions on each language project.
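A minimal sketch of what such a tool might do, under stated assumptions: each edition's data item records which work it is an edition or translation of (on Wikidata this is expressed with the "edition or translation of" property), and editions sharing a work are then cross-linked. The item IDs, page titles, and the function name `interwiki_links` below are hypothetical placeholders for illustration, not real Wikidata items or an existing tool.

```python
from collections import defaultdict

# Hypothetical edition records: each edition has its own data item,
# a pointer to the "main" work item, a language, and a Wikisource page.
editions = [
    {"item": "Q_IT_EDITION", "work": "Q_WORK", "lang": "it", "page": "Edipo re"},
    {"item": "Q_FR_EDITION", "work": "Q_WORK", "lang": "fr", "page": "Œdipe roi"},
    {"item": "Q_EN_EDITION", "work": "Q_WORK", "lang": "en", "page": "Oedipus Rex"},
    {"item": "Q_OTHER_EDITION", "work": "Q_OTHER_WORK", "lang": "en", "page": "Another Work"},
]

def interwiki_links(editions):
    """Map (lang, page) to interwiki links for editions of the same work."""
    # Group editions by the work they belong to.
    by_work = defaultdict(list)
    for ed in editions:
        by_work[ed["work"]].append(ed)

    links = {}
    for siblings in by_work.values():
        for ed in siblings:
            # Link to sibling editions in other languages only.
            links[(ed["lang"], ed["page"])] = sorted(
                f"[[{other['lang']}:{other['page']}]]"
                for other in siblings
                if other["lang"] != ed["lang"]
            )
    return links

links = interwiki_links(editions)
print(links[("it", "Edipo re")])
```

This sketch ignores the case of several editions in the same language; a real tool would need a rule for which edition to link to in that situation.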