User:Alex brollo/লিপ্যন্তর (Transliteration)

From Wikisource

OK, let's go on with this fuzzy project! Here I'll collect links and ideas about Bengali transliteration, a hard challenge since (like most of my fellow Italians) I know almost nothing about Bengali or any other Indic script. Is it possible to get some result anyway? I don't know... I'll simply follow the suggestion সাহসী হও (be bold; I hope Google Translate did a decent job!)

Need for Bengali Wikisource

  • Devanagari to Bengali (for Sanskrit texts in Devanagari script)
  • Bengali to Devanagari (for Sanskrit texts in Bengali script)
  • Roman to Bengali (for Márk Likhita Susamácár)
  • Bengali to Roman N
  • Devanagari to Roman N
  • Roman to Devanagari N


Current romanization standards

  • IAST (en.wiki): Unicode-based, lossless
  • ISO 15919 (en.wiki): Unicode-based, lossless
  • Harvard-Kyoto (en.wiki): ASCII-based, used by sahityam.net
  • National Library at Kolkata romanisation (en.wiki): Unicode-based, very similar to ISO 15919
  • Indian languages TRANSliteration (ITRANS) (en.wiki): ASCII-based

Romanization of Bengali

Bengali | Academic Roman | Popular Roman
অ, আ, অ্যা | a, ā, - | a
ই, ঈ | i, ī | i
উ, ঊ | u, ū | u
ঋ | ṛ | ri
এ | e | e
ঐ | ai | ai
ও | o | o
ঔ | au | au, ou
ক্, ক, কি, কী, কু, কূ, কৃ, কে, কৈ, কো, কৌ | k, ka, kā, ki, kī, ku, kū, kṛ, ke, kai, ko, kau | Without diacritics, কৃ = kri
ক্ক, ক্ত, ক্ত্র, ক্ন, ক্ব, ক্ম, ক্য, ক্র, ক্ল, ক্ষ, ক্ষ্ম | kka, kta, ktra, kna, kwa, kma, kya, kra, kla, kṣa, kṣma | Without diacritics, ṣ = sh
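The relation between the two Roman columns can be sketched in a few lines of JavaScript. This is only an illustration: the function name toPopularRoman is hypothetical, and the mapping covers just the correspondences listed in the table above (long vowels lose the macron, ṛ becomes ri, ṣ becomes sh).

```javascript
// Hypothetical helper: reduce "academic" romanization to the "popular" form,
// using only the correspondences listed in the table above.
const POPULAR = {
  '\u0101': 'a',   // ā
  '\u012B': 'i',   // ī
  '\u016B': 'u',   // ū
  '\u1E5B': 'ri',  // ṛ
  '\u1E63': 'sh',  // ṣ
};

function toPopularRoman(academic) {
  // Normalize to NFC so that each letter with a diacritic is one code point.
  return academic
    .normalize('NFC')
    .replace(/[\u0101\u012B\u016B\u1E5B\u1E63]/g, ch => POPULAR[ch]);
}
```

For instance, toPopularRoman applied to kṛ gives kri, as in the table note. The reverse direction is not unique (a popular "a" may stand for a or ā), which is why the academic form is the one worth storing.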

Things to explore (sahityam.net inspired)

Wander 1

Imagine a template like this (the example uses Devanagari script; imagine that it were Bengali): {{trans|पर्वत वर्धनि पाहि मां|parvata vardhani pAhi mAM}}

Parameter 2 is the Harvard-Kyoto code for parameter 1. From parameter 2, the sahityam.net scripts can output the coded text in many Indic scripts. I doubt that an ordinary user could comfortably edit the HK code in parameter 2; however, the first transliteration Indic->HK could be done by a JavaScript edit tool, so that the user would edit parameter 1 and the script would rebuild parameter 2.

  • The user would write {{trans|पर्वत वर्धनि पाहि मां}}; then, at the click of a button, every {{trans|[indic]}} template would be converted into {{trans|[indic]|[Harvard-Kyoto]}}, i.e. {{trans|पर्वत वर्धनि पाहि मां|parvata vardhani pAhi mAM}}.

After saving, the sahityam JavaScript would transliterate the HK codes back into one or another Indic script, working on the served HTML.

  • Drawback: the result can't be exported, since it is built locally on the client side.
  • Step 1: build the first-step conversion tool Bengali->Harvard-Kyoto
  • Step 2: customize the sahityam.net scripts to the HTML output of {{trans}}
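The button action described above (fill in parameter 2 for every one-parameter {{trans}}) could look roughly like this. It is a minimal sketch: the function name addHarvardKyoto is hypothetical, and toHK stands for the Indic->Harvard-Kyoto converter of Step 1, which does not exist yet and is simply passed in by the caller.

```javascript
// Rewrite every one-parameter {{trans|...}} call as {{trans|indic|HK}}.
// `toHK` is a hypothetical converter (Indic script -> Harvard-Kyoto)
// supplied by the caller; it is not an existing sahityam.net function.
function addHarvardKyoto(wikitext, toHK) {
  // [^|{}]+ matches a parameter with no pipe or brace, so templates that
  // already have a parameter 2 (or nested templates) are left untouched.
  return wikitext.replace(
    /\{\{trans\|([^|{}]+)\}\}/g,
    (whole, indic) => `{{trans|${indic}|${toHK(indic)}}}`
  );
}
```

Because templates that already carry a second parameter are skipped, a user's manual corrections to the HK code would survive a second click on the button.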

Wander 2

Same as the first idea in the edit step, but {{trans}} would invoke a Lua script, passing it parameter 2; the Lua script would generate any number of transcripts, all but one with display:none CSS. Toggling the visibility of one block or another would give the same result as the previous idea, but the different scripts would be server-generated and exportable (which one is visible could be set by editing the CSS).
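The output that such a server-side module would produce can be sketched as follows. The idea above calls for Lua; JavaScript is used here only to keep a single language on this page, and the function name renderTranscripts, the trans-* class names and the converters object are all assumptions made for illustration.

```javascript
// Build one <span> per target script; only the `visible` one is shown.
// `converters` maps a script name to a hypothetical HK -> script function.
// Note: the converted text is not HTML-escaped in this sketch.
function renderTranscripts(hk, converters, visible) {
  return Object.entries(converters)
    .map(([script, convert]) => {
      const style = script === visible ? '' : ' style="display:none"';
      return `<span class="trans-${script}"${style}>${convert(hk)}</span>`;
    })
    .join('');
}
```

Since every transcript is present in the served HTML, an export keeps all of them; choosing the visible one becomes a matter of CSS only.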

Just something to test

@Bodhisattwa: Please run importScript("User:Alex brollo/t.js"); in your JavaScript console; then you can (I hope) run a function mlb(text), where text is a piece of Mark-Likhita code; you should get back something like a Bengali transliteration of the romanized Mark-Likhita text.

I imagine that there are lots of mistakes... it's just a first step.

Could you please manually transliterate a few sentences of Mark-Likhita, so that the correct transliteration can be compared with the script's result? Alex brollo (talk) 21:03, 29 October 2016 (UTC)

@Alex brollo:, I don't know JavaScript at all, but it looks like I need to start learning. That's why I love wiki: every day an inspiration to learn something new. By the way, forgive my ignorance: is this the right code to get the function mlb, and where do I get the output text? Really sorry, I am wasting your time. -- বোধিসত্ত্ব (talk) 05:49, 30 October 2016 (UTC)
@Bodhisattwa: Ouch! I apologize; if you don't know JavaScript, my suggestion is useless. JavaScript is really not so difficult (if you know any other programming language); the main hurdle at the start is understanding how to run it... it's strange to discover that JavaScript simply lives inside the browser and is usually run by it. I guess that as soon as you start, you'll fall in love with it.
At present the script is not linked to {{tr}} at all; it's only its tentative "basic engine". Since it seems to run, more or less, the next step will be to link it to the template and to a toolbox button; then it will be simple to run with a click. Alex brollo (talk) 06:35, 30 October 2016 (UTC)

Something runs

@Bodhisattwa:

  1. You can now run a test application by importing User:Alex brollo/t.js with the statement importScript("User:Alex brollo/t.js");
  2. An icon will appear in the toolbox. A click on this icon runs the script.
  3. In edit mode, the script edits every tr template that has only one parameter; assuming that parameter 1 is Mark-Likhita code, it adds its Bengali transliteration as parameter 2. The Bengali text can be freely edited and will not be changed by the script.
  4. The script also adds a click event to the displayed HTML in view mode, so that clicking it toggles between the romanized and the Bengali text.
  5. This is only the first version and it needs lots of fixes. One known issue is the double vowels ai, au, ri: at present they are simply replaced by the "placeholders" ä, ü, ṙ in the Bengali text; the transliteration algorithm will be completed as soon as possible.
  6. Anything can be changed! For now the whole project is only an "it can be done" test. Alex brollo (talk) 16:14, 30 October 2016 (UTC)
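The view-mode toggle of point 4 can be sketched as below. This is not the code of t.js; the function name toggleScripts is hypothetical, and the two arguments are assumed to be the elements holding the romanized and the Bengali text (anything with a style.display property will do).

```javascript
// Swap the visibility of the romanized and the Bengali element.
// Whichever of the two is currently hidden becomes the visible one.
function toggleScripts(romanEl, bengaliEl) {
  const romanHidden = romanEl.style.display === 'none';
  romanEl.style.display = romanHidden ? '' : 'none';
  bengaliEl.style.display = romanHidden ? 'none' : '';
}
```

In a browser this would typically be attached with something like addEventListener('click', ...) on the template's wrapper element; the same function could equally be called from a sidebar link.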

@Alex brollo:, Wow, this is just amazing!!! The first step is always the hardest one, and as we have already passed that phase, I am pretty sure that it can be done. The toggle is working fine, although IMHO there could be a toggling option in the sidebar for the readers, who are the target of this transliteration work. Below I compare a few pieces of text as currently transliterated with what they should show, so that it can be fixed.

English text | Current transliteration | Correct version | Fixing needed | Rule
MÁRK | মার্‌ক্‌ | মার্ক | র্‌ + ক = র্ক | 1) consonant conjoining 2) 2nd point below
LIKHITA | লিখ্‌ইত | লিখিত | খ্‌ + ই = খি (as ল্‌ + ই = লি) | 1) consonant conjoining
SUSAMÁCÁR | সুসমাচার্‌ | সুসমাচার | | 2) 2nd point below
Pratham | প্‌রথ্‌অম্‌ | প্রথম | 1) প্‌ + র = প্র 2) থ্‌ + অ = থ | 1) consonant conjoining 2) 2nd point below
Adhyáy | অধ্‌যায়্‌ | অধ্যায় | 1) ধ্‌ + য় = ধ্য | 1) consonant conjoining 2) 2nd point below

I noticed a few general issues which, if fixed, would give a much better output.

1) Unlike in Roman script, Bengali consonants that join with each other do not remain separate letters; they become a conjunct. The issue is that consonants are not conjoining in the current version. For example, MÁRK should not be মার্‌ক্‌ (র্‌ and ক্‌ separate); it should be মার্ক্‌ (র্‌ and ক্‌ conjoined).

2) According to modern writing style, if a word ends with a consonant, there is a silent অ (a) after the consonant, which is written but not pronounced. For example, the word MÁRK ends with the consonant K (ক্‌), but the letter should be written ক (i.e. ক্‌ + অ = ক). -- বোধিসত্ত্ব (talk) 19:05, 30 October 2016 (UTC)

@Alex brollo: To solve the conjoining problem, get rid of the zero-width non-joiners (U+200C) at the end of each Bengali consonant. (Note that they are not present at the end of each Devanagari consonant.) মাহির২৫৬ (talk) 19:23, 30 October 2016 (UTC)
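A post-processing pass combining this suggestion with the two points raised above might look like the following. It is a sketch under stated assumptions, not the logic of t.js: the name fixConjuncts is hypothetical, the script output is assumed to put hasanta (U+09CD) plus a zero-width non-joiner after every consonant, and every word-final hasanta is assumed to be unwanted.

```javascript
// Independent vowel -> dependent vowel sign.
// The inherent vowel অ (U+0985) needs no sign at all.
const VOWEL_SIGN = {
  '\u0985': '',        // অ
  '\u0986': '\u09BE',  // আ -> া
  '\u0987': '\u09BF',  // ই -> ি
  '\u0988': '\u09C0',  // ঈ -> ী
  '\u0989': '\u09C1',  // উ -> ু
  '\u098A': '\u09C2',  // ঊ -> ূ
  '\u098B': '\u09C3',  // ঋ -> ৃ
  '\u098F': '\u09C7',  // এ -> ে
  '\u0990': '\u09C8',  // ঐ -> ৈ
  '\u0993': '\u09CB',  // ও -> ো
  '\u0994': '\u09CC',  // ঔ -> ৌ
};

function fixConjuncts(text) {
  return text
    // 1. Remove zero-width non-joiners so hasanta + consonant can conjoin.
    .replace(/\u200C/g, '')
    // 2. Hasanta + independent vowel becomes the dependent vowel sign.
    .replace(/\u09CD([\u0985-\u098B\u098F\u0990\u0993\u0994])/g,
      (match, vowel) => VOWEL_SIGN[vowel])
    // 3. Drop a hasanta that is not followed by another Bengali character
    //    (word-final consonant written with its silent inherent vowel).
    .replace(/\u09CD(?![\u0980-\u09FF])/g, '');
}
```

Applied to the table rows, the current output for MÁRK, LIKHITA and Pratham turns into the forms listed under "Correct version". Step 3 would also remove a hasanta that is genuinely wanted at the end of a word, so it is the part most likely to need exceptions.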
I now realize the problem of conjuncts, but I admit that I'm deeply confused; be patient! I had considered the need to fix the algorithm's transliteration mistakes manually (that is why the script works only when there is just parameter 1 and doesn't touch parameter 2 if it exists), but I realize that the number of errors should be minimized. There is also an ambiguity in some Roman characters (j/y vs. y, n/ñ and n/ṅ vs. n) that undermines full reversibility of Mark-Likhita into Bengali. In brief, I find it very difficult... :-( ... but I like challenges. :-)
OK for moving the toggle command into the sidebar; linking the toggle to a click was simply a shortcut to see whether the whole thing works. Alex brollo (talk) 07:11, 31 October 2016 (UTC)
About the ambiguity issue: j in Roman is normally to be written as জ; it is written as য when the original is y, e.g. in the name of Jesus, where the original Hebrew word was Yēšū́aʿ. As for n: it is to be written as ṅ (ঙ) when followed by a guttural (velar) letter (k, kh, g, gh); as ñ (ঞ) when followed by a palatal letter (ch, chh, j, jh; in Mark Likhita, ch is written as c and chh as ch: this is the correct academic style); as ṇ (ণ) when followed by a retroflex letter (ṭ, ṭh, ḍ, ḍh). Hope this clarifies. Hrishikes (talk) 07:43, 31 October 2016 (UTC)
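The rule for n given above is mechanical enough to sketch directly. The function name nasalBefore is hypothetical; the input is assumed to be the romanized text that follows the n, in Mark-Likhita spelling (c and ch for the palatals), and falling back to the dental ন in all other cases is an assumption added here.

```javascript
// Choose the Bengali nasal for a Roman "n" from the letter that follows it.
function nasalBefore(next) {
  const s = next.normalize('NFC').toLowerCase();
  if (/^[kg]/.test(s)) return '\u0999';            // ঙ (ṅ) before k, kh, g, gh
  if (/^[cj]/.test(s)) return '\u099E';            // ঞ (ñ) before c, ch, j, jh
  if (/^[\u1E6D\u1E0D]/.test(s)) return '\u09A3';  // ণ (ṇ) before ṭ, ṭh, ḍ, ḍh
  return '\u09A8';                                 // ন otherwise (assumed default)
}
```

The j/y ambiguity is different in kind: it depends on the original word, not on the neighbouring letters, so it probably needs a word list or manual correction rather than a rule like this one.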
Thinking about a different transliteration algorithm... Alex brollo (talk) 21:27, 31 October 2016 (UTC)
Eureka! I think I have finally, if vaguely, understood the zero-width joiner and the zero-width non-joiner. There's nothing similar in most European scripts (I can't use German), so it was a little painful to grasp! Alex brollo (talk) 06:20, 1 November 2016 (UTC)