
Any Phonetic and Transliteration Expert?

Posted: Sat Mar 15, 2008 6:53 am
by tejasvee
Question to Linguists, phonetic script and Transliteration experts:

The more I mingle with folks from various parts of India, the more I get confused.

Go to Karnataka: an 'h' is added to 't' for most t-ish sounds. Examples: Geetha, Seetha, where you would normally expect Geeta/Seeta in other places.

Go to AP: 's' is used for what is the 'sh' sound in other places. Examples: Sarat, Visakha.

Go to TN: k and g use the same character (?). New 'zh' kinds of letters show up. Example: nayakan and nayagan are from the same original word. Tamizh is tough to write in other scripts (no west Asia linked ones)

Go to Hindi states: Most 'ee' sounds are elongated. For example, Lakshmi gets a 'deergha', unlike the southern 'hrasva'. Same for hari or pati. Also, using this tool,
http://www.google.com/transliterate/indic/Hindi , buddha transliterates not to someone's name but to a word meaning 'old man', unlike in the south.

Go to Assam, OR, WB: Most 'v's become 'b's. Example Bijay, Bibek.

Go to Mah: Use of 'chh' is there unlike south. Example chhatrapati.

Is there a standard way to write Indian words so that people from all states can read the same way? I am particularly interested in learning a way that doesn't bring in ASCII characters or upper/lower kind of mixed cases.


Posted: Sat Mar 15, 2008 8:54 am
by dhanu
The problem that you have described is a standard language-to-language transmission loss problem.
Basically, there are two issues -

1. Some sounds of language 1 are not available in language 2, so the script for language 2 has no way to encode/transcribe those sounds in letters. So, while geeta written in Devanagari is unambiguously geeta, when you try to transcribe the sound of geeta (the ta, actually) in English letters, there is no way to do it exactly. So people use an approximate encoding...geeta or geetha.

2. Most scripts are a lossy transformation from sound to letters. For example, Tamil script has only one letter for t, th, da, dha, only one letter for ka, kha, ga, gha, only one letter for pa, pha, ba, bha, etc. These are highly context-sensitive scripts. So there is no way a sound that is spoken in the Tamil language can be encoded completely unambiguously in Tamil script. The same issue exists with English.
In this respect, Devanagari script provides a complete lossless encoding of the sounds spoken in Sanskrit (and Hindi, Marathi, etc.).

It is understandable that encoding the sounds of one language in the script of another may be lossy (since the script was not meant to encode the sounds of the other language), but lossy encoding of the sounds of a language in its own script is a deficiency of that script/language combination. Thus, one can argue that Tamil or English can be better written and communicated (without any loss of information about phonetics) in Devanagari.
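
A minimal Python sketch of what "lossy" means here. The four-letter table is a simplified, hypothetical fragment for illustration, not a full Devanagari-to-Tamil mapping; it only shows that once several sounds share one letter, the reader cannot recover the original from the letter alone.

```python
# Hypothetical, simplified table for illustration only: four Devanagari
# velar stops collapse onto the single Tamil letter க, as described above.
DEVANAGARI_TO_TAMIL = {
    "क": "க",  # ka
    "ख": "க",  # kha
    "ग": "க",  # ga
    "घ": "க",  # gha
}


def invert(mapping):
    """Group source letters by the target letter they map to."""
    inverse = {}
    for source, target in mapping.items():
        inverse.setdefault(target, []).append(source)
    return inverse


def is_lossless(mapping):
    """A mapping is lossless only if no two sources share a target."""
    return all(len(sources) == 1 for sources in invert(mapping).values())


# Reading back the single Tamil letter leaves four candidates.
candidates = invert(DEVANAGARI_TO_TAMIL)["க"]
print(candidates)
print(is_lossless(DEVANAGARI_TO_TAMIL))  # False
```

A reader resolves the four candidates from context (the word, its position, prior knowledge), which is the context sensitivity mentioned above.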


There is another dimension of phonetics that probably no script captures, and the data in this dimension is usually lost in encoding. This is the tone/modulation/emphasis dimension. A simple example is a question mark or full stop at the end. If there is a question mark, you speak the sentence with a rising tone. If there is a full stop, you speak with a flat or sliding tone. This is a limited encoding because scripts do not provide markers for emphasis. For example: How can you do this? How can you do this?
The effect of this loss is very pronounced in Mandarin because in Mandarin a word can be pronounced in 4-5 tone styles and each style can have a completely different meaning (unrelated to the others). This information is not captured in the script and that makes it even more context-sensitive than, say, English or Tamil.


[quote]
Go to Assam, OR, WB: Most 'v's become 'b's. Example Bijay, Bibek.
[/quote]
This is not a language/script issue. It is just another language dialect. The word for Vijay in Hindi is Bijoy in Bengali. So here Vijay is not being written as Bijoy. Bijoy is being written as Bijoy. Vijay doesn't exist (conceptually).

[quote]
Go to Mah: Use of 'chh' is there unlike south. Example chhatrapati.
[/quote]
Again, another dialect, with an additional sound. The sound is fully captured in the script of Marathi (which is Devanagari). If you try to write that in Tamil and show it to some other Tamilian who does not know this word, he can't pronounce it (because 1. that sound itself is not there in Tamil, and 2. that sound is not unambiguously encoded in Tamil script). But if you write it in Devanagari and show it to a Hindi speaker who also doesn't know the word, he will be able to pronounce it exactly as intended by the writer.


Posted: Sat Mar 15, 2008 8:59 am
by dhanu
[quote]
tejasvee;88839
Is there a standard way to write Indian words so that people from all states can read the same way? I am particularly interested in learning a way that doesn't bring in ASCII characters or upper/lower kind of mixed cases.
[/quote]

You can develop a universal script that captures all the information contained in speech of any language :) Something like this: http://en.wikipedia.org/wiki/International_Phonetic_Alphabet

I find Devanagari to be the best among the scripts of India and some other foreign scripts I have studied in a limited way. But then of course, I might not know some script that is even better.


Posted: Sat Mar 15, 2008 8:35 pm
by tejasvee
[quote]
hanu;88847
Basically, there are two issues -

1. Some sounds of language 1 are not available in language 2, so the script for language 2 has no way to encode/transcribe those sounds in letters. So, while geeta written in Devanagari is unambiguously geeta, when you try to transcribe the sound of geeta (the ta, actually) in English letters, there is no way to do it exactly. So people use an approximate encoding...geeta or geetha.
[/quote]

When I took Geetha and Geeta example, I had mostly Kannada & Hindi comparison in mind. Both scripts (Kannada/Telugu & Devanagari) are very similar in sounds. In fact Kannada & Telugu were the same script (old Kannada) a few hundred years back, just like Bengali & Assamese today.

52 base alphabets in Kannada: 14 vowels + 2 (anusvara and visarga for a) + 25 consonants + 7 sonorants/fricatives + 4 others (ha, La, Ksha and Jna)
http://www.omniglot.com/writing/kannada.htm

48 base alphabets in Devanagari: 12 vowels (didn't include Lru and Lrru) + 2 (anusvara and visarga for a, excluding chandrabindu & virama) + 25 consonants + 7 sonorants/fricatives + 2 others (ha and La->only in Marathi)
http://www.omniglot.com/writing/devanagari.htm
I am sure most sister scripts of Devanagari like Gujarati, Punjabi will be similar.

So as you can see, there is not much difference at all. I know that Malayalam has 3 more sonorant/others, making it a 55 base alphabet script. That is by far the biggest alphabet list I know from any Indian language.
http://www.omniglot.com/writing/malayalam.htm
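
As a quick check of the tallies above, here is a small Python sketch that adds up the categories exactly as listed in this post. The Malayalam figure assumes "3 more" means three more than the Kannada total, which is my reading of the sentence above.

```python
# Category counts copied from the lists above, in the order given:
# vowels, anusvara/visarga, consonants, sonorants/fricatives, others.
counts = {
    "Kannada": [14, 2, 25, 7, 4],
    "Devanagari": [12, 2, 25, 7, 2],
}

totals = {script: sum(parts) for script, parts in counts.items()}

# Assumption: Malayalam = Kannada total + 3 extra sonorants/others.
totals["Malayalam"] = totals["Kannada"] + 3

for script, total in totals.items():
    print(script, total)
```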

I am not familiar with Tamil script, except a couple of alphabets. It looks like Tamil has the smallest alphabet list (35?) among Indian scripts.
http://www.omniglot.com/writing/tamil.htm

So except for Tamil, most of the other languages I have listed have similar sounds & alphabets.

That's one of the big reasons I don't understand why the following letters in particular are written differently in English in those states.
ta, tha (retroflexes ट and ठ), ta, tha (dentals त and थ), sha and sha (sibilants श and ष).

Examples - When using the Kannada Google transliterate tool, try 'Maraati'/'Marathi' and either will get you that exact language's name. In Devanagari, you need to use 'Marathi' to get the same; using 'Maraati' gives you a different word. When using the Telugu Google transliterate tool, try 'sankar' to get Shankar in the local script, whereas in Kannada/Hindi you need to type in 'Shankar'.

Why is the situation like this?

[quote]
hanu;88847
This is a limited encoding because scripts do not provide markers for emphasis. For example: How can you do this? How can you do this?
[/quote]
Agreed

[quote]
hanu;88847
This is not a language/script issue. It is just another language dialect. The word for Vijay in Hindi is Bijoy in Bengali. So here Vijay is not being written as Bijoy. Bijoy is being written as Bijoy. Vijay doesn't exist (conceptually).
[/quote]
Good observation. Noted.


Posted: Sat Mar 15, 2008 10:29 pm
by Chicago Desi
T,

The changes that you are talking about are due to the regionalization of these languages. They also reflect the different influences on these languages over a long period of time. Assuming Sanskrit to be the mother of these Prakrit languages, each underwent its own unique metamorphosis over long periods of time. Add to that the different influences (Greek, Persian, Turkish, British, Dutch, Portuguese) and what you see is a hodgepodge which makes each language unique in its own way. There was a tremendous influence of Farsi on the languages of the Deccan between the 14th and 18th centuries as well.

My feeling is that the tradition of oral transmission of shlokas and texts was meant to mitigate some of these losses over periods of time. Sanskrit places huge emphasis on pronunciation, and on the stress given to certain syllables and consonants, for this very reason. Sanskrit scholars take great pride in pronunciation and emphasis.

A lot of the evolution (not influence, which came from outside) of each language can be traced back to the efforts of a handful of people. For Marathi, it can be traced back to two people: a) Sant Dnyaneshwar (who translated the Bhagwad Gita in the 13th century), who is credited with influencing what is now called old Marathi, and b) Vinayak Damodar Savarkar, who is credited with influencing new Marathi tremendously. This is not to say others did not play a part, but some folks influenced languages more than others, and it can be traced back to their works.


Posted: Sun Mar 16, 2008 8:24 am
by dhanu
[quote]
tejasvee;88897
That's one of the big reasons I don't understand why the following letters in particular are written differently in English in those states.
ta, tha (retroflexes ट and ठ), ta, tha (dentals त and थ), sha and sha (sibilants श and ष).

Examples - When using the Kannada Google transliterate tool, try 'Maraati'/'Marathi' and either will get you that exact language's name. In Devanagari, you need to use 'Marathi' to get the same; using 'Maraati' gives you a different word. When using the Telugu Google transliterate tool, try 'sankar' to get Shankar in the local script, whereas in Kannada/Hindi you need to type in 'Shankar'.

Why is the situation like this?
[/quote]

Because English doesn't have precise letters for each of the त series and ट series sounds. If you write ta for geeta, how will you differentiate between the ta of geeta and the ta of Aata (flour)? If you write tha for geeta, how will you differentiate between the tha of geetha and the tha of pratha (custom)? So no matter how you transcribe Geeta in English, that ambiguity will always be there. Some regions (south) in India have picked tha and some (north) have picked ta for geeta.

Same with Marathi, Maraati, Marathhi. None is wrong. But because of convention, Marathi is the one commonly used. If tha is for ta (as in Geetha), then Marathi will look like a wrong transcription, but it is not.
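
A rough Python sketch of that ambiguity. The first table is one plain-letter convention of the kind discussed above; the second follows the dot-below style used by IAST-like schemes, where retroflexes are written ṭ and ṭh. Both tables are tiny illustrative fragments, and the function name is made up for this example.

```python
# Plain-letter convention: dental and retroflex stops share a spelling.
PLAIN = {"त": "ta", "थ": "tha", "ट": "ta", "ठ": "tha"}

# Diacritic convention in the style of IAST: a dot below marks retroflexes.
DIACRITIC = {"त": "ta", "थ": "tha", "ट": "ṭa", "ठ": "ṭha"}


def collisions(scheme):
    """Return romanizations that stand for more than one letter."""
    seen = {}
    for letter, roman in scheme.items():
        seen.setdefault(roman, []).append(letter)
    return {roman: letters for roman, letters in seen.items() if len(letters) > 1}


print(collisions(PLAIN))      # 'ta' and 'tha' each stand for two letters
print(collisions(DIACRITIC))  # {} : every spelling is unique
```

The diacritic table removes the collisions, but only by using characters outside the plain English alphabet, so it trades ambiguity for typing convenience.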

There is another issue not discussed here. Why do we write Yoga or Rama or Mahabharata for Yog or Ram or Mahabharat? IMV, this is because wrong information was passed on to the Europeans due to wrongly written Hindi. When you say Ram, the consonant m is not fully spoken (unlike in Rama), i.e. the r and m consonants are spoken differently in the word Ram. However, in Hindi, it is commonly (and wrongly) written as राम, which does not capture the halfness of म. It should be written as राम् with a halant on म. The lack of a halant means the presence of the vowel a attached to म, which is why early Europeans read राम with a full m and translated it to Rama. Same with Yoga and Mahabharat.
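
To make the halant point concrete, here is a small Python sketch that romanizes strictly by the script rule described above: a consonant with no vowel sign and no halant carries the inherent a. The tables cover only the letters needed for these words, the spellings "aa" and "o" are arbitrary choices for this example, and the function is not a real transliterator.

```python
import unicodedata

VIRAMA = "\u094d"  # the halant sign
print(unicodedata.name(VIRAMA))  # DEVANAGARI SIGN VIRAMA

# Tiny illustrative tables; a real transliterator needs far more entries.
CONSONANTS = {"र": "r", "म": "m", "ग": "g", "य": "y"}
VOWEL_SIGNS = {"\u093e": "aa", "\u094b": "o"}  # the signs ा and ो


def romanize(word):
    """Romanize by the written form only: a bare consonant carries 'a'."""
    out = []
    chars = list(word)
    for i, ch in enumerate(chars):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            # Inherent 'a' unless a vowel sign or halant follows.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        # The halant itself adds no sound; it only suppresses the 'a'.
    return "".join(out)


print(romanize("राम"))           # raama
print(romanize("राम" + VIRAMA))  # raam
print(romanize("योग"))           # yoga
```

Read purely by the letters, राम and योग come out with a final a; only the explicit halant produces the pronunciation without it.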


Posted: Sun Mar 16, 2008 8:37 am
by dhanu
[quote]
Chicago Desi;88918
My feeling is that the tradition of oral transmission of shlokas and texts was meant to mitigate some of these losses over periods of time. Sanskrit places huge emphasis on pronunciation, and on the stress given to certain syllables and consonants, for this very reason. Sanskrit scholars take great pride in pronunciation and emphasis.
[/quote]

But that pronunciation information is present in the script. A shloka written 1000s of yrs ago in Sanskrit will be recited today in exactly the same manner if read by a person who hasn't heard it before.

The reason for so much emphasis on pronunciation is that Sanskrit has traditionally been written in a very concise form with the use of sandhi-samas. It is very common to see one long word that is actually a combination of up to 5 words joined by sandhi. For this reason, it is difficult for novices to separate out the different words from one word. For example, let's see this word from Atharwashirsha (a set of shlokas commonly recited by Marathis): Raktagandhaanuliptaangam. This is actually: rakt gandha anulipt angam.
Another jumbo one: Prasyandnmgandhalubdmadhupaayaalolagandsthalam Try separating this out :)

For a novice, it is not possible to read this correctly even though all the pronunciation information is contained right there in the word itself. For this reason, experts used to make sure that the pupils said it right by having them recite it over and over again.


Posted: Sun Mar 16, 2008 12:18 pm
by layman
I think each language has its own way of pronouncing a word. For example, a Malayali pronounces the word "college" with "co" as in "cold" while a Tamilian pronounces it with "co" as in "car". If they had to write the word in their language, they may use letters with the corresponding phonetics.

This reminds me of an incident. There was a French tennis player called Guy Forget. I had a French boss and I mentioned this player to him. I said his name once, twice...5 times, and he didn't get it. Finally, I wrote it down for him. He said, "Aah Fochet..." Even though the alphabets are the same, the pronunciation could be different across languages. This is a case of writing the same letters but pronouncing them differently. I don't know whether this exists among non-Tamil Indian languages. Between a non-Tamil language and Tamil there are definitely chances of ambiguity because, as Hanu already mentioned, Tamil provides only one ka, cha, da, tha, etc.


Posted: Sun Mar 16, 2008 5:09 pm
by dhanu
[quote]
layman;89003
I think each language has its own way of pronouncing a word. For example, a Malayali pronounces the word "college" with "co" as in "cold" while a Tamilian pronounces it with "co" as in "car". If they had to write the word in their language, they may use letters with the corresponding phonetics.

This reminds me of an incident. There was a French tennis player called Guy Forget. I had a French boss and I mentioned this player to him. I said his name once, twice...5 times, and he didn't get it. Finally, I wrote it down for him. He said, "Aah Fochet..." Even though the alphabets are the same, the pronunciation could be different across languages. This is a case of writing the same letters but pronouncing them differently. I don't know whether this exists among non-Tamil Indian languages. Between a non-Tamil language and Tamil there are definitely chances of ambiguity because, as Hanu already mentioned, Tamil provides only one ka, cha, da, tha, etc.
[/quote]

This happens with English as well. E.g. silent r, p, h etc. The thing that you mentioned with "college" happens because there is incomplete phonetics information in the script. So different people pronounce it differently.

This can't happen in languages of Devanagari script. I don't know about others.


Posted: Sun Mar 16, 2008 7:19 pm
by tejasvee
[quote]
hanu;88982
For example, let's see this word from Atharwashirsha (a set of shlokas commonly recited by Marathis): Raktagandhaanuliptaangam. This is actually: rakt gandha anulipt angam.
Another jumbo one: Prasyandnmgandhalubdmadhupaayaalolagandsthalam Try separating this out :)

For a novice, it is not possible to read this correctly even though all the pronunciation information is contained right there in the word itself. For this reason, experts used to make sure that the pupils said it right by having them recite it over and over again.
[/quote]

This is the technique they used to keep the sounds perfect for millennia.
http://www.r2iclubforums.com/clubvb/showpost.php?p=86652&postcount=46

[quote]
hanu;88979
Because English doesn't have precise letters for each of the त series and ट series sounds. If you write ta for geeta, how will you differentiate between the ta of geeta and the ta of Aata (flour)? If you write tha for geeta, how will you differentiate between the tha of geetha and the tha of pratha (custom)? So no matter how you transcribe Geeta in English, that ambiguity will always be there. Some regions (south) in India have picked tha and some (north) have picked ta for geeta.
[/quote]
Precisely the reason I asked the question in OP. Has anyone tried to create one standard that North, south, central, west and East can use as most languages have similar character set (except Tamil)? That Sankar/Shankar still confuses me.

[quote]
hanu;88979
There is another issue not discussed here. Why do we write Yoga or Rama or Mahabharata for Yog or Ram or Mahabharat?
[/quote]

That's because these spellings trace back to the original Sanskrit words, not to the words of Hindi or other current Indian languages.
http://en.wikipedia.org/wiki/List_of_English_words_of_Sanskrit_origin

The three words you brought up when written in Devanagari as used in Sanskrit:

रामः (Ramah with a visarga) - If you learnt Sanskrit grammar/vyakarana, you might remember the Ramah - Ramou - Ramaah kind of 27 formations of the word Ramah.

योगः or योगस (?) (Yogah or Yoga-s) -
http://dictionary.reference.com/search?q=yoga

महाभारतं (Mahabharatam)

Sanskrit itself is the anglicized version of संस्कृतं = Samskritam/Samskrutam; Samyak Kritam Samskritam, or refined language.

Hindi has influence from West Asian languages, resulting in the diminishing/vanishing of the final 'a' or 'ah' in words, particularly male names. So Bheemah & Ramah become Bheem & Ram. South Indian languages, and to some extent Marathi, have still retained the closeness to Sanskrit names.

PS: List of thousands of Sanskrit words, with pretty good pronunciation details:
http://sanskritdocuments.org/dict/dictall.html