Friday, November 28, 2008

Went through a few things on the second day!

Freeway is an advanced Open Source eCommerce platform that can sell using methods previously available only in enterprise-class or niche bespoke systems.

Attended a talk on OLPC.
The project needs more open source contributors from India.

It's a really nice application. I have not used it yet, but on a tablet like the Nokia N810 it works great:
you can chat, talk, do video conferencing and much more.

It was nice to see the Sun VirtualBox and VMware stalls in front of each other.

Wednesday, November 26, 2008

1st day at

The first day started with registration and an opening talk by Atul Chitnis explaining the basics of FOSS and the meaning of "Talk is cheap, show me the code".

I also met face to face with many open source contributors whom I usually meet on IRC.

Rahul Bhalerao and I had a talk on the very first day, in the afternoon session. It was initially planned for 5pm, but in the final schedule it was at 2pm, so we mostly concentrated on our talk.

There was a good audience for our talk, around 100 people or more, mostly students.

We got a good response from some Tibetan students who want their language to be supported from the OS point of view.

Some people also raised concerns about OpenType specification 1.6 and its impact on present font rendering.

We got good feedback on the way we covered the whole i18n architecture in one shot.

Thursday, September 18, 2008

Samyak Devanagari is now Unicode 5.1 compatible!!!

All the new characters added in Unicode 5.0 and Unicode 5.1 are now in Samyak-Devanagari, so Samyak Devanagari now fully supports the Marathi, Hindi, Sindhi, Urdu, Maithili, Limbu and Konkani languages.

The characters added are as follows: U+0972 [ॲ], U+0971 [ॱ], U+097B [ॻ], U+097C [ॼ], U+097D [ॽ], U+097E [ॾ] and U+097F [ॿ]
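To double-check which characters these code points are, Python's standard unicodedata module can print their official Unicode names. A minimal sketch, assuming a Python 3 interpreter whose Unicode database is 5.1 or newer (any current Python 3 is); the helper name `describe` is just for illustration:

```python
import unicodedata

# Code points listed above as newly added to Samyak Devanagari
NEW_CODE_POINTS = [0x0972, 0x0971, 0x097B, 0x097C, 0x097D, 0x097E, 0x097F]

def describe(code_points):
    """Return a 'U+XXXX <char> NAME' line for each code point."""
    lines = []
    for cp in code_points:
        ch = chr(cp)
        # The default avoids a ValueError if the Unicode database lacks the character
        lines.append(f"U+{cp:04X} {ch} {unicodedata.name(ch, '<unassigned>')}")
    return lines

if __name__ == "__main__":
    for line in describe(NEW_CODE_POINTS):
        print(line)
```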

Now there is no problem writing ॲ [U+0972]. Previously people typed अॅ [U+0905 + U+0945], but now ॲ is in Unicode as well as in Samyak-Devanagari.
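Text typed the old way still contains the two-character sequence, and Unicode normalization will not turn it into U+0972, because ॲ has no canonical decomposition. A minimal sketch of a hypothetical clean-up helper, `fix_candra_a`, for converting such existing data:

```python
import unicodedata

OLD_SEQUENCE = "\u0905\u0945"  # अ + ॅ (LETTER A + VOWEL SIGN CANDRA E), the old way of typing it
CANDRA_A = "\u0972"            # ॲ (DEVANAGARI LETTER CANDRA A), added in Unicode 5.1

def fix_candra_a(text):
    """Replace every legacy अ+ॅ sequence with the single character ॲ."""
    return text.replace(OLD_SEQUENCE, CANDRA_A)

if __name__ == "__main__":
    # NFC leaves the old sequence alone, so an explicit replacement is needed
    print(unicodedata.normalize("NFC", OLD_SEQUENCE) == CANDRA_A)  # False
    print(fix_candra_a(OLD_SEQUENCE) == CANDRA_A)                  # True
```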

You can download and test samyak-fonts-1.2.1.tar.gz from
I think it will also be in tomorrow's Fedora Rawhide as samyak-fonts-devanagari-1.2.1-1.

Friday, September 05, 2008

Python tricks:

I was creating a locale file for Sindhi in Arabic script. A locale file contains the hex code of every character, and I got the strings from a linguist.
Checking each character in the Unicode code chart and writing down its Unicode value was really time consuming. There may be better methods, but that is how I did the sd_IN@Devanagari locale file, since I am very familiar with the Devanagari script.
The Arabic code chart is very confusing, though: Arabic words contain the initial, medial and final forms of characters, while the Unicode chart shows only the standalone shapes.
I have been working with Python for some time, so I tried the following quick method:
In a terminal, start the Python 2 interpreter and type:
>>> w = 'شريمتي'
>>> w.decode("utf8")
I did the same for almost 30 strings.
It is so fast and accurate :)
Whenever you are confused about Unicode characters, just do this and then check each character's information in the chart using its Unicode value.
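The `decode` call above is Python 2 syntax. In Python 3 strings are already Unicode, so `str` has no `decode` method. A minimal Python 3 sketch of the same trick prints each character's code point and Unicode name, ready to look up in the chart. The helper names are only for illustration:

```python
import unicodedata

def code_points(text):
    """Return the 'U+XXXX' value of every character in text."""
    return [f"U+{ord(ch):04X}" for ch in text]

def explain(text):
    """Print each character with its code point and Unicode name."""
    for ch in text:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<no name>')}")

if __name__ == "__main__":
    # The Sindhi word شريمتي from above, written with escapes
    explain("\u0634\u0631\u064a\u0645\u062a\u064a")
```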

Wednesday, August 27, 2008

Shortcut method for testing new locale file with glibc in Fedora

I had some discussion on the Indlinux list about testing a new locale file/collation with glibc, but at that time I didn't know the shortcut method. I was rebuilding glibc with my changes and then testing it, which took a long time. :(
I discussed this with Ulrich Drepper, and he suggested a good way to test a locale/collation in 10 minutes, so I am sharing it with everyone :)

step 1:
- make a new folder

step 2:
- copy the locale file there
e.g.: cp mr_IN NEW_FOLDER/mr (note that I have changed the name of the locale file)

step 3:
- edit mr (make the changes you want, for example adding collation info)

step 4:
- run the following command:
$ localedef -i ./mr -f UTF-8 ./mr_IN
That's it.
It will create a new folder mr_IN with all the locale information.

step 5:
- to test these new changes locally, use the following command
*make sure you are in the new folder created in step 1

step 6:
To add this to the locale archive permanently, run the following command as root:
root$ localedef --add-to-archive mr_IN
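For step 5, a common way to try a freshly compiled locale without installing it is glibc's LOCPATH environment variable. When it is set, glibc looks for the locale's data directory under that path instead of the system locations. Below is a minimal Python sketch based on that approach. It runs `sort` with the mr_IN folder that localedef created in step 4. The helper names and the input file name are hypothetical:

```python
import os
import subprocess

def local_locale_env(folder, locale_name):
    """Environment that points glibc at a locale compiled into `folder`."""
    env = dict(os.environ)
    env["LOCPATH"] = os.path.abspath(folder)  # glibc looks for <LOCPATH>/<locale_name>/
    env["LC_ALL"] = locale_name
    return env

def sort_with_local_locale(path, folder=".", locale_name="mr_IN"):
    """Run sort(1) on `path` using the locally compiled locale and return its output."""
    result = subprocess.run(
        ["sort", path],
        env=local_locale_env(folder, locale_name),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    # Run from the folder created in step 1; 'barakhadi_test' is a hypothetical input file
    print(sort_with_local_locale("barakhadi_test"))
```

If the folder is not found, programs may silently fall back to the C locale. Comparing the output with a plain `sort` run is a quick sanity check.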

Wednesday, August 20, 2008

Proposal for additional Deprecated Characters from Unicode


I saw this earlier, but I only remembered it when I noticed that Pango renders U+0953 and U+0954 as independent characters even though they are dependent (combining) marks. I don't know what the exact problem is.

Unicode Public Review Issue 122, dated 2008-08-04, proposes deprecating these two characters.

As a Devanagari script user I have never used these in my writing, but I think these two characters were added specifically for Vedic.
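Unicode classifies both characters as nonspacing combining marks (general category Mn). A renderer should therefore attach them to the preceding character rather than show them as standalone glyphs. A quick standard-library check (the helper name `mark_info` is hypothetical):

```python
import unicodedata

def mark_info(code_point):
    """Return (name, general category, canonical combining class) for a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch)

if __name__ == "__main__":
    for cp in (0x0953, 0x0954):
        name, category, ccc = mark_info(cp)
        print(f"U+{cp:04X} {name} category={category} combining class={ccc}")
```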

Thursday, July 24, 2008

Naneghat Rappelling

Zoom in to see people rappelling

One nice view...


I did the Naneghat rappelling last week, and it was a very nice experience. Naneghat is 2800 ft above sea level. I rappelled from the top at 2800 ft down to 2500 ft, a 300 ft descent.
It was the first time in my life that I had rappelled, but the organizers were very experienced. At the start they gave us good confidence, including a 30 ft practice rappel, and that boosted my confidence.
I finally did it, with lots of doubt in my mind about whether I should or not, but it was an awesome experience. Very few people get such an experience, and I am one of them :)
The camp also included moonlight trekking. Unfortunately we were late reaching the Naneghat base (expected at 1am, but we reached at 3am), but we still got at least 1 1/2 hrs of moonlight trekking.
We were late because the driver was heavily drunk and not able to drive the bus. We stopped the bus immediately and called his manager. Luckily he sent us a new driver at 1am, which is why we were able to get there at all; otherwise the trip would have been cancelled.
When we reached the top of Naneghat we had breakfast and tea, and then the choice of doing either the Jivdhan trek first or rappelling first. We were excited about rappelling, so we did that first.
Then one nice sleep, and back to Pune.
When we watched other people rappelling from a different viewpoint, we couldn't believe we had done that.
The camp ran from Sat 11pm to Sun 9pm and included tea, breakfast, lunch and evening snacks, all for just Rs 1400.
Some snaps:

Ready for rappelling with the instructor and all the safety equipment

Me at Naneghat while watching other people rappelling

Don't know which script this is, but something was written there in the cave

With friends after completing the trek

Me while trekking

Naneghat view from base at 4:30am

Tuesday, June 24, 2008

Adding language support in Fedora

Let's say we want to add xy_XY language support in Fedora.

The following is a to-do list for adding language support in Fedora from the i18n point of view:

1) OpenType, Unicode-based font for xy_XY

2) Rendering Engine support for Unicode code points of xy_XY (pango, qt)

3) Some basic IMEs for xy_XY (m17n-db)

4) Locale File for xy_XY in glibc

5) Updating comps file for xy_XY

6) Adding xy_XY in GDM list

7) Updating the s-c-l (system-config-language) locale-list file for xy_XY

There might be something missing, but these are the major things.
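For item 4, one quick way to see whether a locale is already available on a system is to check the output of `locale -a`. glibc normalizes codeset names (lowercase, punctuation removed, so UTF-8 appears as utf8), so a comparison should normalize both sides. A small sketch follows. The helper names are hypothetical, and the normalization only approximates glibc's rules:

```python
import re
import subprocess

def normalize_locale(name):
    """Normalize the codeset part (e.g. 'UTF-8' -> 'utf8'), roughly as glibc does."""
    base, dot, rest = name.partition(".")
    if not dot:
        return name  # no codeset, e.g. 'C' or 'sd_IN@devanagari'
    codeset, at, modifier = rest.partition("@")
    codeset = re.sub(r"[^0-9a-z]", "", codeset.lower())
    return f"{base}.{codeset}{at}{modifier}"

def installed_locales():
    """Return the locale names printed by `locale -a`."""
    out = subprocess.run(["locale", "-a"], capture_output=True, text=True, check=True).stdout
    return out.split()

def has_locale(name, available):
    """True if `name` matches one of the `available` locale names after normalization."""
    wanted = normalize_locale(name)
    return any(normalize_locale(loc) == wanted for loc in available)

if __name__ == "__main__":
    # xy_XY is the placeholder language code used in this post
    print(has_locale("xy_XY.UTF-8", installed_locales()))
```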

Sunday, June 15, 2008

Kashmiri Draft for Unicode Proposal

link for draft

Many, many thanks for this draft to
Mr. M.K. Raina
Mr. Raman Kaul
Mr. Rakesh Pandit

Without their valuable input it would never have been possible to do research on the Kashmiri language, and almost everything is also available on the site.

I have contacted Michael Everson from Unicode about this,
and he is happy with it and is going to add a proposal form to this draft.

I hope it will soon be included in the Unicode Devanagari block.

Monday, May 05, 2008

Google Translation tool: (My First Look)

Just saw Google's translation tool today. It's nice to see the effort they have put into Hindi-to-English and English-to-Hindi.
Playing with it for just 5 minutes gave me the following results:

Hindi to English:
i/p: तुम कौन हो -> What you have
expected result: who are you
i/p: मे घर जा रहा हु -> Hu is in the home
expected result: I am going home

English to Hindi:
i/p: who are you -> आप जो कर रहे हैं
expected result: तुम कौन हो
i/p: go away -> दूर
expected result: दूर जा

So I found its accuracy to be around 20-30% for sentences. It is good for single words.

Translation between English and Hindi is a really tough job, because:
1. A single English word has many possible Hindi equivalents, so it is really difficult for a tool to pick the right one. Manual editing is needed to improve accuracy.

2. When translating from English to Hindi, the meaning of a word keeps changing with the sentence it appears in.

3. Sentence structure: if you observe carefully, 'is' and 'was' come in the middle of an English sentence, but their translated equivalents come at the end of the Hindi sentence.

4. Determining the gender in a sentence is a tedious task.
The statement "I am going" has two forms depending on who is actually speaking it:
male: "मे जा रहा हु", female: "मे जा रही हु"
This can be handled only if we know from previous statements who is actually speaking; otherwise manual editing is needed.

5. Most importantly, it needs a strong dictionary in the background for all these comparisons, with all possible equivalents for each word.
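Point 4 can be made concrete with a toy example. The Hindi verb form depends on information that the English sentence does not contain, so a translator has to get it from context or leave it for a human. This is only an illustration with a hypothetical one-sentence lookup table, not a description of how Google's system works:

```python
# Toy illustration only: one English sentence, two Hindi forms.
# The Hindi strings are the examples given in this post.
I_AM_GOING = {
    "male": "मे जा रहा हु",
    "female": "मे जा रही हु",
}

def translate_i_am_going(speaker_gender=None):
    """Return the Hindi form of "I am going", or None if the speaker's gender is unknown."""
    # The English sentence carries no gender, so without context the choice
    # between रहा and रही has to be left for manual editing (None here).
    return I_AM_GOING.get(speaker_gender)

if __name__ == "__main__":
    print(translate_i_am_going("female"))
    print(translate_i_am_going())  # None: needs context or a human
```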

Good to see Google try it. Some more work will definitely improve it a lot!

Friday, May 02, 2008

Including the smc-fonts package (Malayalam fonts) in Fedora 9

I thought it was worth writing about here, since it's nice to see many Malayalam typefaces/fonts in Fedora 9 now. Malayalam users no longer need to install these fonts manually; they can simply run

yum install smc-fonts-*

and they will get these fonts in Fedora 9.

It's really worth writing about, because adding this package to Fedora was difficult. It had lots of issues under the Fedora Packaging Guidelines (FPG), and I am happy that I finally made the required adjustments and it is now in Fedora 9.

1) There was a license inside the fonts, but it was not easily viewable; a person needed FontForge or another font tool to actually see it.

2) Then I got detached license texts from upstream into this tarball, but that did not solve the problem,
since this package contains fonts from different upstream projects, each with a different license,
and as per the FPG we can't make a single package with different licenses inside it.

3) Nicely, the FPG gives alternative options for almost every situation. I studied all of this with help from the experts, added subpackage support to smc-fonts, and then submitted it for review. Thanks to the FPG for good documentation.

4) Thanks to Rahul Bhalerao, who did the review of this very complex packaging example. It was not a regular package; it had different licenses and different upstream versions.
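For point 1: the license text embedded in a font usually lives in the OpenType 'name' table. Name ID 13 is the license description and name ID 0 is the copyright notice, so it can be read without FontForge. A minimal standard-library sketch, assuming an uncompressed TTF/OTF file (not a TTC collection or WOFF); the function names are hypothetical:

```python
import struct
import sys

# Name IDs defined by the OpenType 'name' table specification
COPYRIGHT_ID = 0
LICENSE_ID = 13
LICENSE_URL_ID = 14

def _decode(platform_id, raw):
    # Unicode (0) and Windows (3) platform strings are UTF-16BE;
    # Macintosh (1) strings are typically Mac Roman.
    if platform_id in (0, 3):
        return raw.decode("utf-16-be", errors="replace")
    return raw.decode("mac_roman", errors="replace")

def read_name_strings(data, name_id):
    """Return the distinct strings stored under `name_id` in a TTF/OTF file's bytes."""
    num_tables = struct.unpack_from(">H", data, 4)[0]
    for i in range(num_tables):
        # Each table record: tag, checksum, offset, length (16 bytes after a 12-byte header)
        tag, _checksum, offset, _length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        if tag == b"name":
            break
    else:
        return []  # no 'name' table
    _format, count, string_offset = struct.unpack_from(">HHH", data, offset)
    results = []
    for i in range(count):
        platform_id, _enc, _lang, nid, length, str_offset = struct.unpack_from(
            ">6H", data, offset + 6 + 12 * i)
        if nid != name_id:
            continue
        start = offset + string_offset + str_offset
        text = _decode(platform_id, data[start:start + length])
        if text not in results:
            results.append(text)
    return results

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        for text in read_name_strings(data, LICENSE_ID) or ["(no license string found)"]:
            print(f"{path}: {text}")
```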

You can see the bug for more details.

I modified the package following the review suggestions. Finally Jens Petersen gave CVS access, and now smc-fonts is in Fedora 9.
Cheers for that!

Presently I am the Fedora downstream maintainer of the smc-fonts package, which includes the fonts "meera, rachana, raghumalayalam, dyuthi and suruma".

If you find any bug in these fonts, feel free to file it at

Tuesday, April 29, 2008

Difference between the Malayalam fonts Lohit and Meera

Here I am just noting the things I noticed between these two fonts:
1) Both are mono-thickness fonts, and the shapes are almost the same.
You can see the image (FontForge metrics window) below to check the shape difference; it is almost nil.

Meera's shapes look a bit thicker than Lohit's.

2) I noticed this difference while comparing the two fonts in OpenOffice:

Meera really needs some work here: if we set this font in a GUI application at size 8-9, it is unreadable. Compared with other Latin fonts at the same point size, it is much smaller, almost half their size.

3) The major difference is that Meera supports the traditional script whereas Lohit follows the new script, so you need to choose the right font according to your script needs.

There may also be some rendering differences and bugs, since they depend on the fonts' GSUB rules and the underlying rendering engine, but that is not in the scope of this post.
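Point 2 follows from how point size maps to pixels. The rendered height of a glyph is the point size converted to pixels (points × DPI / 72), scaled by the glyph's height as a fraction of the font's units per em. A font whose glyphs occupy a small part of the em therefore looks much smaller than other fonts at the same point size. A small sketch with made-up glyph heights, only to show the arithmetic (hinting is ignored):

```python
def glyph_pixel_height(point_size, dpi, glyph_height_units, units_per_em):
    """Approximate on-screen height in pixels of a glyph, ignoring hinting."""
    pixels_per_em = point_size * dpi / 72  # 1 point = 1/72 inch
    return pixels_per_em * glyph_height_units / units_per_em

if __name__ == "__main__":
    # Hypothetical numbers: two fonts with 2048 units per em at 9 pt on a 96 DPI screen
    print(glyph_pixel_height(9, 96, 1024, 2048))  # 6.0 (glyph fills half the em)
    print(glyph_pixel_height(9, 96, 512, 2048))   # 3.0 (glyph fills a quarter of the em)
```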

Wednesday, March 26, 2008

Steps to test glibc sorting order of any locale

In indic-mashup we had a good discussion on sorting orders and the sorting issues of Indic languages. All the language experts posted their expected data on

I have recently completed the work for mr_IN, and it has been upstreamed, so you can check mr_IN sorting in the next glibc releases.

I am blogging this here because it will help many linguists test the sorting order of their languages, and it would be nice if we could test and correct the sorting order of all languages. :)
So the first step is to test sorting and file bugs for wrong sorting orders ;) I have been working on collation for some time, so I will surely help in fixing them.

step 1: create a text file, e.g. barakhadi_test
step 2: write the sorting data into it, one sorting syllable per line

so the content of your test file will look like this

step 3: run the following command in a terminal
syntax: LC_ALL="locale name".utf8 sort "path/test file name"

for Marathi it will be
LC_ALL=mr_IN.utf8 sort barakhadi_test

It will print the sorted data in the terminal, one syllable per line.

If you want to write the sorted data to a file, use the following command instead:

syntax: LC_ALL="locale name".utf8 sort "path/test file name" > output_file
LC_ALL=mr_IN.utf8 sort barakhadi_test > barakhadi_sorted

There are also other ways to test, but I have described the method I use.
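The same check can be scripted from Python, which uses the C library's collation through `locale.strxfrm`. On a glibc system the order should generally match `sort` for the same locale. A minimal sketch, assuming the locale is installed (`setlocale` raises `locale.Error` otherwise); the function name is hypothetical:

```python
import locale

def sort_lines(lines, locale_name):
    """Sort strings with the C library's collation rules for `locale_name` (e.g. 'mr_IN.utf8')."""
    old = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, old)

if __name__ == "__main__":
    with open("barakhadi_test", encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    for line in sort_lines(lines, "mr_IN.utf8"):
        print(line)
```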

Thursday, February 14, 2008

How to test Open Type Fonts

Fonts are a very important part of an operating system, and their 100% accuracy is very important.

Though I am not from a QA background, after more than three years of working on rendering issues I have seen many varieties of font problems. Below are some of the areas we should look at while testing fonts. These are just guidelines; please suggest more if you can.

1. Cross check font with Unicode chart:

This is a very important step. Use a tool like Character Map in Fedora or charmap in Windows; these tools show the glyphs in a font and their corresponding Unicode values. Check these against the Unicode code charts. A glyph may have been assigned a wrong Unicode value, and important Unicode characters may be missing. Also check which Unicode version the font supports, and compare it against the code chart for that same version.
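Once you have the list of code points a font covers, the comparison with the chart can be automated. How you extract that list is up to you; for example, the third-party fontTools library's `TTFont(path).getBestCmap()` returns a dict keyed by code point. A minimal sketch, with the caveat that Python's `unicodedata` reflects Python's own Unicode version (see `unicodedata.unidata_version`), which may be newer than the version the font targets:

```python
import unicodedata

def missing_code_points(font_code_points, first, last):
    """Return assigned code points in [first, last] that the font does not cover."""
    missing = []
    for cp in range(first, last + 1):
        # name() returns the default (None) for unassigned code points, so skip those
        if unicodedata.name(chr(cp), None) is None:
            continue
        if cp not in font_code_points:
            missing.append(cp)
    return missing

if __name__ == "__main__":
    # Hypothetical font that covers only U+0900..U+0970 of the Devanagari block
    covered = set(range(0x0900, 0x0971))
    print([f"U+{cp:04X}" for cp in missing_code_points(covered, 0x0900, 0x097F)])
```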

2. Create a test file for your language:

The content of this file should be all possible syllables of the language. For Marathi it will be the barakhadi (every consonant combined with every matra). It should also contain some conjunct forms, e.g. matra ligatures and consonant ligatures. Making this file is a big effort, but doing it once will help you forever in testing fonts. Such files may already be available for some scripts, so please search the net.
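A barakhadi test file can be generated rather than typed, which also rules out the wrong-typing-sequence problem described below. A minimal sketch; the consonant and vowel-sign lists are assumptions for illustration, so extend them (for example with ृ, ॅ, ॉ or conjuncts like क्ष and ज्ञ) for your language:

```python
# Assumed Marathi consonant list and the vowel signs commonly used in barakhadi
CONSONANTS = "कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहळ"
VOWEL_SIGNS = ["", "\u093e", "\u093f", "\u0940", "\u0941", "\u0942",   # (none) ा ि ी ु ू
               "\u0947", "\u0948", "\u094b", "\u094c", "\u0902", "\u0903"]  # े ै ो ौ ं ः

def barakhadi(consonants=CONSONANTS, vowel_signs=VOWEL_SIGNS):
    """Return every consonant + vowel-sign combination, consonant by consonant."""
    return [c + v for c in consonants for v in vowel_signs]

def write_test_file(path, consonants=CONSONANTS):
    """Write one syllable per line, ready to open in an editor with the font under test."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(barakhadi(consonants)) + "\n")

if __name__ == "__main__":
    write_test_file("barakhadi_test")
```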

I would really appreciate it if people from the community who have such a file would submit it; it will help lots of other people too.

So when you are testing any new font for your language, just apply that font to this file and check whether it renders properly. If in doubt, you can cross-check against an already available, accurate font.

One problem with such a file is that you are not typing the characters yourself, so the source file might have been typed using a wrong input sequence. Since we are only viewing it, we can't be sure what the person actually entered. So if in doubt, please type that character sequence yourself.

Example: once I took some existing data for testing, but some characters were not rendering properly. Later I understood that the data entry operators had typed ZWJ in many unnecessary places (almost everywhere a half form of a consonant appears).
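Stray joiners like these are invisible on screen but easy to find programmatically. A minimal sketch that reports every ZWJ (U+200D) and ZWNJ (U+200C) in a piece of text; the helper name is hypothetical:

```python
import unicodedata

JOINERS = {"\u200c", "\u200d"}  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER

def find_joiners(text):
    """Return (index, name) for every ZWJ/ZWNJ in text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in JOINERS]

if __name__ == "__main__":
    sample = "\u0915\u094d\u200d\u0937"  # क + virama + ZWJ + ष
    for index, name in find_joiners(sample):
        print(index, name)
```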

3. Problems of rasterizer:

Sometimes, due to hinting problems, the rasterizer shows wrong GPOS attachments. In this case, increase the font size and check, or just print the document and check the printout.

I noticed this problem while working on a Nastaliq script font. Cursive attachment is the key point of Nastaliq script. On screen the cursive attachment appeared broken, but the font's GPOS rules were correct and the printed page showed it properly, so the problem was on the rasterizer side.

4. Compatibility Issues:

This is a major problem of OpenType fonts: their GSUB rules give different results depending on each rendering engine's reordering methodology.
So even if our test file gives nice results in gedit, it might not give the same results in KWrite/OpenOffice, since they use different rendering engines.
The behavior will not be completely different, but some bugs are likely, so please don't forget to check this.

5. Font Styles: (Normal, Italic, Bold and Bold Italic, etc.)

Check whether all styles can be applied to the font (Bold, Italic, Bold Italic).
I saw this problem while creating Arabic fonts. Even after I installed all four weights of the fonts (N, I, B and BI), the style variations did not show when I tried to see the actual effect. After digging into it, I understood it was a problem with the settings in the .fog (Fontographer source) file: I was not updating the TTF names field.

Testing styles is a very difficult task, since many editors provide built-in (synthesized) italic and bold. So it is very important to check whether the glyph we are seeing comes from the italic font or is the default italic produced by the editor. I did this by taking printouts :).
First take a printout with the editor's default italic. Then install the italic weight of the font and take another printout. Comparing the two will give you exact results.
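Besides the name fields, style linking also depends on the style flags stored in each font file. One quick check is the fsSelection field of the OS/2 table; in the OpenType specification its bits 0, 5 and 6 mean ITALIC, BOLD and REGULAR. A small sketch that decodes the flags and compares them with the style a file is supposed to be. How you read the raw value is up to you; for example, the third-party fontTools library exposes it as `font["OS/2"].fsSelection`. The helper names are hypothetical:

```python
# fsSelection bits from the OpenType OS/2 table specification
FS_SELECTION_BITS = {0: "ITALIC", 5: "BOLD", 6: "REGULAR"}

EXPECTED = {
    "Regular": {"REGULAR"},
    "Italic": {"ITALIC"},
    "Bold": {"BOLD"},
    "Bold Italic": {"BOLD", "ITALIC"},
}

def style_flags(fs_selection):
    """Return the style-related flag names set in an fsSelection value."""
    return {name for bit, name in FS_SELECTION_BITS.items() if fs_selection & (1 << bit)}

def check_style(style_name, fs_selection):
    """True if the fsSelection flags match the style this file claims to be."""
    return style_flags(fs_selection) == EXPECTED[style_name]

if __name__ == "__main__":
    print(check_style("Bold Italic", 0x21))  # bits 0 and 5 set
```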

6. Fonts internal name:

Whenever we double-click a TrueType font, a viewer shows its name, license information, etc. In GNOME it is gnome-font-viewer, and in Windows it is the Windows font viewer.
Testing this is also very important, since this is the name we use to identify the font when selecting it in various editors (OOo, gedit, etc.).
So test whether the font name shows as expected in the editor, e.g. 'samyak devanagari' when selecting the font in the menu bar.

7. Selecting Font:

This means actually applying the font to text. Sometimes editors don't allow this; I have seen this problem with MS Office, where after installing a font it would not let me select it from the list.
The problem in this case is that the Unicode range bits in the OS/2 table are not set properly for the script.
The FontForge and FontLab font editors set these bits; we just need to select the required script. But people still using Fontographer need to enter the hex value for that script's range themselves.
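For anyone entering these values by hand, the hex words can be computed from the bit numbers in the OpenType OS/2 specification (for example, Devanagari is bit 15 of ulUnicodeRange1). A minimal sketch, assuming your tool asks for the four 32-bit ulUnicodeRange words as hex; check your tool's dialog to confirm. Only a few blocks are listed here, and the helper name is hypothetical:

```python
# A few ulUnicodeRange bit numbers from the OpenType OS/2 specification
UNICODE_RANGE_BITS = {
    "Basic Latin": 0,
    "Arabic": 13,
    "Devanagari": 15,
    "Bengali": 16,
    "Gurmukhi": 17,
    "Gujarati": 18,
    "Oriya": 19,
    "Tamil": 20,
    "Telugu": 21,
    "Kannada": 22,
    "Malayalam": 23,
}

def unicode_range_words(blocks):
    """Return (ulUnicodeRange1, ..., ulUnicodeRange4) with the bits for `blocks` set."""
    words = [0, 0, 0, 0]
    for block in blocks:
        bit = UNICODE_RANGE_BITS[block]
        words[bit // 32] |= 1 << (bit % 32)  # bits 0-31 go in word 1, 32-63 in word 2, ...
    return tuple(words)

if __name__ == "__main__":
    for word in unicode_range_words(["Basic Latin", "Devanagari"]):
        print(f"0x{word:08X}")
```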

8. Printing Quality:

The shapes we see on screen are the result of hinting, anti-aliasing and other rasterizer processing; screen resolution also affects them.
Check by taking a printout: it shows how the font looks in printed material, and you may be able to suggest some modifications to the glyphs.
We should definitely check this with different kinds of printers (dot-matrix, laser, inkjet). It tells you the actual printing quality of your font :)
That's why you see many varieties of fonts today for different media (display, print, etc.).

Please don't forget to mention the versions of all applications, along with the OS name, when filing a bug. Also, if you refer to keys by their ASCII keyboard names, please mention which keyboard layout you are using, e.g. Inscript, phonetic, etc.

Tuesday, January 22, 2008

Temporary Solutions for Kashmiri Problems

* Everything in this post is for Kashmiri in Devanagari only.
Kashmiri was standardised in 2002 at CIIL by Shri R.K. Bhat, but sadly it still has not got the additional code points it needs in the Unicode Devanagari code page (U+0900). On this website I have seen lots of content for Kashmiri, covering its basics and its sounds, and using only these resources anybody can learn Kashmiri. Anyone who goes through this website will see that the Unicode Devanagari code page needs additional code points for the Kashmiri sounds û and ü.

Then why does Unicode still not support Kashmiri fully?
Community members discussed this with Unicode earlier. However, the Kashmiri vowel sounds û and ü are somewhat similar to the Gurmukhi vowels u and uu, so Unicode suggested using those, or else proposing new characters to Unicode.
But that solution has many problems. First, the Gurmukhi shape is different from the one required.
There are also lots of other problems on the rendering and font side. If we type the Gurmukhi matra U+0A42 inside Devanagari text, it is identified as part of a Punjabi syllable and recognised as an invalid syllable,
so a U+25CC (dotted circle) mark is inserted between the Devanagari character and the Gurmukhi matra, and the required combination is not shown.
Also, using Gurmukhi matras alone will not solve the problem, since the corresponding independent vowel letters are also required.
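The dotted-circle behaviour comes from the renderer splitting text into runs by script. The Gurmukhi matra lands in a different run from the Devanagari consonant before it, so it has no base to attach to. A rough sketch that flags such mixed-script combinations in existing text. Treating the first word of the Unicode character name as the "script" is a simplification, because Python's standard library has no script property. The helper names are hypothetical:

```python
import unicodedata

def script_of(ch):
    """Rough script guess: the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def mismatched_marks(text):
    """Return (index, base_script, mark_script) for marks whose script differs from their base."""
    problems = []
    base_script = None
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in ("Mn", "Mc"):  # combining marks, e.g. matras
            mark_script = script_of(ch)
            if base_script is not None and mark_script != base_script:
                problems.append((i, base_script, mark_script))
        else:
            base_script = script_of(ch)
    return problems

if __name__ == "__main__":
    # Devanagari KA followed by Gurmukhi vowel sign UU (U+0A42)
    print(mismatched_marks("\u0915\u0a42"))
```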

Presently, for translation work, I am putting these shapes at U+E500, U+E501, U+E502 and U+E503 (Private Use Area) in the Lohit and Samyak fonts. Yes, I know this is wrong, since data created with these code points will not be standard. But there is no other option: the Unicode process will take a long time, and we can't stop our work. As soon as we get code points in Unicode, we will use a converter or replace these with the proper values.
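Since U+E500..U+E503 are in the Private Use Area, data created this way can be found and converted mechanically once real code points exist. A minimal sketch of such a converter. The target characters are not known yet, so the mapping in the usage lines uses an arbitrary stand-in (U+0942) purely for demonstration. The helper names are hypothetical:

```python
def find_pua(text, first=0xE500, last=0xE503):
    """Return (index, 'U+XXXX') for characters in the temporary Kashmiri range."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if first <= ord(ch) <= last]

def convert(text, mapping):
    """Replace temporary code points using a {old_code_point: new_string} mapping."""
    return text.translate(mapping)

if __name__ == "__main__":
    sample = "\u0915\ue500"
    print(find_pua(sample))
    # Stand-in mapping for demonstration only; the real targets will come from Unicode
    print(convert(sample, {0xE500: "\u0942"}))
```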

I added the first character of the above image on the Inscript V key and the third character on B,
and the second and fourth characters on the # and $ keys.

Become the root user, then:
1. Take a backup of the font in /usr/share/fonts/lohit-hindi/
2. Paste this font at the same location
3. Take a backup of the mim file /usr/share/m17n/hi-inscript.mim
4. Paste this file at the same location

Just log off, and you can now type Kashmiri using the #, $, V and B keys.