How to render Dzongkha texts in Swing properly, take II

We’re not living in an ideal world where everything just works, and where anything that doesn’t can be fixed in no time. Not at all. Some time ago I wrote an article about problems with Dzongkha (Bhutanese) texts in a Swing environment. At that time I thought it was fixed for good, but nothing could have been further from the truth.

While the rendered texts looked all right at first sight, we later found out that some of the characters were not rendered properly. Sometimes. I did more research and came to the conclusion that characters composed of several “strokes” seemed to be put together in the wrong way. But properly rendered combinations also exist in fonts as single characters, so you could have two similar characters, one apparently correct and the other faulty, written next to each other in the same editor.

As the OS and the fonts are the usual suspects in such cases, I started my investigation there, but it was a dead end. Even Notepad and Kate rendered composed characters properly. I went more low-level and tried to render the text using WinAPI calls: again flawless. At this point I was thinking about writing a native library for font rendering, but I’d need three of them (Windows, Mac OS X, Linux), and that’s not exactly a piece of cake. As a last experiment I tried to render it in SWT, and it worked. Well then, I thought, this is going to be a PITA as well, but still better than a native library.

I spent two days analysing the SWT and Swing sources, looking for differences and trying to find the real issue behind Swing’s inability to render these texts properly. And guess what: I was successful.

For testing I selected a character composed of four parts: 0x0f62, 0x0f92, 0x0fb1 and 0x0f74 (ICU browser). If you have Tibetan script support, you should be able to see it directly as རྒྱུ. If not, see the screenshots in the next paragraph.
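To see why this counts as a composed character, it helps to list its four code points with their Unicode properties. A small sketch using only java.lang.Character (Character.getName() needs Java 7 or later); the class name TibetanSyllable is just for illustration. The first code point is a base letter and the other three are non-spacing marks, which the text renderer has to stack onto the base letter. That stacking step is what goes wrong in the bad rendering.

```java
public class TibetanSyllable {

    /** Counts the chars that Unicode classifies as non-spacing (combining) marks. */
    static int countNonSpacingMarks(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.getType(s.charAt(i)) == Character.NON_SPACING_MARK) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String syllable = "\u0f62\u0f92\u0fb1\u0f74"; // the test character from the post
        for (int i = 0; i < syllable.length(); i++) {
            char c = syllable.charAt(i);
            boolean mark = Character.getType(c) == Character.NON_SPACING_MARK;
            // Print code point, Unicode name and whether it is a base or a combining mark
            System.out.printf("U+%04X  %-30s %s%n", (int) c, Character.getName(c),
                    mark ? "combining mark, stacked onto the base" : "base letter");
        }
        System.out.println("combining marks: " + countNonSpacingMarks(syllable));
    }
}
```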

The following image illustrates the issue: on the left there’s a properly rendered character, on the right a bad one. Unfortunately, the bad variant is what you get if you just put the text into a JTextPane.

[Image: properly rendered character (left) vs. badly rendered one (right)]

On top of that, I found out that none of the Jomolhari fonts work properly in Java. The only font able to render my test character properly was Tibetan Machine Uni.

So how do you get properly rendered Tibetan script in Java? Several steps are required:

  1. download and install Tibetan Machine Uni
  2. edit fontconfig.properties as described in the first article about Dzongkha, “How to render Dzongkha (Bhutan) texts in Swing properly”, but using Tibetan Machine Uni instead of Jomolhari
  3. add the property “i18n” with the value true to the Document of every JTextComponent in which you want Tibetan script to be displayed properly

To give you a basic idea, here is the code used to produce the image above:

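package dzongkhatest;

import java.awt.Container;
import java.awt.Dimension;
import java.awt.Font;
import javax.swing.BoxLayout;
import javax.swing.JFrame;
import javax.swing.JTextPane;
import javax.swing.SwingUtilities;

public class DzongkhaTest {

    static class Frame extends JFrame {

        public Frame() {
            setTitle("Dzongkha rendering test");

            // Put both panes side by side in the content pane; the root pane
            // manages its own layout and should not be given a new one.
            Container content = getContentPane();
            content.setLayout(new BoxLayout(content, BoxLayout.X_AXIS));

            // Left: Document marked with "i18n", character is composed properly
            JTextPane pane1 = new JTextPane();
            pane1.setFont(new Font(Font.SANS_SERIF, Font.PLAIN, 100));
            pane1.getDocument().putProperty("i18n", true); // set before inserting the text
            pane1.setText("\u0f62\u0f92\u0fb1\u0f74");
            content.add(pane1);

            // Right: default Document, the parts of the character are misplaced
            JTextPane pane2 = new JTextPane();
            pane2.setFont(new Font(Font.SANS_SERIF, Font.PLAIN, 100));
            pane2.setText("\u0f62\u0f92\u0fb1\u0f74");
            content.add(pane2);

            setSize(new Dimension(200, 200));
            setDefaultCloseOperation(EXIT_ON_CLOSE);
        }
    }

    public static void main(String[] args) {
        // Swing components should be created and shown on the Event Dispatch Thread
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                new Frame().setVisible(true);
            }
        });
    }
}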
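Step 3 has to be repeated for every text component, which gets tedious in a larger UI. Below is a minimal sketch of a hypothetical helper (I18nDocuments is not part of Swing) that walks a component tree and sets the “i18n” property on each JTextComponent’s Document. It assumes the components already exist and keep the Document they were created with: a component added later, or a Document replaced via setDocument(), would have to be flagged again. As in the test program, it is meant to be called before any text is inserted.

```java
import java.awt.Component;
import java.awt.Container;
import javax.swing.text.JTextComponent;

/** Hypothetical helper (not part of Swing): applies step 3 to a whole component tree. */
public final class I18nDocuments {

    private I18nDocuments() {
    }

    /** Sets the "i18n" property on the Document of every JTextComponent below root. */
    public static void enable(Component root) {
        if (root instanceof JTextComponent) {
            ((JTextComponent) root).getDocument().putProperty("i18n", Boolean.TRUE);
        }
        if (root instanceof Container) {
            // Recurse into children, e.g. JScrollPane, then JViewport, then the text component
            for (Component child : ((Container) root).getComponents()) {
                enable(child);
            }
        }
    }
}
```

Typical usage would be `I18nDocuments.enable(frame.getContentPane());` right after the UI has been built.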

This solution seems to work fine on Windows and Linux. Nothing is perfect, however: it fails miserably on Mac OS X, where every composed character is shown as its individual parts placed next to each other ;(

Logging and OOME

The following text comes with a limited warranty: I’m not saying that what we did was the best solution to the problem, but it was the simplest one. At least we thought so.

We recently replaced our in-house logging with something more “standard”: a log4j-based logger. The original code, developed by our former employee, was a mixture of spaghetti code and a huge number of non-thread-safe WTFs. We spent some time fixing it, but finally we decided to replace it. In order to incorporate the additional features that had been part of the original logger, we extended java.util.logging.Logger. So far, so good; but then the real hell started. Here comes the first lesson one should learn from this: never, ever replace anything that works with something that hasn’t even been tested properly.

After a while with the new logger, we noticed that our server application started to die with OutOfMemoryError. As a lot of changes had been made everywhere, no one really suspected the logger. But nothing could have been further from the truth: a ByteArrayOutputStream that was written to over and over but never reset() caused the trouble. We found that out only after a couple of hours of profiling, as no one would suspect “runtime” classes of eating the memory.
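A minimal sketch of this failure mode, assuming (hypothetically) a formatter that renders each record through one reused ByteArrayOutputStream; the original logger code isn’t shown, so ReusedBufferFormatter and its methods are illustrative only. Without reset() the buffer retains every record ever written, and each returned string contains all previous records too; with reset() only the current record is kept.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical formatter that renders each log record through a reused byte buffer. */
public class ReusedBufferFormatter {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final boolean resetBeforeUse;

    ReusedBufferFormatter(boolean resetBeforeUse) {
        this.resetBeforeUse = resetBeforeUse;
    }

    synchronized String format(String message) {
        if (resetBeforeUse) {
            buffer.reset(); // discard the previous record; the backing array is kept for reuse
        }
        byte[] bytes = (message + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        buffer.write(bytes, 0, bytes.length);
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Number of bytes currently held in the buffer. */
    int bufferedBytes() {
        return buffer.size();
    }

    public static void main(String[] args) {
        ReusedBufferFormatter leaky = new ReusedBufferFormatter(false);
        ReusedBufferFormatter fixed = new ReusedBufferFormatter(true);
        for (int i = 0; i < 10_000; i++) {
            leaky.format("record " + i);
            fixed.format("record " + i);
        }
        System.out.println("without reset(): " + leaky.bufferedBytes() + " bytes retained");
        System.out.println("with reset():    " + fixed.bufferedBytes() + " bytes retained");
    }
}
```

Note that reset() does not shrink the backing array; it only sets the count back to zero, so the buffer stays as large as the largest record seen so far.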

Having fixed something so obvious, we thought we were done. Well, not really. This time the server did not crash within two days; it went down after a week instead. Same exception, same source: the new logger.

To explain the situation, I have to start with our new features. We wanted to have similar log files separated by a given parameter. All our topics are members of a huge enum, and some of the values can have an annotation specifying the destination filename and a default parameter. If such a parameter is present in a log record, the record is written to a file chosen based on that data; this helps us to sort entries according to their source, even if they are created by the same code. A quick example explains it faster: imagine an application reading data from email accounts. Such an application would log with our logger to N+1 files named:

application.log
application_HOTMAIL.log
application_GMAIL.log
...

I guess you get the idea. To do that we needed a place to store those parameters, and instances of the java.util.logging.Logger class seemed like a logical place. We extended this class, and our implementation added support for this additional data.
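A minimal sketch of the file-name scheme, assuming a hypothetical @LogFile annotation on the enum constants and that the parameter value is simply appended to the base name. The real annotation also carries a default parameter, which is omitted here, and turning the name into an actual Handler is left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class TopicFiles {

    /** Hypothetical annotation: base name of the log file a topic is written to. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD) // enum constants are fields
    @interface LogFile {
        String value();
    }

    /** Hypothetical topic enum; only some constants carry a destination file. */
    enum Topic {
        @LogFile("application") MAIL,
        STARTUP
    }

    /** base.log, or base_PARAM.log if the record carries a parameter; null if unannotated. */
    static String fileNameFor(Topic topic, String param) {
        LogFile file;
        try {
            file = Topic.class.getField(topic.name()).getAnnotation(LogFile.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e); // every enum constant is a public field
        }
        if (file == null) {
            return null; // caller falls back to the default handler
        }
        return param == null ? file.value() + ".log" : file.value() + "_" + param + ".log";
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor(Topic.MAIL, null));    // application.log
        System.out.println(fileNameFor(Topic.MAIL, "GMAIL")); // application_GMAIL.log
    }
}
```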

Simple, but… here comes the catch. Loggers live in a hierarchical structure: each logger has a parent. Whenever a new logger is created, it’s assigned to a parent (which could be the root logger, of course). Each logger keeps its children in a java.util.List; see the following excerpt from Sun’s Logger.java:

    private Logger parent;    // our nearest parent.
    private ArrayList kids;   // WeakReferences to loggers that have us as parent

    public void setParent(Logger parent) {
        if (parent == null) {
            throw new NullPointerException();
        }
        (...)
        doSetParent(parent);
    }

    private void doSetParent(Logger newParent) {
        synchronized (treeLock) {
            // Remove ourself from any previous parent.
            (...)

            // Set our new parent.
            parent = newParent;
            if (parent.kids == null) {
                parent.kids = new ArrayList(2);
            }
            parent.kids.add(new WeakReference(this));
        }
    }

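There are weak references, good. But wait: who’s cleaning up the kids collection? Nobody. A WeakReference only allows the child logger itself to be garbage collected; the WeakReference object stays in the parent’s list forever, so every logger ever created leaves an entry behind. There we go…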
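A stripped-down model of the pattern in the excerpt, using a hypothetical Node class rather than the real java.util.logging.Logger (whose fields are private and cannot be patched). It shows that weak references alone don’t shrink the list, plus one common remedy that the excerpt does not use: registering the references with a ReferenceQueue and removing the cleared ones. GC timing is JVM-dependent, so only the leaky count is deterministic.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class KidsLeakDemo {

    /** Hypothetical model of a logger tree node; NOT java.util.logging.Logger. */
    static class Node {
        final List<WeakReference<Node>> kids = new ArrayList<>();
        final ReferenceQueue<Node> cleared = new ReferenceQueue<>();

        /** Same pattern as the excerpt: the wrapper stays in the list forever. */
        void addKidLeaky(Node kid) {
            kids.add(new WeakReference<>(kid));
        }

        /** Variant that drops wrappers whose referents were already collected. */
        void addKidPruning(Node kid) {
            expunge();
            kids.add(new WeakReference<>(kid, cleared));
        }

        /** Drains the queue; if anything was cleared, removes all dead wrappers. */
        void expunge() {
            boolean anyCleared = false;
            while (cleared.poll() != null) {
                anyCleared = true;
            }
            if (anyCleared) {
                kids.removeIf(ref -> ref.get() == null);
            }
        }
    }

    public static void main(String[] args) {
        Node leaky = new Node();
        Node pruning = new Node();
        for (int i = 0; i < 100_000; i++) {
            leaky.addKidLeaky(new Node());     // the kid is unreachable right after this call
            pruning.addKidPruning(new Node());
        }
        System.gc(); // only a hint; the result below depends on the JVM and GC timing
        pruning.expunge();
        System.out.println("leaky list size:   " + leaky.kids.size()); // always 100000
        System.out.println("pruning list size: " + pruning.kids.size());
    }
}
```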

This class is a joke. All the important fields are private, it doesn’t implement any interface, and there is no simple way to fix the problem. I wanted my logger to remove itself from its parent when it was finalized: no way. You can’t even call setParent(null); it throws a NullPointerException and doesn’t do anything else. The saddest thing of all is that, apart from propagating level changes down the tree, the kids collection isn’t used for anything reasonable; it’s just sitting there eating heap. And you can’t do anything about it. Damn you, Sun.

How to render Dzongkha (Bhutan) texts in Swing properly

As you might know, we’re developing a specialised information system. It’s widely used in Asia; in fact, one of our first international customers was (and still is) located in India. Thanks to this geographical variety, we face issues with exotic language/script support quite often. The last time, we had an issue with the Urdu script, which is used in Pakistan. It’s still not well supported in mainstream operating systems (the first support came with Vista/Office 2007), special commercial software is needed for it, and that software doesn’t seem to adhere to Microsoft (or any) standards. It sort of works, but there are still issues. The biggest problem with Urdu is its height, which can vary a lot depending on what’s written.

Yesterday I was asked about Dzongkha support. Dzongkha is the official language of the Kingdom of Bhutan, and it uses the Tibetan script in its written form. The good news is that free fonts are available. I installed one of them (Jomolhari) and started experimenting with it, only to run into complete failure: no part of the system was able to show anything other than “boxes”, at least on Windows, where I started. Only two parts of the system were able to show Dzongkha properly, and only thanks to their ability to change the font used for them.

Imagining how complicated and annoying it would be to change all Swing controls to use the Jomolhari font (i.e. adding configuration, making sure that setFont() is called on every single JComponent, testing it…) made me feel really bad. On top of that, Latin script rendered by Jomolhari was really ugly (especially in small sizes like 10 or 12 pt).

Fortunately, I remembered that there is a way to specify font substitutions in Java, and after a couple of minutes I found the right article on Sun’s website: http://java.sun.com/j2se/1.5.0/docs/guide/intl/fontconfig.html

I started reading the document, but even after finishing it I was not exactly sure what should be done. There are paragraphs about substitution and about character subsets, but Tibetan (and thus Dzongkha) was not there. Having studied the Unicode pages, I knew that Tibetan characters occupy their own designated range starting at 0x0F00, yet there didn’t seem to be any support for this subset. I tried adding a line with:

sansserif.plain.tibetan=Jomolhari

but it didn’t change anything: boxes again. While getting ready to leave the office, I had one last idea: what if China somehow owns the Tibetan range? They keep claiming that Tibet belongs to them, so why not the Unicode range too? So I changed the line with the Chinese subset:

sansserif.plain.chinese-ms950=MingLiU

to use the Jomolhari font:

sansserif.plain.chinese-ms950=Jomolhari

and voilà, there we go: it worked like a charm. But what if this disables Chinese, I thought. A quick test revealed that my worries were unfounded: Kanji characters worked too. And so did Latin, rendered in the usual quality.
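The same quick test can be done programmatically with Font.canDisplayUpTo(), which returns -1 when the font can display every character of the string, or the index of the first character it cannot display. Below is a minimal sketch that checks the five logical fonts; the sample strings are arbitrary choices. Keep in mind that it only tells you whether glyphs are available, not whether composed characters are put together correctly.

```java
import java.awt.Font;

public class LogicalFontCheck {

    /** The five logical font families that every Java runtime must support. */
    static final String[] LOGICAL = {
        Font.SERIF, Font.SANS_SERIF, Font.MONOSPACED, Font.DIALOG, Font.DIALOG_INPUT
    };

    /** -1 if the font can display all of text, else the index of the first missing char. */
    static int firstMissing(String family, String text) {
        return new Font(family, Font.PLAIN, 12).canDisplayUpTo(text);
    }

    public static void main(String[] args) {
        String tibetan = "\u0f62\u0f92\u0fb1\u0f74"; // sample Tibetan syllable (assumed)
        String kanji = "\u65e5\u672c\u8a9e";         // sample Kanji (assumed)
        String latin = "Latin";
        for (String family : LOGICAL) {
            System.out.printf("%-12s tibetan=%d kanji=%d latin=%d%n", family,
                    firstMissing(family, tibetan), firstMissing(family, kanji),
                    firstMissing(family, latin));
        }
    }
}
```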

So, what exactly is needed? Just follow the steps below:

  1. Install the appropriate fonts (e.g. Jomolhari)
  2. Locate fontconfig.properties.src ($JAVA_HOME/lib)
  3. Rename it to fontconfig.properties (remove the .src suffix)
  4. Replace all occurrences of .chinese-ms950=MingLiU and .chinese-ms950=PMingLiU with .chinese-ms950=Jomolhari
  5. Quit all Java processes and start your Swing application again
  6. Every single Swing component should now be capable of displaying Tibetan script properly. If not, check that you’re using “logical” fonts and not forcing physical ones (there are five logical font families: Serif, SansSerif, Monospaced, Dialog and DialogInput; see the Javadoc for the Font class)
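Step 4 is easy to get wrong by hand in a file with many entries. Here is a minimal sketch of a hypothetical helper that performs it on the renamed fontconfig.properties from step 3: every key ending in .chinese-ms950 whose value is MingLiU or PMingLiU gets the new font, and everything else, including comments, is left untouched. It assumes simple key=value lines; run it on a copy first, since the file lives inside the Java installation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper automating step 4: point .chinese-ms950 entries at another font. */
public class FontconfigPatch {

    /** Rewrites "xxx.chinese-ms950=MingLiU" (or PMingLiU) to use font; other lines unchanged. */
    static String patchLine(String line, String font) {
        String trimmed = line.trim();
        int eq = trimmed.indexOf('=');
        if (trimmed.startsWith("#") || eq < 0) {
            return line; // comment or not a key=value line
        }
        String key = trimmed.substring(0, eq).trim();
        String value = trimmed.substring(eq + 1).trim();
        boolean mingLiU = value.equals("MingLiU") || value.equals("PMingLiU");
        return key.endsWith(".chinese-ms950") && mingLiU ? key + "=" + font : line;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: FontconfigPatch <path/to/fontconfig.properties> [font]");
            return;
        }
        Path file = Paths.get(args[0]);
        String font = args.length > 1 ? args[1] : "Jomolhari";
        // .properties files are ISO-8859-1 encoded; back the file up before running this
        List<String> patched = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.ISO_8859_1)) {
            patched.add(patchLine(line, font));
        }
        Files.write(file, patched, StandardCharsets.ISO_8859_1);
    }
}
```

For example: `java FontconfigPatch $JAVA_HOME/lib/fontconfig.properties Jomolhari`.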

Having solved the problem on Windows, I moved to the Mac, only to see that on Mac OS X Leopard (10.5.6) Tibetan fonts are installed by default and work in Java out of the box. Good work, Steve.

JTextPane and “continuous” cursor

Well, it’s not easy to work with Swing components. I’m working on a special editor that consists of N JTextPanes glued together. They resize according to the amount of text inside them in order to avoid scrollbars. So far it wasn’t so complicated. But then it got complicated: I wanted to be able to “walk” freely through the JTextPanes using only the keyboard arrows.

First I implemented jumping between the current pane and the next/previous one with the TAB key; this was not a big deal. What’s the problem with arrows? Well, in order to have them working as expected, you need to be able to find out whether the caret is on the last (or first) line or not. One would guess that reading the document structure would work, but if you have lines longer than the JTextPane’s width, the “internal” lines and the visual ones differ: one “internal” line can be split into multiple visual lines.

In cases like this I simply go and study the libraries: this kind of check needs to be somewhere, right? So we only have to find it. After half an hour of searching I found something that could work:

int newpos = text.getUI ().getNextVisualPositionFrom (text, pos, Position.Bias.Forward, direction, bias);

(the complete reference for the UI is here: http://java.sun.com/…/MultiTextUI.html)

text is the JTextComponent, pos is the current caret position, Position.Bias.Forward is the bias of that position (whether it leans towards the next or the previous character), direction is one of the int constants defined in SwingConstants (NORTH, SOUTH, etc.), and bias is a one-element Position.Bias array in which the bias of the returned position is stored. I don’t care about biases: I just call it with the desired direction and get the new position. And if it’s the same as the original one, we’ve probably hit the border. Voilà!
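Putting the pieces together, here is a minimal sketch of how this check could drive the arrow keys. ContinuousCaret is a hypothetical helper, not the editor’s actual code. It replaces the UP/DOWN key bindings of each pane with an action that asks getNextVisualPositionFrom() whether the caret would move; if not, it jumps into the neighbouring pane, and otherwise it delegates to the original caret-up/caret-down action. Treating a negative return value as “border” is an assumption on top of the same-position check, and the horizontal caret column is not preserved when jumping.

```java
import java.awt.event.ActionEvent;
import java.awt.event.KeyEvent;
import java.util.List;
import javax.swing.AbstractAction;
import javax.swing.Action;
import javax.swing.KeyStroke;
import javax.swing.SwingConstants;
import javax.swing.text.BadLocationException;
import javax.swing.text.JTextComponent;
import javax.swing.text.Position;

/** Hypothetical helper: lets UP/DOWN move the caret across a stack of text panes. */
public final class ContinuousCaret {

    private ContinuousCaret() {
    }

    /** True if moving the caret in direction (SwingConstants.NORTH/SOUTH) would not move it. */
    static boolean atVisualBorder(JTextComponent text, int direction) {
        int pos = text.getCaretPosition();
        Position.Bias[] biasRet = new Position.Bias[1];
        try {
            int newPos = text.getUI().getNextVisualPositionFrom(
                    text, pos, Position.Bias.Forward, direction, biasRet);
            // Unchanged position (or a negative one, assumed to mean "none") => border
            return newPos == pos || newPos < 0;
        } catch (BadLocationException e) {
            return true;
        }
    }

    /** Installs UP/DOWN bindings on each pane; panes must be ordered top to bottom. */
    public static void link(List<? extends JTextComponent> panes) {
        for (int i = 0; i < panes.size(); i++) {
            JTextComponent prev = i > 0 ? panes.get(i - 1) : null;
            JTextComponent next = i + 1 < panes.size() ? panes.get(i + 1) : null;
            bind(panes.get(i), KeyEvent.VK_DOWN, SwingConstants.SOUTH, next, false);
            bind(panes.get(i), KeyEvent.VK_UP, SwingConstants.NORTH, prev, true);
        }
    }

    private static void bind(final JTextComponent pane, int keyCode, final int direction,
                             final JTextComponent target, final boolean caretToEnd) {
        KeyStroke stroke = KeyStroke.getKeyStroke(keyCode, 0);
        Object originalKey = pane.getInputMap().get(stroke);
        final Action original = originalKey == null ? null : pane.getActionMap().get(originalKey);
        String name = "continuous-caret-" + direction;
        pane.getInputMap().put(stroke, name);
        pane.getActionMap().put(name, new AbstractAction() {
            @Override
            public void actionPerformed(ActionEvent e) {
                if (target != null && atVisualBorder(pane, direction)) {
                    // Jump into the neighbour: start of the next pane or end of the previous one
                    target.requestFocusInWindow();
                    target.setCaretPosition(caretToEnd ? target.getDocument().getLength() : 0);
                } else if (original != null) {
                    original.actionPerformed(e); // normal caret movement inside the pane
                }
            }
        });
    }
}
```

Usage would be a single call such as `ContinuousCaret.link(Arrays.asList(pane1, pane2, pane3));` once the panes exist.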

Beware of null #TEXT nodes

I recently stumbled upon a NullPointerException in the Transformer.transform(Source, Result) method. My application had been working fine until yesterday; there was no update and no change in the configuration of anything…

What I was doing there is a textbook example of using the Transformer class:

	    Source source = new DOMSource (doc);
	    Result result = new StreamResult (new File (file));
	    Transformer transformer = TransformerFactory.newInstance ().newTransformer ();
	    transformer.transform (source, result);

Transformer is part of the JDK and is commonly used for saving a Document to a file, so my first thought was to check the parameters passed. Yes, there was a valid path inside the File, canWrite() returned true, and the file was indeed created (albeit with zero size), but that was all. The only message I got went to stderr and read:

ERROR: ''

Very helpful, isn’t it? What next? I went to check the stack trace:

javax.xml.transform.TransformerException: java.lang.NullPointerException
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:651)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:281)
	at xxx.Main.writeDocumentToFile(Main.java:151)
	at xxx.Main.main(Main.java:332)
Caused by: java.lang.NullPointerException
	at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:312)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:229)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:121)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:85)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:596)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:642)
	... 3 more
---------
java.lang.NullPointerException
	at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:312)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:229)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:215)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:121)
	at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(DOM2TO.java:85)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:596)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:642)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:281)
	at imdbparser.Main.writeDocumentToFile(Main.java:151)
	at imdbparser.Main.main(Main.java:332)

Fortunately I’m not afraid of going deep into the code, so I grabbed the sources and checked the line where everything started:
Caused by: java.lang.NullPointerException
	at com.sun.org.apache.xml.internal.serializer.ToUnknownStream.characters(ToUnknownStream.java:312)

The code there looks like this:

    public void characters(String chars) throws SAXException
    {
        final int length = chars.length();
        (...)

Gotcha! The only possible cause is chars being null. Fine, but… why? The previous line in the stack trace finally helped to resolve the problem:

case Node.TEXT_NODE:
	    _handler.characters(node.getNodeValue());
	    break;

So there are one or more #TEXT nodes in the document with a null value! With this information it was quite easy to find the problem; it was in the following ternary expression:

m != null ? m.name : getMName (files[i].getName ())

At first sight it looks good: there is a test for a null value. But what if m is not null and m.name is?
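Two small safeguards, sketched under stated assumptions. First, the corrected ternary checks m.name as well; Item is a hypothetical stand-in for the type of m, and the fallback string stands in for getMName(...). Second, findNullTextNodes() walks a DOM tree and collects text nodes whose value is null, so they can be reported before the document ever reaches Transformer.transform(). The demo deliberately creates such a node with createTextNode(null).

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NullTextCheck {

    /** Hypothetical stand-in for the object "m" from the post. */
    static class Item {
        String name;

        Item(String name) {
            this.name = name;
        }
    }

    /** Corrected ternary: fall back when m OR m.name is null. */
    static String nameFor(Item m, String fallback) {
        return (m != null && m.name != null) ? m.name : fallback;
    }

    /** Collects all text nodes below node whose value is null (they break serialization). */
    static List<Node> findNullTextNodes(Node node) {
        List<Node> found = new ArrayList<>();
        collect(node, found);
        return found;
    }

    private static void collect(Node node, List<Node> found) {
        if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue() == null) {
            found.add(node);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), found);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("movies");
        doc.appendChild(root);
        Element bad = doc.createElement("movie");
        bad.appendChild(doc.createTextNode(new Item(null).name)); // the buggy path: null value
        root.appendChild(bad);
        Element good = doc.createElement("movie");
        good.appendChild(doc.createTextNode(nameFor(new Item(null), "fallback")));
        root.appendChild(good);
        System.out.println("text nodes with null value: " + findNullTextNodes(doc).size());
    }
}
```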

The Owls Are Not (always) What They Seem (Twin Peaks)