We have successfully performed a first pass at converting the entire Wa Bible from the Word format provided by Pastor Joseph's team into USFM format.
USFM (Unified Standard Format Markers) is a markup language, that is, "a special notation for identifying the components and structure of an electronic document".
An example of USFM is as follows:
\id JHN
\mt1 LAI YOHAN
\c 1
\s Gumlox Im: Ju
\r LAI YOHAN-1
\p
\v 1 Yam jah koe: Gumlox, Gumlox an ot mai: Siyiex, Gumlox an
mawh Siyiex heue.
\p
\v 2 Yam jah Nawh ka ot mai: Siyiex.
\p
\v 3 Pa koe: kuceu ku cawng koe: kheu yuh Nawh, mai: pa koe:
pa ang Nawh yuh ang koe: tix ceu kawx heue.
\p
\v 4 Kadaux Nawh koe: Ju Pa Im:, mai: Ju Pa Im: an mawh Pa
riang kawn: pwi heue.
\p
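where "\id" identifies the book, "\mt1" gives the main title, "\c" marks a chapter, "\s" a section heading, "\r" a parallel passage reference, "\p" a paragraph, and "\v" a verse.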
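To make the role of the markers concrete, here is a minimal Python sketch that reads USFM like the sample above and collects the verses. It is only an illustration, not part of our conversion tools: the function name parse_usfm is made up for this example, and it handles only the handful of markers shown here, not the full USFM standard.

```python
import re

# A USFM marker at the start of a line: backslash, name, optional content.
MARKER = re.compile(r"^\\(\w+)\s*(.*)$")


def parse_usfm(text):
    """Collect (chapter, verse, text) tuples from simple USFM input."""
    verses = []
    chapter = None
    current = None  # [chapter, verse, text] for the verse being read

    def close():
        # Store the verse that is currently open, if there is one.
        nonlocal current
        if current is not None:
            verses.append(tuple(current))
            current = None

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = MARKER.match(line)
        if match is None:
            # No marker: a wrapped continuation of the current verse.
            if current is not None:
                current[2] += " " + line
            continue
        tag, rest = match.group(1), match.group(2)
        close()  # any marker ends the verse that was open
        if tag == "c":
            chapter = int(rest)
        elif tag == "v":
            # The verse number stays a string so that ranges like "1-2" survive.
            number, _, body = rest.partition(" ")
            current = [chapter, number, body]
        # Other markers (\id, \mt1, \s, \r, \p) are skipped in this sketch.
    close()
    return verses
```

Running parse_usfm on the John 1 sample would give one tuple per verse, with the wrapped lines of each verse joined back together.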
We have written a conversion program that takes the Word documents and automatically generates the corresponding USFM.
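The general idea of such a conversion can be sketched as follows. This is not our actual program. It assumes, purely for illustration, that the text has already been extracted from Word as plain lines, that each verse line starts with its verse number, and that any other non-empty line is a section heading. The function name to_usfm and its parameters are hypothetical.

```python
import re

# Assumed input convention: a verse line starts with its number, e.g. "1 Yam jah ...".
VERSE_LINE = re.compile(r"^(\d+)\s+(.+)$")


def to_usfm(book_id, title, chapter, lines):
    """Turn plain text lines for one chapter into a USFM string."""
    out = [f"\\id {book_id}", f"\\mt1 {title}", f"\\c {chapter}"]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no content
        match = VERSE_LINE.match(line)
        if match:
            # Each verse gets its own paragraph, as in the sample above.
            out.append("\\p")
            out.append(f"\\v {match.group(1)} {match.group(2)}")
        else:
            # Assumption: an unnumbered line is a section heading.
            out.append(f"\\s {line}")
    return "\n".join(out) + "\n"
```

A real converter also has to cope with the styles, footnotes and inconsistencies found in the Word files, which this sketch ignores.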
Once the text is in USFM, producing the fully formatted Bible as a PDF is simply a matter of running a conversion script that uses XeTeX macros; the layout is defined in configuration scripts. Changing the layout, font size, and so on is a matter of changing the correct entry in those scripts. Knowing which entry is the correct one is a separate question that we are still working on (i.e. learning...).
The currently generated PDF uses the default layout and fonts and runs to over 2600 pages. This will need to be reduced to around 1200-1300 pages.
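As a rough guide to how much tighter the layout must become, the page counts above can be turned into a density factor. This is simple arithmetic on the figures quoted here, using 2600 as the current count even though the real figure is somewhat higher, so the results are lower bounds. The function name density_factor is just for this example.

```python
def density_factor(current_pages, target_pages):
    """How many times more text each page must hold to reach the target."""
    return current_pages / target_pages


# Figures from the text: about 2600 pages now, 1200-1300 pages wanted.
print(round(density_factor(2600, 1300), 2))  # 2.0
print(round(density_factor(2600, 1200), 2))  # 2.17
```

In other words, each page needs to carry a little over twice as much text as it does with the default settings.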
A test version of the PDF of Matthew is here.
During the conversion, a small number of issues were detected, which we have now passed back to the translators.
The current list of errors is here.
