An EA (Extended Attribute) Primer

Introduction

Welcome to my EA primer. This blurb is to get you up to speed quickly for writing applications that use EAs (as extended attributes are affectionately known). I have made every effort to make these easy to use so you will use them.

Why Do I Need EAs?

Applications often need to store information about a file. For example, if you wrote a word processor, you might want to record the user's last location in the document someplace. One way to do this is to keep such information in the file itself. This is often a bad idea, since you might need some of the information before loading the file, causing a conflict. Another way is to have one big properties file, or maybe several smaller files, holding the information. In a large-scale system a single big file can become quite a management issue, and if much information is stored there, performance is apt to be slow. Not to mention that an error that damages that file loses all the information for every file. Storing the information in lots of little files means that there will be an explosion of these, and it also exposes the information to user tampering, normally resulting in a perplexed user who deletes 'all those extra files' and wonders why all his or her customized settings have simply vanished.

Both of these lead to more headaches than they solve. The best way would be to attach the information to the file in question. Then there is never a question about it being there. If the operating system manages it, performance needn't be an issue. This is exactly what an EA is: EAs are system-managed metadata on files and folders.

EAs in their simplest guise are just pairs of keys (the names) and values. In this way, they operate a lot like Java Properties objects, if you know what those are. What sort of information can you store? Pretty much anything on your computer. The only real limits are on size: the total size of all EAs on a file cannot exceed 64k, and there is a further limit on the amount of space set aside for names and such. These are hard limits imposed by OS/2 and there is nothing that can be done about them.
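To make the key/value picture concrete, here is a minimal sketch that models a file's EAs with java.util.Properties and checks a rough size estimate against the 64k limit. It does not touch real EAs or jWPS. The class name EABudget, the way the size is estimated, and the sample values are all assumptions made for illustration, and the real accounting done by OS/2 includes overhead that is ignored here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical illustration only: models EAs as plain name/value pairs.
// It does not read or write real extended attributes.
final class EABudget {

    // The 64k ceiling on the total size of all EAs for one file.
    static final int MAX_TOTAL_BYTES = 64 * 1024;

    // Rough estimate: bytes of every name plus bytes of every value.
    // Real EAs carry extra accounting overhead that is ignored here.
    static int estimateSize(Properties eas) {
        int total = 0;
        for (String name : eas.stringPropertyNames()) {
            total += name.getBytes(StandardCharsets.US_ASCII).length;
            total += eas.getProperty(name).getBytes(StandardCharsets.US_ASCII).length;
        }
        return total;
    }

    // True when the estimate stays within the 64k budget.
    static boolean fits(Properties eas) {
        return estimateSize(eas) <= MAX_TOTAL_BYTES;
    }

    public static void main(String[] args) {
        Properties eas = new Properties();
        eas.setProperty(".SUBJECT", "Quarterly report");
        eas.setProperty(".COMMENTS", "Draft, do not distribute");
        System.out.println("estimated bytes: " + estimateSize(eas));
        System.out.println("fits in 64k: " + fits(eas));
    }
}
```

A check like this is only a sanity test before writing; the operating system has the final say on what fits.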

Some Examples of EAs in OS/2

EAs are used all over the place in OS/2. The Workplace Shell (WPS) maintains several of them, such as for icons and their positions. Applications can register a type for themselves, and every file created of this type then just knows what applications it can be opened with. This behavior is much more rugged than using file extensions. Other famous examples are in REXX. Did you realize that every time you run a command file after a change to it, it is 'tokenized' (an intermediate compilation) and these tokens are stored as EAs? This is why the first run of a command file takes much longer than subsequent calls. EPM, the much beloved editor, saves its formatting information in EAs. Haven't you wondered how you can take a plain text file and set different fonts, colors and other such details without actually having to save the file in some odd binary format? All of these are common examples of extended attributes.

If you like to look at EAs directly, there is a little-known switch on the type command that will let you do this. You can therefore get independent verification that your changes are taking place, if you don't feel like repeatedly opening the settings notebook. The syntax is

type -ea:filename

This will spit it all out to the command line. If you need to save it to a file you will have to redirect it, e.g. to send it to the file foo.eas you would issue

type -ea:filename > foo.eas

EA Basics in C

(You really should only consider reading this section if you have worked with EAs in C and want a little more to orient yourself.) We recall the following from the C API. Each EA has a name, a critical flag, a numeric type and a value. Really only the name and critical flag need be worried about. The numeric types are recorded here mostly for programmers who are familiar with the C API and want to understand how it works. In Java this is all done by objects, so if you, for example, make an IconEA object, it will already have the correct type of EAT_ICON and the correct name, .ICON. So what Java classes do you need to be aware of? Read on.

The Four basic types

There are four basic types. The simple types have a single value, which is either a string (also called text) or binary (also called a byte array, since it is one). The compound EAs are lists of other EAs. The compound EAs are either of mixed type (MVMT, for 'multi-valued, multi-typed') or all of a single type (MVST, 'multi-valued, single-typed').

String or Text EAs

This is represented by the Java class StringEA. It has a value that is a string.

Binary or Byte Array EAs

This is represented by the Java class ByteArrayEA and has a byte array as its value. Note that if the jWPS gets an EA and can't understand it, you will get one of these. This is because really the way things are stored by OS/2 is by keeping some accounting information (the name of the EA, its numeric type) and a byte array. It always works to get a byte array EA. Remember that there is really no low-level checking done by OS/2 on the contents, so applications can (and do) write their EAs wrong, making it impossible to determine what was actually meant. In that case all that can be done is to do what we do here – return the whole thing.
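The fallback described here can be sketched in a few lines. The layout assumed below (a two-byte little-endian type, a two-byte little-endian length, then the text bytes, with EAT_ASCII equal to 0xFFFD) is an assumption about how OS/2 stores a text EA value and should be checked against the Control Program reference. The class and method names are hypothetical and this is not the jWPS implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not the jWPS code: try to read a text EA value and
// fall back to the raw bytes when the header does not make sense.
final class EADecodeSketch {

    // Assumed numeric type for a text EA (EAT_ASCII in the C headers).
    static final int EAT_ASCII = 0xFFFD;

    // Reads an unsigned 16-bit little-endian number at the given offset.
    static int readUShort(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    // Returns the text if the value looks like a well-formed text EA,
    // otherwise null to signal "treat this as a plain byte array".
    static String tryDecodeText(byte[] value) {
        if (value == null || value.length < 4) {
            return null;
        }
        int type = readUShort(value, 0);
        int length = readUShort(value, 2);
        if (type != EAT_ASCII || length != value.length - 4) {
            return null; // wrong type, or a length that does not match the data
        }
        return new String(value, 4, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xFD, (byte) 0xFF, 2, 0, 'h', 'i'};
        byte[] bad = {(byte) 0xFD, (byte) 0xFF, 9, 0, 'h', 'i'}; // length lies
        System.out.println(tryDecodeText(good));
        String text = tryDecodeText(bad);
        System.out.println(text != null ? text : "raw bytes: " + Arrays.toString(bad));
    }
}
```

The point of returning null rather than throwing is that a malformed EA is not an error for the caller: the raw bytes are still there and can be handed back whole.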

Multi-valued, Single-type (MVST) EAs

This is represented by the Java class MVSTEA and represents a list of EAs, all of which are the same type. After the first entry is made (using the addEA method), all subsequent entries must be the same type or an exception will be thrown. These may be nested, so you can do something sneaky like have a single MVST consisting entirely of MVMTs, each of which has completely arbitrary content.

Multi-valued, Multi-type (MVMT) EAs

This is represented by the Java class MVMTEA and represents a list of EAs. Any other EA can be in this EA, including other MVMTs. These can be nested as deeply as space permits.
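The difference between the two list types can be sketched with stand-in classes. These are not the jWPS classes: the names SketchEA, TextEA, MultiTypeList and SingleTypeList are hypothetical, and since this primer does not say which exception the real addEA throws, IllegalArgumentException is used purely as a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in classes for illustration only; these are NOT the jWPS classes.
final class EAListSketch {

    abstract static class SketchEA { }

    // A simple EA holding one string value.
    static final class TextEA extends SketchEA {
        final String value;
        TextEA(String value) { this.value = value; }
    }

    // Behavior shared by both compound types.
    abstract static class ListEA extends SketchEA {
        protected final List<SketchEA> entries = new ArrayList<>();
        void addEA(SketchEA ea) { entries.add(ea); }
        int size() { return entries.size(); }
    }

    // Like an MVMT: accepts any EA, including other lists.
    static final class MultiTypeList extends ListEA { }

    // Like an MVST: the first entry fixes the type of all later entries.
    static final class SingleTypeList extends ListEA {
        @Override
        void addEA(SketchEA ea) {
            if (!entries.isEmpty() && entries.get(0).getClass() != ea.getClass()) {
                // Placeholder exception; the real library's type may differ.
                throw new IllegalArgumentException("all entries must be the same type");
            }
            super.addEA(ea);
        }
    }

    public static void main(String[] args) {
        // The "sneaky" case: a single-typed list whose entries are mixed lists.
        SingleTypeList outer = new SingleTypeList();
        MultiTypeList first = new MultiTypeList();
        first.addEA(new TextEA("anything"));
        outer.addEA(first);                  // fixes the element type
        outer.addEA(new MultiTypeList());    // same type, accepted
        try {
            outer.addEA(new TextEA("oops")); // different type, rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("entries: " + outer.size());
    }
}
```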

The standard EAs

OS/2 provides several built-in EAs. By convention, all system-defined EAs start with a period. Again, nobody will whack you in the head if you decide to make an EA called .MINE, but system EAs are used a good deal and if you try to commandeer one (such as deciding that you and you alone should use the .ICON EA), you may get 'unanticipated but predictable results' as they say.

Making your own EAs

Extending the basic types

The best way is to extend one of the basic four types, giving your EA an easily recognized name. (Note to folks who are used to the C API: you don't need to worry about the numeric type at all. You should be able to do pretty much anything you want with EAs without ever having to deal with one of these.) So, for example, let us say that you wanted to make an EA that stored a thumbnail of the file. A good idea is to extend ByteArrayEA and give it a name, say thumbnail.ea. It might look like this:
   import net.jqhome.jwps.ea.*;
   
   public class ThumbnailEA extends ByteArrayEA{
      // this will have the name and other information set automatically
      public ThumbnailEA(RawEA rawEA) throws JWPException{
         super(rawEA);
      }
      
      // another constructor, for making them directly. This sets the name
      // and the data.
      public ThumbnailEA(byte[] data) throws JWPException{
         super("thumbnail.ea", data);
      }
      
      // add any methods to, for instance, resize the thumbnail, rotate it, ...
      
   } //end class
   

Making your own factory

After you have written your own EAs, you might want to have them made by the factory. If you don't have anything specific required, such as having your own extremely specific binary format or some such, then you don't need to make a factory. Our previous example falls in this category. Any EA that extends a basic EA but just sets the type, which should be almost all of them, falls under this heading.

On the other hand, let us assume that our EA requires some very special handling. Maybe we are compressing and decompressing the information on the fly. In this case, you would want to have your specific EA made automatically. For this you need a factory. So, assuming you need a factory, this section tells you how to do it.

The factory class, EAFactory, is called by the native code. In order to change its behavior, you must follow these steps:

A good strategy to follow in your method is to check for and create your EAs and if that fails then to pass off creation to the original factory. If you don't do this, then you have to write code that generates all the other possible EAs. That's a lot of work. Trust me. So here is an example of such a class.
       import net.jqhome.jwps.ea.*;
       
       public class MyEAFactory extends EAFactory{
       
          public static AbstractEA newEA(RawEA rawEA) throws JWPException{
             // ignore case, since ThumbnailEA sets the name as "thumbnail.ea"
             if(rawEA.getName().equalsIgnoreCase("thumbnail.ea")){
                return new ThumbnailEA(rawEA);
             }
             return EAFactory.newEA(rawEA);
          } //end newEA(RawEA)
          
       } //end class
    

Here it is assumed that you have a class named ThumbnailEA and that it has a constructor that will accept a RawEA object.

Using XML with EAs

I have to admit it. When I first used OS/2 oh so many years ago I thought that EAs were the coolest idea I'd seen. Then I found out that there was really almost no way to create them. Sure, I could go to the settings notebook and set the standard ones, but it bugged me that I couldn't use a text editor in some way to edit them. Eventually the world caught up with me (hee-hee) and came up with XML. If you are unfamiliar with XML you should learn about it. It is not hard. Part of jWPS is that it can take the EAs and serialize them, which is a fancy way of saying that it will change them into XML format. The way XML works is by having special tags of the form <tagname> which show what the parts of the text represent. If you are familiar with DTDs, the format is given in one (look at ea.dtd in the main distribution). If you don't know much about XML, this is a really simple application of it and you can probably just get by with a bit of mimicking.

There is a handy-dandy class called EAUtil that can be invoked with a command file, jEAUtil.cmd. This ships with the standard distribution of jWPS. Suppose you want to see the EAs for a given file, say myfile.dat. (Do a directory listing to see which files have EAs and choose one of those. Most .cmd files have EAs, although these are binary.) To put the EAs into the file myfile.eas, issue

jEAUtil -s myfile.dat myfile.eas

You can also create a test file, open the settings notebook and add comments, a subject line, keyphrases etc. then issue the above.

The basic format

Here is what a complete small example looks like. This sets the subject and keyphrases of a document. This is in the file ea-tmpl.xml that comes with the distribution.
<?xml version="1.0" encoding="UTF-8"?>
<eaList>
    <ea name=".subject" type="EAT_ASCII">
        <value>Put your new subject here...</value>
    </ea>
    <ea name=".keyphrases" type="EAT_MVMT">
        <ea type="EAT_ASCII">
            <value>the first keyphrases line</value>
        </ea>
        <ea type="EAT_ASCII">
            <value>the second keyphrases line</value>
        </ea>
    </ea>
</eaList>
So what does this mean? The first line is required so it can be recognized as an XML document. Don't touch it. The outermost tags are <eaList> and show that this is a simple list of EAs. That is also mandatory. Every EA list contains EAs. EAs contain values (for text and binary EAs) or other EAs. In this case, the subject is a single line of text. The type must be preserved, incidentally. Normally this is not too much of an issue; it only really comes into play when some application has written corrupt EAs, in which case it serves as a double-check. If an EA contains other EAs, then it should be of type EAT_MVMT, and you simply add them. These can be nested as deep as you like.
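If you want to inspect such a file from Java, here is a minimal sketch that reads the format with the standard DOM parser and lists the top-level EAs. It relies only on the element and attribute names visible in the example (eaList, ea, name, type). It does not use jWPS and does not validate against ea.dtd, and the class name EAXmlSketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of jWPS: lists the EAs in an exported file.
final class EAXmlSketch {

    // Returns "name (type)" for each <ea> that is a direct child of <eaList>.
    static List<String> topLevelEAs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            Node child = doc.getDocumentElement().getFirstChild();
            for (; child != null; child = child.getNextSibling()) {
                // Skip whitespace text nodes; nested EAs are not direct children.
                if (child instanceof Element && "ea".equals(child.getNodeName())) {
                    Element ea = (Element) child;
                    result.add(ea.getAttribute("name") + " (" + ea.getAttribute("type") + ")");
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable EA list", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<eaList>"
                + "<ea name=\".subject\" type=\"EAT_ASCII\"><value>Hello</value></ea>"
                + "<ea name=\".keyphrases\" type=\"EAT_MVMT\">"
                + "<ea type=\"EAT_ASCII\"><value>first</value></ea>"
                + "</ea>"
                + "</eaList>";
        for (String line : topLevelEAs(xml)) {
            System.out.println(line);
        }
    }
}
```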

To try this out, go to the distribution and make up some random file, call it fnord.txt. You can set the subject and keyphrases by issuing

jEAUtil -a fnord.txt ea-tmpl.xml

Open the notebook for fnord.txt and go to the file page. Cool, huh?!

Binary EAs

The next thing you might want to do programmatically is set the icon for a file. This is really easy too. Binary data, of course, usually can't be dropped into a text file as it is, but it can be encoded into a text stream. I will leave out the well-known technical details, but this is base 64 encoding and is standard. Most email attachments are sent this way. There is a utility, b64.cmd, that will take a file and either encode it (that is to say, convert it from binary to text) or decode it (from an encoded form back to binary). Let us say you want to carry this out on the file myicon.ico, saving it in myicon.b64. You would issue

b64 -encode myicon.ico myicon.b64

Look at the contents of myicon.b64 with a text editor. You can paste it into the XML file and use it to set the icon of your file. That's what we are going to do now. Here is the XML to do just that. This is located in the file ea-tmplb.xml.
<?xml version="1.0" encoding="UTF-8"?>
<eaList>
    <ea name=".ICON" type="EAT_ICON">
        <value encodingType="base64">QkEoAAAAeAAAAAAAAABDSRo (WHOLE BUNCH OF STUFF OMITTED!!!)</value>
    </ea>
</eaList>
Here I cut and pasted this from one of the standard icons supplied with OS/2. To change the icon, issue

jEAUtil -a fnord.txt ea-tmplb.xml

You might have to right-click on the notebook to get the icon to refresh.
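The encoding step can also be done from Java with the standard java.util.Base64 class, should you prefer that to b64.cmd. The sketch below encodes some bytes and wraps them in a value element like the one in ea-tmplb.xml. Whether jWPS expects line breaks or a particular base 64 variant inside the element is not covered in this primer, so treat the output format as an assumption. The class name IconValueSketch is hypothetical.

```java
import java.util.Base64;

// Hypothetical helper, not part of jWPS.
final class IconValueSketch {

    // Encodes raw bytes as base 64 text.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decodes base 64 text back into the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    // Wraps the encoded bytes in a <value> element for the XML file.
    static String valueElement(byte[] data) {
        return "<value encodingType=\"base64\">" + encode(data) + "</value>";
    }

    public static void main(String[] args) {
        // A few made-up bytes standing in for the contents of an icon file.
        byte[] sample = {0x42, 0x41, 0x28, 0, 0, 0};
        System.out.println(valueElement(sample));
    }
}
```

In practice you would read the bytes of the icon file first and pass them to valueElement, then paste the result into the ea element for .ICON.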

Final notes

Don't Get Carried Away

You can have as many EAs in an XML file as you want. That said, remember that OS/2 only allots 64k. It is easy to get carried away with putting in all sorts of things. Using metadata (which is what EAs are, see the note in the introduction to EAs) should be fun. At this point these tools are here just to get people interested in EAs and jWPS, but their utility is still beyond question.

What if you really need to have megabytes of data in an EA? Well, you could just keep the data in a file and put the name of that file into an EA. It so happens that the exported versions of the EAs in XML do not suffer from the limitations that EAs do, so this might be a good thing to consider. Just pull in the EA as needed. Heck, you could even make your own FileEA class that does this automatically....
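The FileEA idea might look something like the sketch below. It is a stand-alone illustration only: the class does not extend any jWPS class, every name in it is hypothetical, and a real version would presumably extend StringEA and store the path as its value.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in; a real FileEA would presumably extend StringEA.
final class FileEASketch {

    private final String path;  // this is what would be stored in the EA
    private byte[] cached;      // loaded only when first asked for

    FileEASketch(String path) {
        this.path = path;
    }

    // The small value that actually lives in the EA: just the file name.
    String getValue() {
        return path;
    }

    // Pulls in the large data on demand, so it never counts against the 64k.
    byte[] getData() {
        if (cached == null) {
            try {
                cached = Files.readAllBytes(Paths.get(path));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return cached;
    }

    public static void main(String[] args) throws IOException {
        Path big = Files.createTempFile("bigdata", ".dat");
        Files.write(big, new byte[200_000]); // far more than an EA can hold
        FileEASketch ea = new FileEASketch(big.toString());
        System.out.println("stored in EA: " + ea.getValue());
        System.out.println("bytes on demand: " + ea.getData().length);
        Files.delete(big);
    }
}
```

The obvious catch is that the data no longer travels with the file the way a true EA does, so moving or deleting the external file breaks the link.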

What Are Bad EAs?

If you are using EAs you might at some point run into something called a BadEA. What are these? OS/2 manages its EAs strictly on the honor system, so the programmer is always right, and if someone is not careful when saving EAs, garbage is written. Since at a very low level the control information (how big the EA is, its numeric type etc.) is just part of the data, this can get badly munged. In extreme cases, all the EAs past a given bad one can't be decoded by any means other than snooping through them with a hex editor and looking for recognizable strings. I've run into these a few times. So, trying very hard to be a good citizen of the WPS, the EA utils will return a bona fide BadEA object, which is a type of binary object. What should you do with these? The conservative thing to do is just leave them alone. This is the usual behavior we adopt, since even though we can't make heads or tails of such an EA, it might be critical for some application, and altering it might well render the application unusable. If the vendor has been even moderately responsible, the name of the EA should give some clue as to the origin, and then you can decide on a course of action, such as removing the EA if that specific program is no longer in use.