Dealing with Scoping/Validation in XText 2.9.x

In my last post I described how to cope with generic types in a meta model regarding the scoping mechanism of XText. This post is about the oddities (not meant in a bad way) of XText's object creation strategy.

Let us take the entity model example from my last post and refine it a little. Since we are creating a textual DSL (domain-specific language) and not a GPL (general-purpose language), we would like to add further abstractions to our language. Therefore, we do not use the language itself to describe collection types, but integrate them into the language. In contrast, in Java a List is an interface with several implementations such as LinkedList and ArrayList. They are all implemented in Java and are not language features themselves. Of course, some general-purpose languages such as Python integrate collections directly into the language, and others such as Google Dart semi-integrate them via syntactic sugar but still implement them as normal classes. But this is not the point here: we would like to integrate collections as a concept into our language…

In the standard grammar example in the XText documentation, they add a many keyword to an attribute declaration, which switches a boolean property to true in the corresponding meta model object. This is a simple solution to the problem, but it does not scale. Although we would like to abstract from the collection implementation, we still might want to describe the characteristics of a collection type. We can add a second boolean, unique, for sets, which says that each element may occur only once in our collection. Next, we would like to differentiate between ordered and unordered sets. And why stop here? What about maps/dictionaries? Sooner or later we will end up with a meta model class for associations that has so many options that it is hard to say what it is at all. For a mapping, we would even have to add a second type reference, so that we can create a mapping with keys and values of different types. For a collection or single association, the second type reference would be null, which introduces a further source of mistakes.
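To make the scaling problem tangible, here is a minimal plain-Java sketch of such a catch-all association class. The names FlagSoup, FlagAssociation, and meaninglessSettings are hypothetical and chosen only for illustration; nothing here is generated by XText or taken from the actual meta model. The sketch merely shows that the class can represent states that have no meaning.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagSoup {
    /** Hypothetical catch-all association: every variant is a flag or a nullable field. */
    public static final class FlagAssociation {
        final boolean many;      // collection or single association
        final boolean unique;    // only meaningful for collections
        final boolean mapping;   // map/dictionary
        final String type;       // referenced type (key type for a mapping)
        final String valueType;  // second type reference, null unless this is a mapping

        public FlagAssociation(boolean many, boolean unique, boolean mapping,
                               String type, String valueType) {
            this.many = many;
            this.unique = unique;
            this.mapping = mapping;
            this.type = type;
            this.valueType = valueType;
        }
    }

    /** Lists flag combinations that are representable but have no meaning. */
    public static List<String> meaninglessSettings(FlagAssociation a) {
        List<String> problems = new ArrayList<>();
        if (!a.many && a.unique) problems.add("unique without many");
        if (!a.mapping && a.valueType != null) problems.add("value type without mapping");
        if (a.mapping && a.valueType == null) problems.add("mapping without value type");
        return problems;
    }

    public static void main(String[] args) {
        FlagAssociation set = new FlagAssociation(true, true, false, "Food", null);
        FlagAssociation odd = new FlagAssociation(false, true, false, "Food", "Food");
        System.out.println(meaninglessSettings(set)); // prints []
        System.out.println(meaninglessSettings(odd)); // prints [unique without many, value type without mapping]
    }
}
```

Every new option adds more such checks, and every consumer of the model has to remember them.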

A second approach might be to bundle this additional information in classes like Collection, which contains the booleans unique, ordered, and so on. But this only shifts the problem, since we then need null checks on these properties in order to identify the kind of association.

I tried a third approach (and failed bitterly) where I introduced a corresponding type hierarchy (refer to the code in my last post):

// ...
class Association extends Property<ComplexType> {}
class CollectionAssociation extends Association {
  boolean ^unique
  boolean ordered
}
// ...

The corresponding grammar could be:

Association:
  (NonCollectionAssociation | CollectionAssociation) type = [ComplexType|QualifiedName];
CollectionAssociation:
  'association' name = ID ':'
  (  ordered ?= 'list of'
   | (ordered ?= 'ordered')? unique ?= 'set of'
  );
NonCollectionAssociation returns Association:
  'association' name = ID ':';

You could also write this in one parser rule with a so-called simple action, using {CollectionAssociation}. Another variant would be to put the assignment to type into each alternative. Consider the following example in the new entity model language:

data Text: String
type Animal {
  attribute name: Text
  association favorite: // ??
  association intolerances: set of // ??
}
type Food {
  attribute name: Text
}
If you hit Ctrl+Space at one of the marked positions, the scope provider does not receive the association object as context, but its container, i.e., the type definition. At this point we do not know whether we are dealing with a primitive or a complex property, since types can have both. If the scope provider also received the current rule as input, we could determine the expected type, because “association” has already been read. The underlying issue is that the first assignment, name = ID, comes before the alternative that decides whether we have a single or a collection association. The variant shown above differs from a grammar where the assignment to type is copied into both alternatives: in the latter case, XText provides the correct reference type, EntityModelPackage.Literals.ASSOCIATION__TYPE. As described in my last post, this is resolved to EntityModelPackage.Literals.PROPERTY__TYPE, so that content assist mistakenly offers primitive as well as complex types. The former variant underapproximates and does not even ask the scoping mechanism for the type attribute.

Skip until next section if you cling to your sanity

I really tried to get this up and running by introducing a new type TmpAssociation in the meta model, which is used in the grammar for both collection and non-collection associations, and by hacking XText to replace these objects afterwards with the correct ones. I did this by replacing the IAstFactory in the DataRuntimeModule. The AST factory is responsible for creating the objects of the semantic model. It has three methods: one for object creation, one for assigning an object to a feature, and one for adding an object to a collection. Instead of creating a normal object, I returned a dynamic proxy which delegates to a “normal” TmpAssociation object created by the default factory via EntityModelFactory.eINSTANCE.createTmpAssociation.

Later on, the TmpAssociation is added to the list of properties in ComplexType (see the grammar in my last post). During this step, the custom AST factory creates a replacement for the temporary association and adds it to the model instead of the proxy. The proxy is augmented with the concrete association object (because XText still uses the proxy), and all method calls to the proxy are redirected to the concrete association, falling back to the temporary association only if the method is not available in the concrete object. This becomes awkward really fast, because ECore internally uses its own “reflection” mechanism to get information about an object.

Let me give you an example: the temporary association needs the additional information whether it is a collection or not. In contrast, for concrete associations this has been encoded into the type. XText will ask the proxy object whether it is a collection or not. Unfortunately, it does not call isCollection(), but proxy.eClass().getEStructuralFeature("collection"). Since the concrete association has the method eClass(), the proxy returns the EClass of the concrete association, and not the EClass of TmpAssociation. The concrete EClass does not have the structural feature “collection”, so the lookup fails.
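The failure mode can be reproduced without EMF. The following plain-Java sketch uses java.lang.reflect.Proxy with hypothetical stand-in types: ModelObject plays the role of EObject, and featureNames() plays the role of eClass().getEStructuralFeature(...). None of these names come from XText or ECore, and the delegation rule is only my reading of the setup described above, not the original code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Set;

public class ProxyPitfall {
    /** Stand-in for EObject; featureNames() mimics the reflective eClass() lookup. */
    public interface ModelObject {
        Set<String> featureNames();
    }

    /** Stand-in for the temporary association, which stores "collection" as a feature. */
    public interface TmpAssociation extends ModelObject {
        boolean isCollection();
    }

    public static class TmpAssociationImpl implements TmpAssociation {
        public Set<String> featureNames() { return Set.of("name", "type", "collection"); }
        public boolean isCollection() { return true; }
    }

    /** Stand-in for a concrete association: being a collection is encoded in the type. */
    public static class CollectionAssociationImpl implements ModelObject {
        public Set<String> featureNames() { return Set.of("name", "type", "unique", "ordered"); }
    }

    /** Builds a proxy that prefers the concrete object and falls back to the temporary one. */
    public static TmpAssociation proxyFor(TmpAssociation tmp, ModelObject concrete) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Delegate to the concrete object whenever it can answer the call.
            Object target = method.getDeclaringClass().isInstance(concrete) ? concrete : tmp;
            return method.invoke(target, args);
        };
        return (TmpAssociation) Proxy.newProxyInstance(
                ProxyPitfall.class.getClassLoader(),
                new Class<?>[] { TmpAssociation.class },
                handler);
    }

    public static void main(String[] args) {
        TmpAssociation proxy = proxyFor(new TmpAssociationImpl(), new CollectionAssociationImpl());
        // The direct call falls back to the temporary object and works.
        System.out.println(proxy.isCollection());                          // prints true
        // The reflective lookup is answered by the concrete object, which lacks the feature.
        System.out.println(proxy.featureNames().contains("collection"));   // prints false
    }
}
```

Both calls go through the same proxy, yet they disagree about the object, which is exactly the kind of inconsistency that made the approach so fragile.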

Long story short, even after fixing all these issues, it didn’t work as expected. During debugging I came across a TmpAssociationImpl object, although I had bound my custom AST factory and never returned such an object myself. It seems as if the content assist framework of XText creates tentative objects in some constellations… This was the point where I started to rethink my motivations 😇.

Sane-Mode ON

I came to the conclusion that this just is not the XText way. Although I would like to integrate the concept of collections into the language, it is not necessary to encode them into the association. I ended up with a solution which I like much better than what I tried to achieve in the first place:

interface EntityModel {
  contains Type[] types
}
interface Nameable {
  String name
}
interface Type extends Nameable {}
class PrimitiveType extends Type {
  BaseType baseType
}
class ComplexType extends Type {
  contains Property<?>[] properties
}
interface Property<R extends TypeRef> extends Nameable {
  contains R typeRef
}
abstract class TypeRef<T extends Type> {
  refers T ^type
}
class ComplexTypeRef extends TypeRef<ComplexType> {}
class PrimitiveTypeRef extends TypeRef<PrimitiveType> {}
abstract class CollectionTypeRef<T extends Type> extends TypeRef<T> {
  boolean ordered
  boolean ^unique
}
class PrimitiveCollectionTypeRef extends CollectionTypeRef<PrimitiveType>, PrimitiveTypeRef {}
class ComplexCollectionTypeRef extends CollectionTypeRef<ComplexType>, ComplexTypeRef {}
class Association extends Property<ComplexTypeRef> {}
class Attribute extends Property<PrimitiveTypeRef> {}
enum BaseType { String Integer Boolean }

In line 15 (the typeRef feature of Property), we add a containment of the new type TypeRef, which records whether we have a collection or not and which type is referenced. The corresponding XText grammar looks like this:

EntityModel: (types += Type)*;
Type: PrimitiveType | ComplexType;
PrimitiveType: 'data' name=ID ':' baseType = BaseType;
enum BaseType: String | Integer | Boolean;
ComplexType:
  'class' name=ID ('{'
    (properties += Property)+
  '}')?;
Property: Attribute | Association;
Association: 'association' name = ID ':' typeRef = (ComplexCollectionTypeRef | ComplexTypeRef);
ComplexCollectionTypeRef:
  (
    ordered ?= 'list of'
    | (ordered ?= 'ordered')? unique ?= 'set of'
  ) type = [ComplexType|QualifiedName];
ComplexTypeRef:
  type = [ComplexType|QualifiedName];
// analogously for attributes ...

Now, if we hit Ctrl+Space in line 4 or 5 of our entity model example from above, the scoping mechanism gets the container of type Association as context for line 4 and a ComplexCollectionTypeRef for line 5. If we add a marker interface Complex to both of them (and, analogously, a marker interface Primitive to their attribute counterparts), we can use the following scope provider and be happy:

class DataScopeProvider extends AbstractDataScopeProvider {
  override getScope(EObject context, EReference reference) {
    switch context {
      // Associations and complex type references may only refer to complex types.
      Complex case reference == DataPackage.Literals.TYPE_REF__TYPE: {
        val rootElement = EcoreUtil2.getRootContainer(context)
        val candidates = EcoreUtil2.getAllContentsOfType(rootElement, ComplexType)
        Scopes.scopeFor(candidates)
      }
      // Attributes and primitive type references may only refer to primitive types.
      Primitive case reference == DataPackage.Literals.TYPE_REF__TYPE: {
        val rootElement = EcoreUtil2.getRootContainer(context)
        val candidates = EcoreUtil2.getAllContentsOfType(rootElement, PrimitiveType)
        Scopes.scopeFor(candidates)
      }
      default: super.getScope(context, reference)
    }
  }
}
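The dispatch idea itself does not depend on XText. As a rough illustration, here is a plain-Java sketch with hypothetical stand-in classes (no EMF involved; ScopeDispatch and candidates are names I made up for this example). It only shows how the marker interface of the context object selects the kind of type that is offered. Where the real scope provider delegates to super.getScope, the sketch simply returns an empty list.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ScopeDispatch {
    /** Stand-ins for the model types; only the name matters here. */
    public static abstract class Type {
        public final String name;
        Type(String name) { this.name = name; }
    }
    public static final class PrimitiveType extends Type {
        public PrimitiveType(String name) { super(name); }
    }
    public static final class ComplexType extends Type {
        public ComplexType(String name) { super(name); }
    }

    /** Marker interfaces for the possible context objects. */
    public interface Complex {}
    public interface Primitive {}

    /** Possible contexts handed to the scope provider. */
    public static final class Association implements Complex {}
    public static final class ComplexCollectionTypeRef implements Complex {}
    public static final class Attribute implements Primitive {}

    /** Returns the names of all types that may be referenced from the given context. */
    public static List<String> candidates(Object context, List<Type> allTypes) {
        Class<? extends Type> wanted;
        if (context instanceof Complex) {
            wanted = ComplexType.class;
        } else if (context instanceof Primitive) {
            wanted = PrimitiveType.class;
        } else {
            return List.of(); // assumption: stands in for the default scope
        }
        return allTypes.stream()
                .filter(wanted::isInstance)
                .map(t -> t.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Type> types = List.of(
                new PrimitiveType("Text"), new ComplexType("Animal"), new ComplexType("Food"));
        System.out.println(candidates(new Association(), types));              // prints [Animal, Food]
        System.out.println(candidates(new ComplexCollectionTypeRef(), types)); // prints [Animal, Food]
        System.out.println(candidates(new Attribute(), types));                // prints [Text]
    }
}
```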

Exciting 🤓!
