By: Andy Newton
Date: April 17, 2016
The Wikipedia entry for data definition language defines it this way: “A data definition language or data description language (DDL) is a syntax similar to a computer programming language for defining data structures, especially database schemas.” DDLs are also referred to as schema languages.
The IETF’s JSON Working Group has discussed a JSON DDL in the past. Understandably, those discussions often invoked the scars of experience with XML’s most widely used DDL, XML Schema. XML Schema’s influence on the XML ecosystem is extensive; its shortcomings almost always have a negative cascading impact on any XML work. But XML is not JSON, and although the complexities of XML are by necessity present in an XML DDL, such entanglements can be avoided in a JSON DDL. I posit that a JSON DDL can be more powerful and flexible than an XML DDL because JSON’s scope is limited to data serialization (as commonly used, a markup language is much more than data serialization).
Are Prose and Examples Good Enough?
One argument against the need for a JSON DDL, or a DDL for any data format, is that descriptive text accompanied by examples should be all a programmer needs to implement a specification, and that the overhead of a DDL only complicates matters. I agree that this holds for most simple cases. But I know first-hand that not every case is simple.
RFC 7483, by far the largest of the documents produced by the Web Extensible Internet Registration Data Service (WEIRDS) Working Group, describes the JSON used by the Registration Data Access Protocol (RDAP). As part of an area review, Tim Bray wrote this of RFC 7483:
“Speaking as a person who’s been skeptical of JSON schema efforts, it pains me to say this, but the information about large-scale message structure is scattered through this document in a diffuse way and it’d make me nervous as an implementer whether or not I was getting it right. I think it might be helpful to have a ‘large-scale message structure’ section that quickly runs through the allowable top-level shapes of messages, and exactly what can be nested inside what.”
Despite RFC 7483’s extensive prose describing its JSON and its copious, multipage examples, a DDL would have been genuinely helpful. (For the curious, https://datatracker.ietf.org/doc/draft-newton-rdap-jcr/ shows what a DDL can do for RFC 7483.)
In my opinion, Tim Bray’s prediction was borne out during the several RDAP interoperability sessions held at the IETF. Some implementers made incorrect assumptions about the data structures, and in a few cases we found that the examples themselves were incorrect.
In addition, I drew another conclusion from my experiences with the RDAP interoperability tests: textual descriptions are harder to read for the many programmers who are not native English speakers. Even I often succumb to the TL;DR (“too long; didn’t read”) nature of specifications. The tedium of prose is likely worse for those programmers, and the precision and conciseness of a DDL might help them more.
Testing and Test Software
Bad experiences with XML Schema have led some to believe that DDLs often focus implementers on the correctness of the XML document to the detriment of other aspects of interoperability: “Just hand me the XSD (XML Schema Definition). I don’t need to read the specification.” Indeed, this is one of several reasons I personally switched to Relax NG for all my XML work.
But this problem is not universal to all DDLs, or even to all XML DDLs. Nor is it inherent in XML Schema itself; rather, it is a so-called feature of the tooling, notably the push-button, code-generation development frameworks that abstracted all protocol aspects away from the programmer (environments popular with XML technologies such as the Simple Object Access Protocol).
Negative experiences aside, DDLs can be an important part of interoperability testing. DDL validators aid the creation of test suites by knocking out the low-hanging fruit with regard to syntax.
Further still, some DDLs, such as JSON Content Rules (JCR), contain features that aid the creation of specific test cases, for example checking whether a value is Y under condition X. JCR accommodates locally overriding a rule with a narrower definition (e.g., a specific constant or range), so writing a test can involve a simple rule change instead of tediously traversing nested data structures to reach the value to be inspected. (Bias warning: I am one of the coauthors of JCR.)
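The following Python sketch illustrates the general idea of overriding one rule locally for a test. It is not JCR syntax or a JCR implementation: the predicate-based rule representation, the rule names (`image`, `thumb_width`, and so on), and the sample object are all hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical rule shape: a predicate over a parsed JSON value that also
# receives the whole rule set, so rules refer to each other by name.
# That name lookup is what makes a local override possible.
Rule = Callable[[Any, Dict[str, Any]], bool]


def is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


# Hypothetical base rules for a toy "image" object with a nested thumbnail.
BASE_RULES: Dict[str, Rule] = {
    "image": lambda v, r: (
        isinstance(v, dict)
        and r["image_width"](v.get("Width"), r)
        and r["image_height"](v.get("Height"), r)
        and r["thumbnail"](v.get("Thumbnail"), r)
    ),
    "image_width": lambda v, r: is_int(v) and v > 0,
    "image_height": lambda v, r: is_int(v) and v > 0,
    "thumbnail": lambda v, r: (
        isinstance(v, dict)
        and r["thumb_width"](v.get("Width"), r)
        and r["thumb_height"](v.get("Height"), r)
    ),
    "thumb_width": lambda v, r: is_int(v) and v > 0,
    "thumb_height": lambda v, r: is_int(v) and v > 0,
}


def validate(doc: Any, rules: Dict[str, Rule], root: str = "image") -> bool:
    """Check a parsed JSON document against the named root rule."""
    return rules[root](doc, rules)


def override(rules: Dict[str, Rule], name: str, rule: Rule) -> Dict[str, Rule]:
    """Return a copy of the rule set with one rule replaced by a narrower one."""
    if name not in rules:
        raise KeyError(f"no rule named {name!r} to override")
    narrowed = dict(rules)  # the base rule set is left untouched
    narrowed[name] = rule
    return narrowed


if __name__ == "__main__":
    doc = {"Width": 800, "Height": 600,
           "Thumbnail": {"Width": 100, "Height": 125}}

    # Ordinary structural validation.
    print(validate(doc, BASE_RULES))  # True

    # Targeted test: "is the thumbnail width between 1 and 200?"
    # Only thumb_width changes; the nested structure is never walked by hand.
    small_thumb = override(BASE_RULES, "thumb_width",
                           lambda v, r: is_int(v) and 1 <= v <= 200)
    print(validate(doc, small_thumb))  # True

    # A test pinning the thumbnail width to a constant it does not have.
    pinned = override(BASE_RULES, "thumb_width", lambda v, r: v == 150)
    print(validate(doc, pinned))  # False
```

The test case is expressed entirely as a replacement rule; everything else about the document is still checked by the unchanged base rules.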
DDL validators, or schema validators as they are sometimes known, also help implementers develop software while specifications progress through the standards process. For example, during the standardization of RFC 7484, I was able to provide valuable feedback to the specification authors, feedback that proved vital to the performance of software using the specification. At some point during the standardization process, however, a small, seemingly insignificant change was made to the JSON, and I did not notice it. As a result, my software would not interoperate despite the many, many unit tests I had written. Had RFC 7484 used JCR or JSON Schema, I could have simply dropped in the final DDL and quickly discovered the problem.
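A minimal sketch of that scenario follows, assuming a deliberately tiny schema format (required member names mapped to Python types) rather than JCR or JSON Schema. The member names and both schema versions are hypothetical and do not reflect the actual change made to RFC 7484; the point is only that output written against a draft passes the draft schema but fails the final one as soon as the final schema is dropped in.

```python
import json
from typing import Any, Dict, List

# A deliberately tiny "schema": required member name -> expected Python type
# after json.loads. Real DDLs such as JCR or JSON Schema are far richer.
Schema = Dict[str, type]

# Hypothetical draft and final schemas. The only difference is one renamed
# member, the kind of small edit that is easy to miss between drafts.
DRAFT_SCHEMA: Schema = {"version": str, "records": list}
FINAL_SCHEMA: Schema = {"version": str, "entries": list}


def check(text: str, schema: Schema) -> List[str]:
    """Return the problems found in a JSON text (an empty list means it conforms)."""
    doc: Any = json.loads(text)
    if not isinstance(doc, dict):
        return ["top-level value is not an object"]
    problems: List[str] = []
    for name, expected in schema.items():
        if name not in doc:
            problems.append(f"missing member {name!r}")
        elif not isinstance(doc[name], expected):
            problems.append(f"member {name!r} is not a {expected.__name__}")
    return problems


# Output produced by (hypothetical) software written against the draft.
OUTPUT = '{"version": "1.0", "records": []}'

if __name__ == "__main__":
    print(check(OUTPUT, DRAFT_SCHEMA))  # []
    print(check(OUTPUT, FINAL_SCHEMA))  # ["missing member 'entries'"]
```

Unit tests written against the draft would keep passing here; only re-checking the output against the final schema surfaces the renamed member.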
Desired Features of DDLs
For the purposes of the IETF, some DDLs are more practical than others. When writing a specification, one beneficial property of a DDL is conciseness. Internet-Drafts have many sections and seldom pass muster without explanatory text, so a DDL that does not add bloat is welcome. While conciseness can sometimes reduce readability, in complex specifications TL;DR is a much bigger problem. If you, as a specification writer, find a computer language tedious to write, it is probably just as tedious for the many readers of your document.
Figure 1 shows JSON used as an example in RFC 4627.
Now let’s examine how two different JSON DDLs, JCR and JSON Schema, describe that example. Figure 2 shows the JCR version, which has a more concise syntax.
By contrast, the JSON Schema version in Figure 3 is more verbose.
DDLs such as XML Schema and JSON Schema use the syntax of their respective formats to construct rules. This eases the implementation of their validators but sacrifices conciseness. These forms also make it difficult to interleave instructive prose, as normal draft text, between the DDL rules, a habit of IETF authors familiar with notations such as Augmented Backus–Naur Form (ABNF).
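Because a JSON Schema is itself JSON, a validator can load its rules with the same parser it uses for instance documents, which is part of why such DDLs are easier to implement. The Python sketch below handles only a small subset of JSON Schema keywords (`type` with a single type name, `properties`, `required`, and `items` with a single schema) and is not a conforming implementation; the schema and data are hypothetical.

```python
import json
from typing import Any, Dict, List

# JSON Schema "type" names mapped to checks on json.loads output.
# Simplification: "integer" accepts only Python int, and bool is excluded
# from the numeric types because bool subclasses int.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Validate against a small subset of JSON Schema: type (single name
    only), required, properties, and items (single schema only)."""
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](value):
        return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing required member {name!r}")
        # "properties" constrains only the members that are present.
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# The rules are plain JSON text, read by the same json.loads as the data.
SCHEMA_TEXT = """
{
  "type": "object",
  "required": ["Width", "Height", "IDs"],
  "properties": {
    "Width":  {"type": "integer"},
    "Height": {"type": "integer"},
    "IDs":    {"type": "array", "items": {"type": "integer"}}
  }
}
"""

if __name__ == "__main__":
    schema = json.loads(SCHEMA_TEXT)
    good = json.loads('{"Width": 800, "Height": 600, "IDs": [116, 943]}')
    bad = json.loads('{"Width": "800", "IDs": [116, "x"]}')
    print(validate(good, schema))  # []
    print(validate(bad, schema))
```

The flip side is visible in `SCHEMA_TEXT` itself: three member constraints already take a dozen lines of nested braces, and there is no natural place to put explanatory draft text between them.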
Figure 4 is an excerpt from RFC 4287 (The Atom Syndication Format). Atom is an XML format, and RFC 4287 uses the Relax NG Compact Syntax to define it. RFC 4287 is well written and makes good use of mixing explanatory text with formal syntax rules.
Another common practice with ABNF is referencing rules across documents, which promotes reuse and reduces errors. Figure 5 is an excerpt from RFC 6270 (the tn3270 URI scheme). It refers back to RFC 3986 for the normative ABNF definition of the authority rule.
A DDL that supports such cross-document references provides the same benefit we see with ABNF. Cross-referencing is also possible in prose, but referencing specific rules is much more precise and concise.
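The Python sketch below illustrates cross-document rule references in the abstract. It is not JCR, JSON Schema, or ABNF: the document names `common-defs` and `my-protocol`, the `document:rule` reference syntax, and the `Registry` class are all hypothetical. One document defines reusable rules once, and another refers to them by name instead of restating them.

```python
from typing import Any, Callable, Dict

# Hypothetical rule shape: a predicate over a parsed JSON value that may
# consult the registry to evaluate rules defined in other documents.
RuleFn = Callable[[Any, Any], bool]


class Registry:
    """Holds one rule set per (hypothetical) document and resolves
    references written as "document:rule"."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, RuleFn]] = {}

    def add_document(self, doc_name: str, rules: Dict[str, RuleFn]) -> None:
        self._docs[doc_name] = rules

    def check(self, ref: str, value: Any) -> bool:
        doc_name, _, rule_name = ref.partition(":")
        try:
            rule = self._docs[doc_name][rule_name]
        except KeyError:
            # A dangling reference is a defect in the specification, not in
            # the data, so fail loudly instead of returning False.
            raise LookupError(f"unresolved rule reference {ref!r}") from None
        return rule(value, self)


registry = Registry()

# Shared definitions, written once in one document.
registry.add_document("common-defs", {
    "port": lambda v, reg: (isinstance(v, int) and not isinstance(v, bool)
                            and 0 <= v <= 65535),
    # Simplified for illustration: any non-empty string counts as a host.
    "host": lambda v, reg: isinstance(v, str) and len(v) > 0,
})

# A second document reuses those rules by reference instead of copying them.
registry.add_document("my-protocol", {
    "server": lambda v, reg: (isinstance(v, dict)
                              and reg.check("common-defs:host", v.get("host"))
                              and reg.check("common-defs:port", v.get("port"))),
})

if __name__ == "__main__":
    print(registry.check("my-protocol:server", {"host": "example.net", "port": 443}))    # True
    print(registry.check("my-protocol:server", {"host": "example.net", "port": 70000}))  # False
```

Treating an unresolved reference as an error rather than a non-match is a design choice meant to mirror how an undefined rule name is a problem in the grammar rather than in the data it describes.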
As of this writing, there is no standard JSON DDL. Having one (or more; there is no harm in giving specification authors a choice) would help software developers write test suites. It would also make for better RFCs, as definitions would be more precise. Moreover, I believe JCR has many properties that fit more naturally with the style in which RFCs are written.
If you are writing or plan to write a specification using JSON, I invite you to take a look at both JCR and JSON Schema.