Last updated: 2013-04-30
This document describes CatoXML, a set of inline semantic metadata
extensions to “HouseXML” in the
“HouseXML” is an unofficial term for the XML schema of legislation drafted by the United States Congress (House and Senate) and documented at xml.house.gov.
These metadata extensions are collectively called “CatoXML”.
The cato: prefix is bound to the namespace
Attribute names are prefixed with @, e.g. @foobar for an attribute named foobar.
A metadata element is an element that expresses metadata about a span
of text. CatoXML defines four metadata elements:
HouseXML elements can also express metadata equivalent to CatoXML
Used to contain text that creates an entity. Any child metadata elements are properties of the immediate parent entity.
@entity-type: Required. States the type of the entity. Valid values are:
law-citation: Parallel Law Citation: used to contain multiple law citations that are equivalent.
auth-interpretation: Congress’s desired interpretation of a passage.
auth-auth-approp: Authorizations of Appropriations (species of "Budget Authority")
auth-approp: Appropriations (species of "Budget Authority")
types are currently unused, but these names are reserved for future use.
Used to contain text which is constitutive of an entity but which is not itself an entity or reference to an entity.
A cato:property element must be contained by a
@name: Required. States the name of this property. Property names
are specific to a certain entity type. Two property names are defined:
funds-source: used to contain the source of funds for an
purpose: used to contain the purpose of an authority entity (
@value: States the machine-readable value of this property. If the
property element contains text, then this attribute contains a
normalized, machine-readable version of that text. If this
attribute is omitted, then the value of this property is the text
content of this element and it is not required to be
Used to contain text that indicates the amount of funds made available
and the year during which those funds are made available by an
authority entity. An authority entity may have multiple
This element exists as a shorthand in document markup to avoid the
need for id references and empty elements for one or another of its
property values. It expresses the same information as the following markup:
<entity entity-type="funds-and-year"><property name="amount" value="1000">$1000</property> in <property name="year" value="2011">2011</property></entity>
@amount: Required. States the amount of money, in US dollars, that the authority proposes to set aside. This attribute’s value is a positive integer or the special value indefinite, indicating that no specific amount was named.
@year: Required. States the fiscal years during which the stated
amount may be spent. This attribute’s value is a set of fiscal
years expressed as one of the following:
@amount is appropriated once to be spent during the indicated year.
2012,2013,2014), indicating that the @amount is appropriated again at the beginning of each listed fiscal year. This syntax is equivalent to using multiple cato:funds-and-year elements with a single fiscal year for each one.
2013,..), indicating that the @amount is appropriated at the beginning of the first indicated year and re-appropriated at the beginning of each following year in perpetuity.
2012..2014), indicating that the @amount is appropriated once at the beginning of the fiscal year on the left-hand side and is available to be spent until the end of the fiscal year on the right-hand side. For example, <cato:funds-and-year amount="100" year="2012..2014"/> indicates that $100 is made available at the beginning of the 2012 fiscal year and is available until the end of the 2014 fiscal year.
2013..), indicating that the @amount is appropriated once and is available until it is expended.
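The @amount and @year forms above can be classified mechanically. The following Python is a minimal sketch rather than a reference implementation: YearSpec, parse_amount, parse_year, and the kind labels are hypothetical names, and it assumes (from the single example 2013,..) that the perpetual form always names exactly one starting year.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class YearSpec:
    # Hypothetical labels: "single", "list", "perpetual", "range", "until-expended"
    kind: str
    years: Tuple[int, ...]      # fiscal years named explicitly in @year
    end: Optional[int] = None   # last fiscal year of availability ("range" only)


def parse_amount(value: str) -> Union[int, str]:
    """Return @amount as a positive int, or the special value "indefinite"."""
    if value == "indefinite":
        return value
    if not value.isdigit() or int(value) <= 0:
        raise ValueError(f"invalid @amount: {value!r}")
    return int(value)


def parse_year(value: str) -> YearSpec:
    """Classify a cato:funds-and-year @year value by its syntax."""
    if value.endswith(",.."):
        # "2013,..": re-appropriated every year in perpetuity.
        # Assumption: exactly one starting year precedes ",..".
        return YearSpec("perpetual", (int(value[:-3]),))
    if value.endswith(".."):
        # "2013..": appropriated once, available until expended.
        return YearSpec("until-expended", (int(value[:-2]),))
    if ".." in value:
        # "2012..2014": appropriated once, available through the end year.
        start, end = (int(part) for part in value.split("..", 1))
        if end < start:
            raise ValueError(f"range ends before it starts: {value!r}")
        return YearSpec("range", (start,), end)
    if "," in value:
        # "2012,2013,2014": appropriated again at the start of each listed year.
        return YearSpec("list", tuple(int(y) for y in value.split(",")))
    # "2012": appropriated once for a single year.
    return YearSpec("single", (int(value),))
```

A "list" result could then be expanded into one single-year cato:funds-and-year element per year, which the text above describes as equivalent.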
Used to contain text that refers to but does not create an entity.
In addition to @entity-type, exactly one of the value attributes (entity-id, entity-parent-id, or value) is required.
@entity-type: Required. States the type of entity that the
enclosed text references. Valid values are:
federal-body: Federal organizational unit citation, including Agencies and Bureaus. Uses the
committee: Congressional Committee citation. Uses the
person: Federal elective officeholder citation. Uses the
act: Popular name citation. Uses the
uscode: US Code section, chapter, or appendix citation. Uses the
public-law: Public law citation. Uses the
statute-at-large: Statutes at Large citation. Uses the
entity-id: States the id of the entity that the enclosed text references.
Entity ids must be unique among all others with the same entity-type.
entity-parent-id: States the id of the parent entity of the entity that the enclosed text references. This attribute is used when the entity does not have an id or its id is not known but a parent entity is known.
value: Expresses the content of the text of the entity-ref (not of the entity) in a consistent, documented, machine-parsable format specific to its entity-type. Different value attribute values may refer to the same entity.
proposed: States whether the current entity reference is to an existing or a proposed entity. The value of this attribute is true or false; if the attribute is absent, its value is false. This attribute may be found on uscode or act entities.
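The attribute rules for entity-ref can be checked before a document is processed further. This Python sketch takes an attribute dictionary (for example, an ElementTree element's attrib); the function and constant names are hypothetical, and it assumes that the value attributes are entity-id, entity-parent-id, and value, and that @proposed is written as the literal strings true or false.

```python
from typing import Dict

# Assumption: these are the "value attributes", exactly one of which is required.
VALUE_ATTRIBUTES = ("entity-id", "entity-parent-id", "value")
ENTITY_TYPES = {"federal-body", "committee", "person", "act",
                "uscode", "public-law", "statute-at-large"}


def check_entity_ref(attrs: Dict[str, str]) -> bool:
    """Validate entity-ref attributes and return the effective @proposed flag."""
    entity_type = attrs.get("entity-type")
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"missing or unknown @entity-type: {entity_type!r}")
    present = [name for name in VALUE_ATTRIBUTES if name in attrs]
    if len(present) != 1:
        raise ValueError(f"need exactly one of {VALUE_ATTRIBUTES}, got {present}")
    if "proposed" in attrs and entity_type not in ("uscode", "act"):
        raise ValueError("@proposed is only used on uscode or act references")
    proposed = attrs.get("proposed", "false")  # an absent @proposed means false
    if proposed not in ("true", "false"):
        raise ValueError(f"@proposed must be true or false, got {proposed!r}")
    return proposed == "true"
```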
statute-at-large entity-types lack
@entity-parent-id attribute because:
@value values may reference the same entity. This is unlike @entity-id, where every entity has exactly one id.
All entity-ref value attributes use a series of slash-delimited
segments. For example,
usc/1/234 cites title 1, section 234 of the
U.S. Code. This is equivalent to "1 U.S.C. 234" in the common citation
format. The meaning and parsing of individual segments is determined
by the value of the first segment.
U.S.C. Section. Segments are:
usc/1/2/a/i cites title 1, section 2, paragraph a, subparagraph i. It is equivalent to "1 U.S.C. 2(a)(i)" in the common citation format. The last segment may indicate an inclusive range of document parts by using two citation values separated by a double period, e.g. usc/1/2/a..d is equivalent to "1 U.S.C. 2(a) through 1 U.S.C. 2(d)".
etseq to indicate that the citation is to a note to the current section (e.g. "1 U.S.C. 2 note") or a reference to this and the following sections (e.g. "1 U.S.C. 2 et seq."). If there is no special citation, this segment is omitted.
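To make the U.S.C. Section format concrete, the sketch below parses a value into its segments and renders the common citation form. It is a hypothetical helper, not part of CatoXML; it covers only section citations (not chapters or appendices) and assumes the note marker is the literal segment note (only etseq is spelled out above).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: the note marker is spelled "note"; "etseq" appears in the text.
SPECIAL_SEGMENTS = ("note", "etseq")


@dataclass(frozen=True)
class UscCitation:
    title: str
    section: str
    parts: Tuple[str, ...]           # subdivisions below the section, outermost first
    range_end: Optional[str] = None  # end of an inclusive range in the last segment
    special: Optional[str] = None    # "note" or "etseq", if present


def parse_usc(value: str) -> UscCitation:
    """Parse a U.S.C. section value such as usc/1/2/a/i or usc/1/2/a..d."""
    segments = value.split("/")
    if segments[0] != "usc":
        raise ValueError(f"not a U.S.C. section citation: {value!r}")
    special = segments.pop() if segments[-1] in SPECIAL_SEGMENTS else None
    if len(segments) < 3:
        raise ValueError(f"a title and section are required: {value!r}")
    core = segments[1:]
    range_end = None
    if ".." in core[-1]:
        # Only the last segment may carry a range.
        core[-1], range_end = core[-1].split("..", 1)
    title, section, *parts = core
    return UscCitation(title, section, tuple(parts), range_end, special)


def to_common(c: UscCitation) -> str:
    """Render the citation in the common "1 U.S.C. 2(a)(i)" style."""
    def render(section: str, parts: Tuple[str, ...]) -> str:
        return f"{c.title} U.S.C. {section}" + "".join(f"({p})" for p in parts)

    text = render(c.section, c.parts)
    if c.range_end is not None:
        if c.parts:
            end = render(c.section, c.parts[:-1] + (c.range_end,))
        else:
            end = render(c.range_end, ())
        text = f"{text} through {end}"
    if c.special == "note":
        text += " note"
    elif c.special == "etseq":
        text += " et seq."
    return text
```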
U.S.C. Chapter. Segments are:
etseq, as with U.S.C. Section citations.
U.S.C. Appendix. A citation to an appendix of a title of the U.S. Code and optionally to a section, e.g. "1 U.S.C. App. 234"
etseq, as with U.S.C. Section citations.
A reference to a page in a volume of the Statutes at Large. The normal
citation "90 Stat. 2541" would be expressed as
2541..2543 indicates pages 2541 through 2543.
A reference to an act by its popular name. There is very little
uniformity among act citations, so machine-parsable act citation values
use a system of prefixes to indicate segment types. The normal
citation "1861(s)(2) of the Social Security Act" would be expressed as
Social Security Act/s:1861/ss:s/p:2. Segments are:
Further optional segments are citations reflecting the parts of the document explicitly mentioned by the text of the citation:
t:I cites "title one".
The following prefixes are defined:
The last segment citation value may use a double period to
indicate a range. For example, t:I..V indicates title 1
through title 5. Only the last segment citation value may use
a range, because otherwise the citation would be ambiguous.
The citation Social Security Act/t:I..V/s:6, for instance, is ambiguous,
as it is not clear which section six is indicated.
The final segment may contain the special value
as with U.S.C. Section citations.
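The act format, including the rule that only the last segment may carry a range, can be sketched as a small parser. The names are hypothetical; the code does not interpret the prefixes themselves (that is left to the prefix list), and it assumes that an unprefixed final segment is the special value.

```python
from typing import List, NamedTuple, Optional, Tuple


class ActSegment(NamedTuple):
    prefix: str          # e.g. "t" for title
    start: str           # citation value, e.g. "1861" or "I"
    end: Optional[str]   # set only when the segment is a range such as I..V


def parse_act(value: str) -> Tuple[str, List[ActSegment], Optional[str]]:
    """Split an act value into (popular name, prefixed segments, special value)."""
    name, *rest = value.split("/")
    special = None
    if rest and ":" not in rest[-1]:
        # Assumption: an unprefixed final segment is the special value.
        special = rest.pop()
    segments = []
    for i, raw in enumerate(rest):
        prefix, colon, cite = raw.partition(":")
        if not colon or not prefix:
            raise ValueError(f"segment lacks a prefix: {raw!r}")
        start, dots, end = cite.partition("..")
        if dots and i != len(rest) - 1:
            # A range anywhere but the last segment makes the citation ambiguous.
            raise ValueError(f"only the last segment may be a range: {value!r}")
        segments.append(ActSegment(prefix, start, end if dots else None))
    return name, segments, special
```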
A reference to a Public Law. The normal citation "P.L. 111-12" would
be expressed as
public-law/111/12. Segments are:
public-law/111/12/t:I indicates "title I of P.L. 111-12".
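Public-law values combine fixed positional segments (congress and law number) with prefixed segments like those used for acts. A hypothetical sketch, assuming that every segment after the law number uses the prefix:value form and naming only the t (title) prefix, whose meaning the example confirms:

```python
from typing import List, Tuple

# Only "t" (title) is confirmed by the example; other prefixes print as-is.
PREFIX_NAMES = {"t": "title"}


def parse_public_law(value: str) -> Tuple[int, int, List[Tuple[str, str]]]:
    """Split public-law/<congress>/<number>[/prefix:value...] into parts.

    Unpacking raises ValueError if there are fewer than three segments.
    """
    head, congress, number, *rest = value.split("/")
    if head != "public-law":
        raise ValueError(f"not a public-law citation: {value!r}")
    parts = []
    for raw in rest:
        prefix, colon, cite = raw.partition(":")
        if not colon:
            raise ValueError(f"segment lacks a prefix: {raw!r}")
        parts.append((prefix, cite))
    return int(congress), int(number), parts


def describe_public_law(value: str) -> str:
    """Render e.g. public-law/111/12/t:I as "title I of P.L. 111-12"."""
    congress, number, parts = parse_public_law(value)
    text = f"P.L. {congress}-{number}"
    for prefix, cite in parts:  # outermost division first
        text = f"{PREFIX_NAMES.get(prefix, prefix)} {cite} of {text}"
    return text
```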
Certain elements in HouseXML can express the same information as a CatoXML element. If a HouseXML element is present in a document and would express the same information as a CatoXML element, no CatoXML element is added. This section defines rules for determining the semantically equivalent CatoXML for a HouseXML element.
|Act (Popular Name)||
|U.S. Code Section||
|U.S. Code Chapter||
|U.S. Code Appendix||
|Statutes at Large||
is ignored because the vocabulary is unpublished. If it is ever released, its
value may be used in a
Entity lookup tables are references for entities indexed by entity-id. They have the following structure shared by all entity types:
entities root element.
entities has a required @type attribute expressing the entity type of all child elements. The value of this attribute matches the @entity-type attribute used on
entities has a required @updated attribute indicating the date and time the entity lookup table was last updated, in ISO 8601 format, e.g.
@version, whose value is entity-type specific. This attribute is used to fix a lookup table to a specific point in time relevant to a specific set of documents. For example, the list of agencies and bureaus (federal-body) may vary from year to year as some are added, others removed, and bureaus are restructured into different agencies. However, these older lists are still relevant, as legislation and other documents from those time periods still need to identify them. Thus a @version attribute may be included with (for example) a fiscal year or congress number to indicate that the table lists the federal bodies as they existed during that period. This is different from a lookup table with a newer @updated value: in that case the older table should simply be discarded. In other words, a lookup table is “updated” when it is corrected or added to, but “versioned” when the world changes in a backwards-incompatible way and the older lookup table must be kept for older documents.
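The difference between an updated table and a versioned table amounts to a selection rule: a newer @updated value replaces older tables of the same version, while each @version is kept side by side. This Python sketch assumes a table's identity is its (type, version) pair; TableHeader, read_header, and pick_table are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class TableHeader:
    entity_type: str          # entities/@type
    updated: datetime         # entities/@updated (ISO 8601)
    version: Optional[str]    # entities/@version, if any


def read_header(xml_text: str) -> TableHeader:
    root = ET.fromstring(xml_text)
    # The root may or may not be namespaced, so compare the local name only.
    if root.tag.rsplit("}", 1)[-1] != "entities":
        raise ValueError("root element is not entities")
    # Note: fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    return TableHeader(root.attrib["type"],
                       datetime.fromisoformat(root.attrib["updated"]),
                       root.get("version"))


def pick_table(headers: Iterable[TableHeader], entity_type: str,
               version: Optional[str] = None) -> TableHeader:
    """Keep the newest update of the requested version; other versions stay separate."""
    matches = [h for h in headers
               if h.entity_type == entity_type and h.version == version]
    if not matches:
        raise LookupError(f"no {entity_type} table for version {version!r}")
    return max(matches, key=lambda h: h.updated)
```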
entity child elements of entities contain information about a
particular entity. They share a basic structure across all
entity types, which particular entity types may extend.
@id attribute, which indicates the id of the entity.
@parent-id attribute, which refers to another entity in the table as its parent. The precise semantic meaning of this "parent" relationship varies by entity-type. Some entity types do not have parent-child relationships among entities.
abbr elements to indicate names and abbreviations for the entity. The value of this element is contained as text.
abbr elements have an optional @role attribute to indicate the role of the name. Predefined values are:
official for official names and abbreviations.
historical for older names and abbreviations no longer in common use.
Name and abbr sorting order. The order of preference for an entity’s names and abbreviations is determined in this way:
@role attribute with value official rank first. If there are multiple such names or abbreviations, they are ranked in document order.
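The shared entity structure supports two common operations: walking @parent-id links and choosing the preferred label. A minimal sketch with ElementTree, assuming un-namespaced entity, name, and abbr elements; the function names are hypothetical, and ranking every non-official label after the official ones, in document order, is an assumption about the remaining ranking rules.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def index_entities(root: ET.Element) -> Dict[str, ET.Element]:
    """Map each entity's @id to its element."""
    return {e.attrib["id"]: e for e in root.iter("entity")}


def ancestors(index: Dict[str, ET.Element], entity_id: str) -> List[str]:
    """Follow @parent-id links upward, nearest parent first (stops on a cycle)."""
    chain, seen = [], {entity_id}
    parent = index[entity_id].get("parent-id")
    while parent is not None and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = index[parent].get("parent-id")
    return chain


def ranked_labels(entity: ET.Element) -> List[str]:
    """Names and abbreviations in preference order: role="official" first.

    sorted() is stable, so entries with equal rank keep document order.
    """
    labels = [child for child in entity if child.tag in ("name", "abbr")]
    ranked = sorted(labels, key=lambda el: el.get("role") != "official")
    return [(el.text or "").strip() for el in ranked]
```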
Certain entity types make use of the various extension points provided by the lookup table format and described in the previous section. These entity-type specific extensions are documented below.
These committee and subcommittee id values are consistent with those
found in the
@committee-id attribute of the
committee-name element of
Subcommittees indicate their parent Committee with the
@id values are Bioguide ids.
@version attribute on the
entity element indicates a congressional
session. The lookup table is expected to contain a comprehensive list
of every congressman who served during that session of Congress.
entity element may have the following additional attributes:
@govtrackid to indicate a GovTrack id
Rep. to indicate a representative,
Sen. to indicate a senator,
Del. to indicate a delegate.
@state, a two-letter postal abbreviation, to indicate the state of the seat the congressman occupies.
@district to indicate the district number of the seat the Representative occupies.
name element includes the full name of the congressman, with title,
party, and state, e.g.:
Rep. Gary Ackerman (D, NY-5).
name element may have the following optional attributes:
@firstname to indicate the first name of the congressman.
@lastname to indicate the last name of the congressman.
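Because the name text repeats information held in attributes, a consumer might cross-check the two. This hypothetical helper splits a name of the form shown in the example; the pattern is inferred from that single example, so other layouts are rejected rather than guessed at.

```python
import re
from typing import Dict, Optional

# Assumed shape, inferred from "Rep. Gary Ackerman (D, NY-5)":
# "<title> <full name> (<party>, <state>[-<district>])"
NAME_PATTERN = re.compile(
    r"^(?P<title>Rep\.|Sen\.|Del\.) (?P<name>.+?) "
    r"\((?P<party>[A-Z]+), (?P<state>[A-Z]{2})(?:-(?P<district>\d+))?\)\.?$"
)


def split_person_name(text: str) -> Dict[str, Optional[str]]:
    """Return title, name, party, state, and district (None if absent)."""
    match = NAME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized name format: {text!r}")
    return match.groupdict()
```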
entity element may have the following additional attributes:
@omb-agency: a crosswalk to the three-digit Office of Management and Budget (OMB) agency code
@omb-bureau: a crosswalk to the two-digit OMB bureau code
@treasury-code: a crosswalk to the two-digit Treasury Account Symbol (TAS) code.
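The OMB codes make a federal-body table usable as a crosswalk from budget data to entity ids. A sketch with a hypothetical function name, assuming un-namespaced entity elements and treating a missing @omb-bureau as an agency-level entry; codes are kept as strings so that leading zeros survive.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def omb_crosswalk(root: ET.Element) -> Dict[Tuple[str, Optional[str]], str]:
    """Map (omb-agency, omb-bureau) code pairs to federal-body entity ids."""
    index: Dict[Tuple[str, Optional[str]], str] = {}
    for entity in root.iter("entity"):
        agency = entity.get("omb-agency")
        if agency is None:
            continue  # no OMB crosswalk for this entity
        bureau = entity.get("omb-bureau")
        if len(agency) != 3 or not agency.isdigit():
            raise ValueError(f"bad three-digit agency code: {agency!r}")
        if bureau is not None and (len(bureau) != 2 or not bureau.isdigit()):
            raise ValueError(f"bad two-digit bureau code: {bureau!r}")
        key = (agency, bureau)
        if key in index:
            raise ValueError(f"duplicate OMB code pair: {key}")
        index[key] = entity.attrib["id"]
    return index
```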
@role attribute of the
name element may have the value
leadership, which indicates that the name is the position of the senior
director of the named federal body. This role is included because bills often
direct an agency to do something using language that names the highest
position in that agency. For example, "The Happiness Czar shall expend $5
million in fiscal year 2013 to promote happiness abroad". Here, "Happiness
Czar" would be a
<name role="leadership"> entry for the fictional "Bureau