Kicks Condor’s Fraidycat is something that I’ve been working on porting to a self-hosted, headless Rust application for a while now. One of the things that makes this interesting (read: “difficult”) is the very type-fluid and apparently undocumented JSON format in which it stores parsing rules.
To determine how best to implement this, I need to write a specification for the format as I understand it from reading the source. So here that is.
Resolver
A `Resolver` is a set of rules for parsing a document into an RSS feed. Here is Fraidycat’s Atom `Resolver`:
"atom": {
  "namespaces": {
    "atom": "http://www.w3.org/2005/Atom",
    "fc": "http://fraidyc.at/ext/1.0",
    "media": "http://search.yahoo.com/mrss/"
  },
  "rules": {
    "entry": [
      {"var": "out:title", "op": "./atom:title/text()"},
      {"var": "out:html", "op": ["./atom:content[@type='html']", "./atom:content[@type='xhtml']"]},
      {"var": "out:url", "op": ["./atom:link[@type='text/html' and @rel='alternate'][1]/@href",
        "./atom:link[@rel='alternate' or @type='text/html'][1]/@href",
        "./atom:link[not(@rel)][1]/@href", "./atom:id/text()"], "mod": ["url"]},
      {"var": "out:graphic:thumb", "op": "./media:group/media:thumbnail[1]/@url"},
      {"var": "out:publishedAt", "op": ["./atom:published/text()", "./atom:updated/text()"], "mod": ["date"]},
      {"var": "out:updatedAt", "op": "./atom:updated/text()", "mod": ["date"]}
    ]
  },
  "acceptXml": [
    {"op": "//atom:feed", "acceptXml": [
      {"var": "out:title", "op": "./atom:title/text()"},
      {"var": "out:photos:avatar", "op": ["./atom:logo/text()", "./atom:link[@rel='avatar']/@href"], "mod": ["url"]},
      {"var": "out:description", "op": "./atom:subtitle/text()"},
      {"var": "out:url", "op": ["./atom:link[@type='text/html' and @rel='alternate'][1]/@href",
        "./atom:link[@rel='alternate' or @type='text/html'][1]/@href",
        "./atom:link[not(@rel)][1]/@href"],
        "mod": ["url"]},
      {"op": "./fc:status", "var": "out:status", "acceptHtml": [
        {"var": "out:label", "op": "@label"},
        {"var": "out:type", "op": "@type"},
        {"rule": "entry"}
      ]},
      {"op": "./atom:entry", "var": "out:posts", "acceptXml": [
        {"rule": "entry"}
      ]}
    ]}
  ]
}
Element | Description |
---|---|
`acceptText`, `acceptHtml`, `acceptXml`, `acceptJson` | The script (sequence of Rules) with which to parse the document. A `Resolver` must implement either exactly one of these or `accept`; it cannot do both. |
`accept` | A list of key names of `Resolver`s to attempt to divert to, in order. `Resolver`s implementing `accept` cannot implement any preprocessing fields (S: describe these.) They may post-process the result via `patch`. |
`arguments` | An array of local variables to extract from the user-provided URL. Must be specified with `match`. The entries correspond, in order, to the capturing groups in the `match` regex. “0” is used as the first element of the array to represent the unneeded global match `\0`. |
`depends` | A list of key names of `Resolver`s to run before this one. Any local variables specified with `arguments` will be respected in the dependency’s execution environment. |
`match` | A regex to match the URL against. Resolution will proceed only on a successful match. Capturing groups may be stored in variables using `arguments`. |
`namespaces` | A mapping from XML namespace prefixes to namespace URIs. |
`patch` | A script (list of Rules) to run on the result of the `Resolver` diverted to with `accept`. Must be specified with `accept`. |
`render` | ??????? |
`request` | A list of headers. (S: What schema does this correspond to?) ??? |
`rules` | A collection of named scripts (lists of Rules). |
`url` | A URL to fetch the data from instead of the user-provided one, for instance to access an API instead of scraping a page. Respects local variables. Must be specified with `match`. |
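Several rows in the table above are really constraints between fields, so a port could plausibly check them once at load time. The following is a minimal sketch of such a check, not anything taken from Fraidycat itself: `ResolverShape`, `ShapeError`, and `validate` are hypothetical names, and the struct records only whether each field is present. The rule about preprocessing fields is left out because those fields are not yet described.

```rust
/// Hypothetical summary of which top-level fields a Resolver defines.
/// Only presence is tracked, since the constraints depend on nothing else.
#[derive(Debug, Default, Clone)]
pub struct ResolverShape {
    /// How many of acceptText / acceptHtml / acceptXml / acceptJson are present.
    pub typed_accepts: usize,
    pub has_accept: bool,
    pub has_arguments: bool,
    pub has_match: bool,
    pub has_patch: bool,
    pub has_url: bool,
}

/// One variant per constraint stated in the table.
#[derive(Debug, PartialEq, Eq)]
pub enum ShapeError {
    NoAcceptField,
    MultipleTypedAccepts,
    AcceptWithTypedAccept,
    ArgumentsWithoutMatch,
    UrlWithoutMatch,
    PatchWithoutAccept,
}

/// Checks the field combinations; returns the first violated constraint.
pub fn validate(r: &ResolverShape) -> Result<(), ShapeError> {
    // Either exactly one typed accept field, or `accept`, but never both.
    match (r.has_accept, r.typed_accepts) {
        (false, 0) => return Err(ShapeError::NoAcceptField),
        (true, n) if n > 0 => return Err(ShapeError::AcceptWithTypedAccept),
        (false, n) if n > 1 => return Err(ShapeError::MultipleTypedAccepts),
        _ => {}
    }
    // `arguments` and `url` must each be specified with `match`.
    if r.has_arguments && !r.has_match {
        return Err(ShapeError::ArgumentsWithoutMatch);
    }
    if r.has_url && !r.has_match {
        return Err(ShapeError::UrlWithoutMatch);
    }
    // `patch` must be specified with `accept`.
    if r.has_patch && !r.has_accept {
        return Err(ShapeError::PatchWithoutAccept);
    }
    Ok(())
}
```

Under these assumptions, the Atom `Resolver` shown earlier has a single `acceptXml` and none of the other constrained fields, so it would be described as `ResolverShape { typed_accepts: 1, ..Default::default() }` and pass validation.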
Rule
A `Rule` is a mapping from one or more nodes in the input document to a node in the output document.
Element | Description |
---|---|
`op` | A single selector or a list of selectors to match against the input document. |
`var` | A single path in the output document at which to store the result. |
`acceptXml`, `acceptHtml`, `acceptJson`, `acceptText` | A script (sequence of Rules) to run on the result. `accept` is not allowed in a Rule. |
`rule` | The name of a script (list of Rules) to divert to. Cannot be specified with any other field. |
`mod` | A list of post-processing functions to run, in order. (S: document these.) |
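The type-fluid parts of a Rule (`op` being a string or a list, and `rule` excluding every other field) map fairly naturally onto Rust enums. What follows is one possible model, offered as a sketch rather than a settled design: `Selectors`, `DocKind`, `Rule`, and `child_script` are all hypothetical names. It assumes that `op` is always present on a non-diverting Rule and that `var`, `mod`, and the nested script are optional, which matches the Atom example but is not confirmed anywhere.

```rust
use std::collections::HashMap;

/// `op` may be a single selector or a list of selectors.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Selectors {
    One(String),
    Many(Vec<String>),
}

impl Selectors {
    /// Views both shapes uniformly as a slice of selectors.
    pub fn as_slice(&self) -> &[String] {
        match self {
            Selectors::One(s) => std::slice::from_ref(s),
            Selectors::Many(v) => v.as_slice(),
        }
    }
}

/// Which of acceptText / acceptHtml / acceptXml / acceptJson introduced a script.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocKind {
    Text,
    Html,
    Xml,
    Json,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Rule {
    /// `{"rule": "entry"}`: divert to a named script. Making this its own
    /// variant means no other field can accompany it.
    Divert { script: String },
    /// Any other Rule: select with `op`, optionally store at `var`,
    /// post-process with `mod`, and optionally run a nested script.
    Select {
        op: Selectors,
        var: Option<String>,
        mods: Vec<String>,
        nested: Option<(DocKind, Vec<Rule>)>,
    },
}

/// Returns the script a Rule hands its result to, if any: the named script
/// for a diversion (looked up in the Resolver's `rules`), or the nested one.
pub fn child_script<'a>(
    rule: &'a Rule,
    named: &'a HashMap<String, Vec<Rule>>,
) -> Option<&'a [Rule]> {
    match rule {
        Rule::Divert { script } => named.get(script).map(|v| v.as_slice()),
        Rule::Select { nested, .. } => nested.as_ref().map(|(_, rules)| rules.as_slice()),
    }
}
```

Encoding the diversion as a separate variant makes the “cannot be specified with any other field” restriction unrepresentable rather than something to check at runtime. How multiple selectors in `op` are combined (for example, whether the first match wins) is deliberately not modelled here, since the table does not say.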