The Fraidycat spec as I understand it

6 January 2023

Kicks Condor’s Fraidycat is something that I’ve been working on porting to a self-hosted, headless Rust application for a while now. One of the things that makes this interesting (read: “difficult”) is its very type-fluid and apparently undocumented JSON format in which it stores parsing rules.

In order to determine how best to implement this, I need to write a specification for the format as I understand it from reading the source. So here that is.

Resolver

A Resolver is a set of rules for parsing a document into an RSS feed. Here is Fraidycat’s Atom Resolver:

  "atom": {
    "namespaces": {
      "atom": "http://www.w3.org/2005/Atom",
      "fc": "http://fraidyc.at/ext/1.0",
      "media": "http://search.yahoo.com/mrss/"
    },
    "rules": {
      "entry": [
        {"var": "out:title", "op": "./atom:title/text()"},
        {"var": "out:html", "op": ["./atom:content[@type='html']", "./atom:content[@type='xhtml']"]},
        {"var": "out:url", "op": ["./atom:link[@type='text/html' and @rel='alternate'][1]/@href",
          "./atom:link[@rel='alternate' or @type='text/html'][1]/@href",
          "./atom:link[not(@rel)][1]/@href", "./atom:id/text()"], "mod": ["url"]},
        {"var": "out:graphic:thumb", "op": "./media:group/media:thumbnail[1]/@url"},
        {"var": "out:publishedAt", "op": ["./atom:published/text()", "./atom:updated/text()"], "mod": ["date"]},
        {"var": "out:updatedAt", "op": "./atom:updated/text()", "mod": ["date"]}
      ]
    },
    "acceptXml": [
      {"op": "//atom:feed", "acceptXml": [
        {"var": "out:title", "op": "./atom:title/text()"},
        {"var": "out:photos:avatar", "op": ["./atom:logo/text()", "./atom:link[@rel='avatar']/@href"], "mod": ["url"]},
        {"var": "out:description", "op": "./atom:subtitle/text()"},
        {"var": "out:url", "op": ["./atom:link[@type='text/html' and @rel='alternate'][1]/@href",
          "./atom:link[@rel='alternate' or @type='text/html'][1]/@href",
          "./atom:link[not(@rel)][1]/@href"],
          "mod": ["url"]},
        {"op": "./fc:status", "var": "out:status", "acceptHtml": [
          {"var": "out:label", "op": "@label"},
          {"var": "out:type", "op": "@type"},
          {"rule": "entry"}
        ]},
        {"op": "./atom:entry", "var": "out:posts", "acceptXml": [
          {"rule": "entry"}
        ]}
      ]}
    ]
  }
| Element | Description |
| --- | --- |
| acceptText, acceptHtml, acceptXml, acceptJson | The script (sequence of Rules) with which to parse the document. A Resolver must either implement exactly one of these or implement accept; it cannot do both. |
| accept | A list of key names of Resolvers to attempt to divert to, in order. Resolvers implementing accept cannot implement any preprocessing fields (S: describe these.) They may post-process the result via patch. |
| arguments | An array of local variables to extract from the user-provided URL. Must be specified with match. Entries correspond, in order, to capturing groups in the match regex. “0” is used as the first element of the array to represent the unneeded global match \0. |
| depends | A list of key names of Resolvers to run before this one. Any local variables specified with arguments will be respected in the dependency’s execution environment. |
| match | A regex to match the user-provided URL against. Resolution will proceed only on a successful match. Capturing groups may be stored in variables using arguments. |
| namespaces | A mapping from XML namespace prefixes to namespace URIs. |
| patch | A script (list of Rules) to run on the result of the Resolver diverted to with accept. Must be specified with accept. |
| render | ??????? |
| request | A list of headers. (S: What schema does this correspond to?) ??? |
| rules | A collection of named scripts (lists of Rules). |
| url | A URL to fetch the data from instead of the user-provided one, for instance to access an API instead of scraping a page. Respects local variables. Must be specified with match. |
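
The co-occurrence constraints in the table above are simple enough to check mechanically. Here is a minimal Rust sketch of how I might do that, assuming a hypothetical, stripped-down `Resolver` struct in which the scripts are reduced to presence flags. The struct, the `validate` and `bind_arguments` functions, and the error strings are all my own illustration and are not taken from Fraidycat's source. The second function shows one reading of how `arguments` could bind capture groups to local variables; the captures are passed in already extracted so that the sketch needs no regex crate.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a Resolver: only the fields whose
/// co-occurrence the table constrains. Scripts are reduced to presence flags.
#[derive(Default)]
struct Resolver {
    /// How many of acceptText/acceptHtml/acceptXml/acceptJson are present.
    typed_accepts: usize,
    /// `accept`: key names of other Resolvers to divert to.
    accept: Option<Vec<String>>,
    /// Whether a `patch` script is present.
    has_patch: bool,
    /// `match`: kept as a plain string here; a real port would compile it.
    match_regex: Option<String>,
    arguments: Option<Vec<String>>,
    url: Option<String>,
}

/// Returns a message for every constraint from the table that is violated.
fn validate(r: &Resolver) -> Vec<&'static str> {
    let mut errors = Vec::new();
    match (r.typed_accepts, r.accept.is_some()) {
        (1, false) | (0, true) => {}
        (0, false) => errors.push("needs accept or exactly one accept* script"),
        (_, true) => errors.push("accept cannot be combined with an accept* script"),
        (_, false) => errors.push("at most one accept* script is allowed"),
    }
    if r.has_patch && r.accept.is_none() {
        errors.push("patch requires accept");
    }
    if r.arguments.is_some() && r.match_regex.is_none() {
        errors.push("arguments requires match");
    }
    if r.url.is_some() && r.match_regex.is_none() {
        errors.push("url requires match");
    }
    errors
}

/// Pairs each name in `arguments` with the capture group at the same index.
/// `captures[0]` is the whole match, which the "0" placeholder name skips.
/// Groups that did not participate in the match (None) are left unbound.
fn bind_arguments(arguments: &[&str], captures: &[Option<&str>]) -> HashMap<String, String> {
    arguments
        .iter()
        .zip(captures)
        .filter(|(name, _)| **name != "0")
        .filter_map(|(name, cap)| cap.map(|c| (name.to_string(), c.to_string())))
        .collect()
}
```

Under this reading, the Atom Resolver above (one `acceptXml`, nothing else constrained) validates cleanly, while a Resolver that sets both `accept` and `acceptXml`, or sets `url` without `match`, gets one error per violated rule.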

Rule

A Rule is a mapping from one or more nodes in the input document to a node in the output document.

| Element | Description |
| --- | --- |
| op | A single selector or list of selectors to match against the input document. |
| var | A single path in the output document at which to store the result. |
| acceptXml, acceptHtml, acceptJson, acceptText | A script (sequence of Rules) to run on the result. accept is not allowed in a Rule. |
| rule | The name of a script (list of Rules) to divert to. Cannot be specified with any other field. |
| mod | A list of post-processing functions to run, in order. (S: document these.) |
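
The type fluidity is easiest to see in Rule: `op` is sometimes a string and sometimes a list, and a Rule containing `rule` has a different shape from every other Rule. Below is a rough Rust sketch of how I might model that, under a few assumptions of mine that the source does not confirm: that `var` is optional (the `//atom:feed` Rule above has none), that `op` is always present on a non-diverting Rule, and that diversions can be resolved by inlining the named script ahead of time. The type names, the `expand` function, and the depth limit of 16 are all hypothetical.

```rust
use std::collections::HashMap;

/// `op` may be a single selector or a list of selectors.
#[derive(Debug, Clone, PartialEq)]
enum OneOrMany {
    One(String),
    Many(Vec<String>),
}

impl OneOrMany {
    /// Normalizes both shapes into one list so callers need a single code path.
    fn selectors(&self) -> Vec<&str> {
        match self {
            OneOrMany::One(s) => vec![s.as_str()],
            OneOrMany::Many(v) => v.iter().map(String::as_str).collect(),
        }
    }
}

/// Which kind of input a nested script expects (acceptText, acceptHtml, ...).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Accept {
    Text,
    Html,
    Xml,
    Json,
}

/// A Rule is either a bare diversion to a named script or a mapping.
/// Making these separate variants enforces "rule excludes every other field".
#[derive(Debug, Clone, PartialEq)]
enum Rule {
    Divert { rule: String },
    Map {
        op: OneOrMany,
        var: Option<String>,          // assumption: optional
        mods: Vec<String>,            // the `mod` field; `mod` is a Rust keyword
        nested: Option<(Accept, Vec<Rule>)>,
    },
}

/// Replaces every diversion with the Rules of the named script, recursively.
/// The depth limit guards against scripts that divert to themselves.
fn expand(
    script: &[Rule],
    named: &HashMap<String, Vec<Rule>>,
    depth: usize,
) -> Result<Vec<Rule>, String> {
    if depth > 16 {
        return Err("rule diversion nested too deeply".to_string());
    }
    let mut out = Vec::new();
    for rule in script {
        match rule {
            Rule::Divert { rule: name } => {
                let target = named
                    .get(name)
                    .ok_or_else(|| format!("unknown script: {}", name))?;
                out.extend(expand(target, named, depth + 1)?);
            }
            Rule::Map { op, var, mods, nested } => {
                let nested = match nested {
                    Some((kind, inner)) => Some((*kind, expand(inner, named, depth + 1)?)),
                    None => None,
                };
                out.push(Rule::Map {
                    op: op.clone(),
                    var: var.clone(),
                    mods: mods.clone(),
                    nested,
                });
            }
        }
    }
    Ok(out)
}
```

With the Atom Resolver's `rules` as `named`, expanding the `./atom:entry` Rule would replace its `{"rule": "entry"}` with the Rules of the `entry` script. Whether Fraidycat itself inlines or looks the script up at run time, I have not verified.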