7. Setting up your own repository

All you need for your own comic-get repository is a place to put your files online. Technically, a comic-get repository consists of four files plus the rulefiles.

7.1. Basic files

7.1.1. identifier

identifier contains a globally unique identifier for your repository. The identifier is used to distinguish your repository from any other repositories in use. If you host the files under a registered domain name, it is probably a good idea to use the repository URI as the identifier. The author prefers to omit the protocol descriptor, but this is up to you.

7.1.2. Packages

Packages contains the actual package information. This information is usually extracted from the rulefiles, but it can also be entered manually. Using update.sh to create this file is recommended.

7.1.3. md5sums

md5sums contains the MD5 checksums of Packages and rules.md5sums. When comic-get updates the repository information, in most cases only this file is retrieved. If the checksums on the remote host do not match the local files, the affected files are updated. This file is usually created by update.sh.
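
The checksum comparison described above can be sketched in Python. This is only an illustration, not how update.sh or comic-get actually works: the function names (md5_of, build_md5sums, parse_md5sums, stale_files) are hypothetical, and the md5sum(1)-style line layout (checksum, two spaces, filename) is an assumption, since the exact format of md5sums is not described here.

```python
import hashlib
import os


def md5_of(path):
    """Return the hex MD5 digest of a file's contents."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_md5sums(directory, names):
    """Build md5sum(1)-style lines (assumed layout) for the named files."""
    return "".join(
        "%s  %s\n" % (md5_of(os.path.join(directory, name)), name)
        for name in names
    )


def parse_md5sums(text):
    """Map filename -> checksum from md5sum(1)-style lines."""
    sums = {}
    for line in text.splitlines():
        if line.strip():
            checksum, name = line.split(None, 1)
            sums[name.strip()] = checksum
    return sums


def stale_files(remote_text, local_dir):
    """List files whose local checksum differs from the remote md5sums."""
    stale = []
    for name, checksum in parse_md5sums(remote_text).items():
        path = os.path.join(local_dir, name)
        if not os.path.exists(path) or md5_of(path) != checksum:
            stale.append(name)
    return sorted(stale)
```

With this model, a client only needs to download the small md5sums file first; Packages or rules.md5sums would be fetched again only if it shows up in the stale list.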

7.1.4. rules.md5sums

rules.md5sums is like md5sums, but it contains the MD5 checksums of the rulefiles. When a rulefile's checksum does not match the local copy, that rulefile is updated. This file is also created by update.sh.

7.2. Rulefiles

A rulefile consists of two parts: first the actual rules for fetching the target, and then the metadata for the package, such as its language.

The basic format of the rules is as follows:


title Mutts
uri http://seattlepi.nwsource.com/fun/mutts.asp
fetch
image gif /content/Mutts\?date=\d+
fetch gif
save mutts.png convert:gif
	

This particular ruleset defines the URI http://seattlepi.nwsource.com/fun/mutts.asp (uri) and fetches its content (fetch). image is then used to find an image whose filename matches the regular expression /content/Mutts\?date=\d+. This image is tagged with gif. The first image with the tag gif is then fetched with fetch gif. Finally, a file named mutts.png is written from the fetched data, and because the data came from the target tagged gif, it is converted using convert.

7.2.1. Rulefile commands

7.2.1.1. title

title defines a title for a set of rules. Currently this information is used only in the progress indicator.

7.2.1.2. uri

Pushes a new URI onto the URI list.

7.2.1.3. fetch

Fetches the target. If no parameters are given, the first URI in the URI list is used. If one or more parameters are given, the first URI with the given tag(s) is fetched. Parameters (tags) are given in descending order of preference.
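
The tag preference rule can be illustrated with a short Python sketch. It is not comic-get's own code: the URI list is modelled as a list of (tag, uri) pairs, and both that representation and the function name fetch_candidates are assumptions made for the example. The example.org URLs are placeholders.

```python
def fetch_candidates(uri_list, tags=()):
    """Return the URIs to try, in order of preference.

    uri_list is a list of (tag, uri) pairs in the order they were pushed
    (a hypothetical model of the URI list). With no tags, only the first
    URI is a candidate. With tags, the first URI carrying each tag is a
    candidate, in the order the tags were given.
    """
    if not tags:
        return [uri for _tag, uri in uri_list[:1]]
    ordered = []
    for tag in tags:
        for entry_tag, uri in uri_list:
            if entry_tag == tag:
                ordered.append(uri)
                break  # only the first URI carrying this tag
    return ordered


uris = [
    ("", "http://example.org/comic.html"),
    ("jpeg", "http://example.org/stuff/j1.jpeg"),
    ("png", "http://example.org/stuff/1.png"),
]
candidates = fetch_candidates(uris, ["png", "jpeg"])
```

A fetch with the tags png and jpeg would then try the PNG address first and fall back to the JPEG one only if that download fails.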

7.2.1.4. image

Finds an image in the fetched data (usually HTML). image takes one or more parameters. If two parameters are given, the first is the tag for the image and the second is the regular expression. If only one parameter is given, it is the regular expression and the tag is assumed to be empty. The tag is later used with fetch and save. If more than two parameters are given, the first is used as the tag (as explained above), the second as the regular expression, and the remaining parameters limit matches to HTML elements that include the given string(s). For example,


image jpeg p\d+.png class="latest"
	
will match

<img src="p123.png" class="latest" />
	
but not

<img src="p122.png" />
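
The filtering behaviour of the extra parameters can be approximated in Python as below. This is a rough sketch under stated assumptions, not comic-get's real matching code: it assumes the regular expression is searched for in the src attribute and that the extra strings are looked up verbatim in the text of the img element. The name find_images is hypothetical.

```python
import re

# Hypothetical helpers: a whole <img ...> element and its src attribute.
IMG_ELEMENT = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""src\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def find_images(html, pattern, required=()):
    """Return src values matching pattern whose element contains every
    string in required."""
    matches = []
    for element in IMG_ELEMENT.findall(html):
        src = SRC_ATTR.search(element)
        if src is None:
            continue
        if re.search(pattern, src.group(1)) and all(s in element for s in required):
            matches.append(src.group(1))
    return matches


html = '<img src="p123.png" class="latest" />\n<img src="p122.png" />'
latest_only = find_images(html, r"p\d+.png", ['class="latest"'])
```

Here latest_only contains only p123.png, while calling find_images without the extra string returns both images.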
	

7.2.1.5. link

Finds a link in the fetched data (usually HTML). link takes one or more parameters. The first parameter is always the regular expression used to find the link, while the remaining parameters can be used to limit matches to HTML elements that include the given string(s). For an example of using extra parameters to limit matches, see image.

7.2.1.6. save

Saves data. save takes one or more arguments. The first argument is the filename to use when saving the data. The following arguments are optional, and currently only convert is recognised. If an option needs arguments, they are separated from the option name with a colon and from each other with commas.

convert can be used to convert the retrieved data. This is particularly useful if more than one image is defined with image and the images are in different formats. For example, if the comic strip is usually distributed in PNG format but occasionally in JPEG, that part of the ruleset could be as follows:


image png /stuff/\d+\.png
image jpeg /stuff/j\d+\.jpeg
fetch png jpeg
save foo.png convert:jpeg
	
This searches the data for images and tags them with png and jpeg. comic-get then tries to fetch png. If downloading png fails, jpeg is downloaded instead. Finally, the data is written to foo.png, and if the data came from the image tagged jpeg, it is converted to PNG before writing.
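
The option syntax of save (option name, colon, comma-separated arguments) can be sketched in Python as follows. The function names parse_save and needs_conversion are hypothetical, and the splitting of the rule line on whitespace is an assumption; comic-get's own parser may behave differently.

```python
def parse_save(args):
    """Split the words after 'save' into a filename and an options dict.

    Each option is written as name:arg1,arg2 and an option without
    arguments maps to an empty list.
    """
    if not args:
        raise ValueError("save needs at least a filename")
    filename, options = args[0], {}
    for word in args[1:]:
        name, _sep, values = word.partition(":")
        options[name] = values.split(",") if values else []
    return filename, options


def needs_conversion(options, fetched_tag):
    """True if data fetched under this tag is listed for convert."""
    return fetched_tag in options.get("convert", [])


filename, options = parse_save("foo.png convert:jpeg".split())
```

For the ruleset above, needs_conversion(options, "jpeg") is true and needs_conversion(options, "png") is false, which mirrors the rule that only data from the image tagged jpeg is converted before writing.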

7.2.1.7. select

Selects one of the matches produced by image or a similar command. select takes exactly one argument, the index number of the selected URI. Index numbers start from one (1), not zero (0). Negative index numbers count from the end of the list, i.e. -2 selects the second-to-last match.
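
The index convention can be made concrete with a small Python sketch. The function name select_match is hypothetical, and rejecting index 0 with an error is an assumption, since the behaviour for 0 is not described here.

```python
def select_match(matches, index):
    """Pick a match using one-based indexes; negative indexes count
    from the end of the list."""
    if index == 0 or abs(index) > len(matches):
        raise IndexError("no match with index %d" % index)
    if index > 0:
        return matches[index - 1]  # one-based to zero-based
    return matches[index]  # -1 is the last match, -2 the second-to-last


matches = ["first.png", "second.png", "third.png"]
second_last = select_match(matches, -2)
```

With three matches, index 1 selects first.png and index -2 selects second.png.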

7.2.2. Package metadata

Package metadata is separated from the rules by a line containing nothing but --info--. Generally speaking, a metadata field begins with the field name, followed by a colon and whitespace.

7.2.2.1. Package

Package defines the package name. This must match the base filename of the rulefile. The package name should not contain underscores, and it is recommended to use only lowercase characters (a-z), numbers (0-9) and the dash (-).
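
A repository maintainer could check names against these recommendations before publishing. The following Python sketch is one way to do so; the function name package_name_warnings is hypothetical and no such check is known to exist in comic-get or update.sh.

```python
import re


def package_name_warnings(name):
    """Return a list of warnings for a package name (empty if none)."""
    warnings = []
    if "_" in name:
        warnings.append("contains an underscore")
    # The underscore is reported above, so it is tolerated here to avoid
    # warning about the same character twice.
    if not re.fullmatch(r"[a-z0-9_-]+", name):
        warnings.append("uses characters outside a-z, 0-9 and -")
    return warnings
```

A name such as mutts produces no warnings, while my_comic or Mutts produces one each.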

7.2.2.2. Version

Defines the version of the ruleset. Currently not used anywhere.

7.2.2.3. Compatibility

Defines ruleset compatibility. The current version of comic-get can only parse level-2 files. Currently this field is not used, but expect this to change in future releases.

7.2.2.4. Maintainer

Defines the maintainer of the package. Can contain anything, but "Forename Surname <email@host.net>" is recommended. (TODO: which standard?) Not used anywhere.

7.2.2.5. Language

Defines the language of the comic strip. Must be of the form xx_XX (TODO: which standard?) and must match the language part of the rulefile.

7.2.2.6. Author

Defines the (original) author of the comic. Used mostly for searching packages.

7.2.2.7. Source

Defines the website from which the strip is fetched. Not used.

7.2.2.8. Description

This field differs from the others. Description should be followed by a short description of the package (usually just the name). No other fields can follow Description. Lines after Description are treated as the long (multiline) description of the strip. Each line must begin with a single space, and empty lines must be written as " ." (space-dot). The long description should give extra information about the comic strip that may help the user decide whether or not to try it.
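
To show how the rules, the --info-- separator, the metadata fields and the long description fit together, here is a minimal Python sketch of a rulefile reader. It is an illustration only, not comic-get's parser: the function name parse_rulefile is hypothetical, and the metadata values in the sample (package name, language, description text) are made up for the example.

```python
def parse_rulefile(text):
    """Split rulefile text into (rules, fields, long_description)."""
    lines = text.splitlines()
    try:
        split_at = lines.index("--info--")
    except ValueError:
        split_at = len(lines)  # no metadata part found
    rules = [line for line in lines[:split_at] if line.strip()]
    fields, long_lines = {}, []
    in_description = False
    for line in lines[split_at + 1:]:
        if in_description:
            # Long description: one leading space, " ." marks an empty line.
            if line.startswith(" "):
                body = line[1:]
                long_lines.append("" if body == "." else body)
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        fields[name.strip()] = value.strip()
        if name.strip() == "Description":
            in_description = True  # no other fields may follow
    return rules, fields, "\n".join(long_lines)


sample = (
    "title Mutts\n"
    "uri http://seattlepi.nwsource.com/fun/mutts.asp\n"
    "fetch\n"
    "--info--\n"
    "Package: mutts\n"
    "Language: en_US\n"
    "Description: Mutts\n"
    " A daily strip.\n"
    " .\n"
    " Second paragraph.\n"
)
rules, fields, long_description = parse_rulefile(sample)
```

In this sample the short description is Mutts, and the long description consists of two paragraphs separated by the empty line that was written as space-dot.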