d13forme revised this Gist.
1 file changed, 378 insertions
README.md (file created)
| @@ -0,0 +1,378 @@ | |||
| 1 | + | # Wayback CDX Server API - BETA # | |
| 2 | + | ||
| 3 | + | ##### Changelist | |
| 4 | + | ||
| 5 | + | * 2013-08-07 -- Add this changelist! Page size is now adjustable [Pagination API](#pagination-api) | |
| 6 | + | ||
| 7 | + | * 2013-08-07 -- Added support for [Counters](#counters) and [Field Order](#field-order). | |
| 8 | + | ||
| 9 | + | * 2013-08-03 -- Added support for [Collapsing](#collapsing) | |
| 10 | + | ||
| 11 | + | ||
| 12 | + | ##### Table of Contents | |
| 13 | + | ||
| 14 | + | #### [Intro and Usage](#intro-and-usage) | |
| 15 | + | ||
| 16 | + | * [Changelist](#changelist) | |
| 17 | + | ||
| 18 | + | * [Basic usage](#basic-usage) | |
| 19 | + | ||
| 20 | + | * [Url Match Scope](#url-match-scope) | |
| 21 | + | ||
| 22 | + | * [Output Format (JSON)](#output-format-json) | |
| 23 | + | ||
| 24 | + | * [Field Order](#field-order) | |
| 25 | + | ||
| 26 | + | * [Filtering](#filtering) | |
| 27 | + | ||
| 28 | + | * [Collapsing](#collapsing) | |
| 29 | + | ||
| 30 | + | * [Query Result Limits](#query-result-limits) | |
| 31 | + | ||
| 32 | + | #### [Advanced Usage](#advanced-usage) | |
| 33 | + | ||
| 34 | + | * [Closest Timestamp Match](#closest-timestamp-match) | |
| 35 | + | ||
| 36 | + | * [Resumption Key](#resumption-key) | |
| 37 | + | ||
| 38 | + | * [Resolve Revisits](#resolve-revisits) | |
| 39 | + | ||
| 40 | + | * [Counters](#counters) | |
| 41 | + | ||
| 42 | + | * [Duplicate Counter](#duplicate-counter) | |
| 43 | + | ||
| 44 | + | * [Skip Counter](#skip-counter) | |
| 45 | + | ||
| 46 | + | * [Pagination API](#pagination-api) | |
| 47 | + | ||
| 48 | + | * [Access Control](#access-control) | |
| 49 | + | ||
| 50 | + | ||
| 51 | + | ||
| 52 | + | ## Intro and Usage ## | |
| 53 | + | ||
| 54 | + | The `wayback-cdx-server` is a standalone HTTP servlet that serves the index that the `wayback` machine uses to look up captures. | |
| 55 | + | ||
| 56 | + | The index format is known as 'cdx' and contains various fields representing the capture, usually | |
| 57 | + | sorted by url and date. | |
| 58 | + | http://archive.org/web/researcher/cdx_file_format.php | |
| 59 | + | ||
| 60 | + | The server responds to GET queries and returns either the plain text CDX data, or optionally a JSON array of the CDX. | |
| 61 | + | ||
| 62 | + | The CDX server is deployed as part of the web.archive.org Wayback Machine, and the usage examples below reference this deployment. | |
| 63 | + | ||
| 64 | + | However, the cdx server is freely available with the rest of the open-source wayback machine software in this repository. | |
| 65 | + | ||
| 66 | + | Further documentation will focus on configuration and deployment in other environments. | |
| 67 | + | ||
| 68 | + | Please contact us at wwm@archive.org for additional questions. | |
| 69 | + | ||
| 70 | + | ||
| 71 | + | ### Basic Usage ### | |
| 72 | + | ||
| 73 | + | The simplest query uses the **url** param, which is the only required param for the CDX server: | |
| 74 | + | ||
| 75 | + | * http://web.archive.org/cdx/search/cdx?url=archive.org | |
| 76 | + | ||
| 77 | + | The above query returns a portion of the index, one row for each 'capture' of the url "archive.org" | |
| 78 | + | that is available in the archive. | |
| 79 | + | ||
| 80 | + | The columns of each line are the fields of the cdx. | |
| 81 | + | At this time, the following cdx fields are publicly available: | |
| 82 | + | ||
| 83 | + | `["urlkey","timestamp","original","mimetype","statuscode","digest","length"]` | |
| 84 | + | ||
| 85 | + | It is possible to customize the [Field Order](#field-order) as well. | |
| 86 | + | ||
| 87 | + | The **url=** value should be [url encoded](http://en.wikipedia.org/wiki/Percent-encoding) if the url itself contains a query. | |
| 88 | + | ||
| 89 | + | All other params are optional and are explained below. | |
| 90 | + | ||
| 91 | + | ||
| 92 | + | For doing large/bulk queries, the use of the [Pagination API](#pagination-api) is recommended. | |
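
As a minimal sketch of the basic query, the following Python builds a request URL and percent-encodes the **url=** value. The helper name `build_cdx_url` and the `example.com` url are assumptions made for illustration; only the endpoint and the param names come from the text above.

```python
from urllib.parse import urlencode

# Endpoint of the web.archive.org deployment referenced in this document.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, **params):
    """Return a CDX query URL. `url` is required, all other params are optional.

    Hypothetical helper: urlencode() percent-encodes every value, so a url
    that itself contains a query string is passed through safely.
    """
    query = [("url", url)] + sorted(params.items())
    return CDX_ENDPOINT + "?" + urlencode(query)


print(build_cdx_url("archive.org"))
# A url that contains its own query string gets percent-encoded.
print(build_cdx_url("example.com/search?q=cdx", limit=3))
```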
| 93 | + | ||
| 94 | + | ||
| 95 | + | ### Url Match Scope ### | |
| 96 | + | ||
| 97 | + | The default behavior is to return matches for an exact url. However, the cdx server can also return results matching a certain | |
| 98 | + | prefix, a certain host or all subdomains by using the **matchType=** param. | |
| 99 | + | ||
| 100 | + | For example, if given the url: *archive.org/about/* and: | |
| 101 | + | ||
| 102 | + | * **matchType=exact** (default if omitted) will return results matching exactly *archive.org/about/* | |
| 103 | + | ||
| 104 | + | * **matchType=prefix** will return all results under the path *archive.org/about/* | |
| 105 | + | ||
| 106 | + | http://web.archive.org/cdx/search/cdx?url=archive.org/about/&matchType=prefix&limit=1000 | |
| 107 | + | ||
| 108 | + | * **matchType=host** will return results from host archive.org | |
| 109 | + | ||
| 110 | + | http://web.archive.org/cdx/search/cdx?url=archive.org/about/&matchType=host&limit=1000 | |
| 111 | + | ||
| 112 | + | * **matchType=domain** will return results from host archive.org and all subhosts *.archive.org | |
| 113 | + | ||
| 114 | + | http://web.archive.org/cdx/search/cdx?url=archive.org/about/&matchType=domain&limit=1000 | |
| 115 | + | ||
| 116 | + | ||
| 117 | + | The matchType may also be set implicitly by using wildcard '*' at end or beginning of the url: | |
| 118 | + | ||
| 119 | + | * If url ends in '/\*', eg **url=archive.org/\*** the query is equivalent to **url=archive.org/&matchType=prefix** | |
| 120 | + | * If url starts with '\*.', eg **url=\*.archive.org/** the query is equivalent to **url=archive.org/&matchType=domain** | |
| 121 | + | ||
| 122 | + | (Note: The *domain* mode is only available if the CDX is in SURT-order format.) | |
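
The implicit wildcard rules can be sketched on the client side as follows. This is an illustration of the two equivalences listed above, not the server's own code; the function name `resolve_match_type` is hypothetical, and checking the leading wildcard first is an assumption (the text does not say what happens when both wildcards are present).

```python
def resolve_match_type(url):
    """Map a wildcard url to an explicit (url, matchType) pair.

    Hypothetical helper mirroring the two documented equivalences.
    """
    if url.startswith("*."):
        # '*.archive.org/' is equivalent to 'archive.org/' with matchType=domain
        return url[2:], "domain"
    if url.endswith("/*"):
        # 'archive.org/*' is equivalent to 'archive.org/' with matchType=prefix
        return url[:-1], "prefix"
    # No wildcard: the documented default is an exact match.
    return url, "exact"


for candidate in ("archive.org/*", "*.archive.org/", "archive.org/about/"):
    print(candidate, "->", resolve_match_type(candidate))
```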
| 123 | + | ||
| 124 | + | ||
| 125 | + | ### Output Format (JSON) ### | |
| 126 | + | ||
| 127 | + | * Output: **output=json** can be added to return results as JSON array. The JSON output currently also includes a first line which indicates the cdx format. | |
| 128 | + | ||
| 129 | + | Ex: http://web.archive.org/cdx/search/cdx?url=archive.org&output=json&limit=3 | |
| 130 | + | ``` | |
| 131 | + | [["urlkey","timestamp","original","mimetype","statuscode","digest","length"], | |
| 132 | + | ["org,archive)/", "19970126045828", "http://www.archive.org:80/", "text/html", "200", "Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY", "1415"], | |
| 133 | + | ["org,archive)/", "19971011050034", "http://www.archive.org:80/", "text/html", "200", "XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3", "1402"], | |
| 134 | + | ["org,archive)/", "19971211122953", "http://www.archive.org:80/", "text/html", "200", "XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3", "1405"]] | |
| 135 | + | ``` | |
| 136 | + | ||
| 137 | + | * By default, CDX server returns gzip encoded data for all queries. To turn this off, add the **gzip=false** param | |
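
Because the first JSON row names the fields, a client can turn each capture into a dictionary. The sketch below parses a shortened copy of the sample response shown above; `rows_to_dicts` is a hypothetical helper name, and the code assumes the response contains only the header row followed by capture rows.

```python
import json


def rows_to_dicts(json_text):
    """Convert **output=json** text into a list of {field: value} dicts.

    Assumes the first row is the header listing the cdx fields.
    """
    rows = json.loads(json_text)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


# First two captures from the sample response in this section.
sample = '''[["urlkey","timestamp","original","mimetype","statuscode","digest","length"],
 ["org,archive)/", "19970126045828", "http://www.archive.org:80/", "text/html", "200", "Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY", "1415"],
 ["org,archive)/", "19971011050034", "http://www.archive.org:80/", "text/html", "200", "XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3", "1402"]]'''

captures = rows_to_dicts(sample)
for capture in captures:
    print(capture["timestamp"], capture["statuscode"], capture["original"])
```

Note that every value, including `statuscode` and `length`, arrives as a string in the sample, so numeric comparisons need an explicit conversion.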
| 138 | + | ||
| 139 | + | ### Field Order ### | |
| 140 | + | ||
| 141 | + | It is possible to customize the fields returned from the cdx server using the **fl=** param. | |
| 142 | + | Simply pass in a comma separated list of fields and only those fields will be returned: | |
| 143 | + | ||
| 144 | + | * The following returns only the timestamp and mimetype fields with the header `["timestamp","mimetype"]` http://web.archive.org/cdx/search/cdx?url=archive.org&fl=timestamp,mimetype&output=json | |
| 145 | + | ||
| 146 | + | * If omitted, all the available fields are returned by default. | |
| 147 | + | ||
| 148 | + | ||
| 149 | + | ### Filtering ### | |
| 150 | + | ||
| 151 | + | * Date Range: Results may be filtered by timestamp using **from=** and **to=** params. | |
| 152 | + | The ranges are inclusive and are specified in the same 1 to 14 digit format used for `wayback` captures: *yyyyMMddhhmmss* | |
| 153 | + | ||
| 154 | + | Ex: http://web.archive.org/cdx/search/cdx?url=archive.org&from=2010&to=2011 | |
| 155 | + | ||
| 156 | + | ||
| 157 | + | * Regex filtering: It is possible to filter on a specific field or the entire CDX line (which is space delimited). | |
| 158 | + | Filtering by specific field is often simpler. | |
| 159 | + | Any number of filter params of the following form may be specified: **filter=**[!]*field*:*regex* | |
| 160 | + | ||
| 161 | + | * *field* is one of the named cdx fields (listed in the JSON query) or an index of the field. It is often useful to filter by | |
| 162 | + | *mimetype* or *statuscode* | |
| 163 | + | ||
| 164 | + | * Optional: *!* before the query inverts the match, that is, will return results that do NOT match the regex. | |
| 165 | + | ||
| 166 | + | * *regex* is any standard Java regex pattern (http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html) | |
| 167 | + | ||
| 168 | + | ||
| 169 | + | * Ex: Query for 2 capture results with a non-200 status code: | |
| 170 | + | ||
| 171 | + | http://web.archive.org/cdx/search/cdx?url=archive.org&output=json&limit=2&filter=!statuscode:200 | |
| 172 | + | ||
| 173 | + | ||
| 174 | + | * Ex: Query for 10 capture results with a non-200 status code and non text/html mime type matching a specific digest: | |
| 175 | + | ||
| 176 | + | http://web.archive.org/cdx/search/cdx?url=archive.org&output=json&limit=10&filter=!statuscode:200&filter=!mimetype:text/html&filter=digest:2WAXX5NUWNNCS2BDKCO5OVDQBJVNKIVV | |
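
Since **filter=** may be repeated, a client has to send the same param name several times. The following sketch composes filter values in the `[!]field:regex` form and encodes them as repeated params; `make_filter` and `build_filtered_url` are hypothetical helper names. The output is percent-encoded, so it looks different from the hand-written examples above but decodes to the same param values.

```python
from urllib.parse import urlencode


def make_filter(field, regex, invert=False):
    """Format one filter value as [!]field:regex."""
    return ("!" if invert else "") + field + ":" + regex


def build_filtered_url(url, filters, **params):
    """Build a query URL with one filter= param per entry in `filters`."""
    # A list of pairs (rather than a dict) lets the 'filter' key repeat.
    pairs = [("url", url)] + sorted(params.items())
    pairs += [("filter", value) for value in filters]
    return "http://web.archive.org/cdx/search/cdx?" + urlencode(pairs)


filters = [
    make_filter("statuscode", "200", invert=True),
    make_filter("mimetype", "text/html", invert=True),
]
print(build_filtered_url("archive.org", filters, output="json", limit=10))
```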
| 177 | + | ||
| 178 | + | ### Collapsing ### | |
| 179 | + | ||
| 180 | + | A new form of filtering is the option to 'collapse' results based on a field, or a substring of a field. | |
| 181 | + | Collapsing is done on adjacent cdx lines where all captures after the first one that are duplicate are filtered out. | |
| 182 | + | This is useful for filtering out captures that are 'too dense' or when looking for unique captures. | |
| 183 | + | ||
| 184 | + | To use collapsing, add one or more **collapse=field** or **collapse=field:N** where N is the first N characters of *field* to test. | |
| 185 | + | ||
| 186 | + | * Ex: Only show at most 1 capture per hour (compare the first 10 digits of the timestamp field). Given 2 captures 20130226010000 and 20130226010800, since first 10 digits 2013022601 match, the 2nd capture will be filtered out. | |
| 187 | + | ||
| 188 | + | http://web.archive.org/cdx/search/cdx?url=google.com&collapse=timestamp:10 | |
| 189 | + | ||
| 190 | + | The calendar page at web.archive.org uses this filter by default: http://web.archive.org/web/*/archive.org | |
| 191 | + | ||
| 192 | + | ||
| 193 | + | * Ex: Only show unique captures by digest (note that only adjacent digest are collapsed, duplicates elsewhere in the cdx are not affected) | |
| 194 | + | ||
| 195 | + | http://web.archive.org/cdx/search/cdx?url=archive.org&collapse=digest | |
| 196 | + | ||
| 197 | + | ||
| 198 | + | * Ex: Only show unique urls in a prefix query (filtering out captures except first capture of a given url). This is similar to the old prefix query in wayback (note: this query may be slow at the moment): | |
| 199 | + | ||
| 200 | + | http://web.archive.org/cdx/search/cdx?url=archive.org&collapse=urlkey&matchType=prefix | |
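
To make the 'adjacent lines only' behavior concrete, here is a small local emulation of collapsing as described above. It is a sketch for illustration, not the server's implementation; the `collapse` function and the sample rows are made up for the example.

```python
def collapse(rows, field, length=None):
    """Keep the first row of each run of adjacent rows sharing a collapse key.

    `length` plays the role of N in collapse=field:N, so only the first N
    characters of the field are compared. With length=None the whole field
    is compared.
    """
    kept = []
    previous = object()  # sentinel that never equals a real key
    for row in rows:
        key = row[field][:length]
        if key != previous:
            kept.append(row)
        previous = key
    return kept


captures = [
    {"timestamp": "20130226010000", "digest": "A"},
    {"timestamp": "20130226010800", "digest": "A"},
    {"timestamp": "20130226020000", "digest": "B"},
    {"timestamp": "20130227020000", "digest": "A"},
]

# collapse=timestamp:10 keeps at most one capture per hour.
hourly = collapse(captures, "timestamp", 10)
print([row["timestamp"] for row in hourly])

# collapse=digest only removes adjacent duplicates, so the last "A" survives.
unique = collapse(captures, "digest")
print([row["digest"] for row in unique])
```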
| 201 | + | ||
| 202 | + | ||
| 203 | + | ### Query Result Limits ### | |
| 204 | + | ||
| 205 | + | As the CDX server may return millions or billions of records, it is often necessary to set limits on a single query for practical reasons. | |
| 206 | + | The CDX server provides several mechanisms, including ability to return the last N as well as first N results. | |
| 207 | + | ||
| 208 | + | * The CDX server config provides a setting for absolute maximum length returned from a single query (currently set to 150000 by default). | |
| 209 | + | ||
| 210 | + | * Set **limit=** *N* to return the first N results. | |
| 211 | + | ||
| 212 | + | * Set **limit=** *-N* to return the last N results. The query may be slow as it begins reading from the beginning of the search space and skips all but last N results. | |
| 213 | + | ||
| 214 | + | Ex: http://web.archive.org/cdx/search/cdx?url=archive.org&limit=-1 | |
| 215 | + | ||
| 216 | + | * *Advanced Option:* **fastLatest=true** may be set to return *some number* of latest results for an exact match and is faster than the standard last results search. The number of results is at least 1 so **limit=-1** implies this setting. The number of results may be greater than 1 when a secondary index format (such as ZipNum) is used, but is not guaranteed to return any more than 1 result. Combining this setting with **limit=** *-N* will ensure that *no more* than the last N results are returned. | |
| 217 | + | ||
| 218 | + | Ex: This query will result in up to 5 of the latest (by date) query results: | |
| 219 | + | ||
| 220 | + | http://web.archive.org/cdx/search/cdx?url=archive.org&fastLatest=true&limit=-5 | |
| 221 | + | ||
| 222 | + | * The **offset=** *M* param can be used in conjunction with limit to 'skip' the first M records. This allows for a simple way to scroll through the results. | |
| 223 | + | ||
| 224 | + | However, the offset/limit model does not scale well to large queries, since the CDX server begins reading at the beginning every time and must read and skip through | |
| 225 | + | the number of results specified by **offset**. | |
| 226 | + | ||
| 227 | + | ||
| 228 | + | ## Advanced Usage ## | |
| 229 | + | ||
| 230 | + | The following features are for more specific/advanced usage of the CDX server. | |
| 231 | + | ||
| 232 | + | ||
| 233 | + | ### Resumption Key ### | |
| 234 | + | ||
| 235 | + | There is also a new method that allows for the CDX server to specify 'resumption key' that can be used to continue the query from the previous end. | |
| 236 | + | This allows breaking up a large query into smaller queries more efficiently. | |
| 237 | + | This can be achieved by using **showResumeKey=** and **resumeKey=** params | |
| 238 | + | ||
| 239 | + | * To show the resumption key add the **showResumeKey=true** param. When set, the resume key will be printed only if the query has more results that have not been printed because the **limit=** (or max query limit) number of results was reached. | |
| 240 | + | ||
| 241 | + | * After the end of the query, the *<resumption key>* will be printed on a separate line or as a separate JSON entry. | |
| 242 | + | ||
| 243 | + | * Plain text example: http://web.archive.org/cdx/search/cdx?url=archive.org&limit=5&showResumeKey=true | |
| 244 | + | ||
| 245 | + | ``` | |
| 246 | + | org,archive)/ 19970126045828 http://www.archive.org:80/ text/html 200 Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY 1415 | |
| 247 | + | org,archive)/ 19971011050034 http://www.archive.org:80/ text/html 200 XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3 1402 | |
| 248 | + | org,archive)/ 19971211122953 http://www.archive.org:80/ text/html 200 XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3 1405 | |
| 249 | + | org,archive)/ 19971211122953 http://www.archive.org:80/ text/html 200 XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3 1405 | |
| 250 | + | org,archive)/ 19980109140106 http://archive.org:80/ text/html 200 XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3 1402 | |
| 251 | + | ||
| 252 | + | org%2Carchive%29%2F+19980109140106%21 | |
| 253 | + | ``` | |
| 254 | + | ||
| 255 | + | * JSON example: http://web.archive.org/cdx/search/cdx?url=archive.org&limit=5&showResumeKey=true&output=json | |
| 256 | + | ||
| 257 | + | ``` | |
| 258 | + | [["urlkey","timestamp","original","mimetype","statuscode","digest","length"], | |
| 259 | + | ["org,archive)/", "19970126045828", "http://www.archive.org:80/", "text/html", "200", "Q4YULN754FHV2U6Q5JUT6Q2P57WEWNNY", "1415"], | |
| 260 | + | ["org,archive)/", "19971011050034", "http://www.archive.org:80/", "text/html", "200", "XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3", "1402"], | |
| 261 | + | ["org,archive)/", "19971211122953", "http://www.archive.org:80/", "text/html", "200", "XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3", "1405"], | |
| 262 | + | ["org,archive)/", "19971211122953", "http://www.archive.org:80/", "text/html", "200", "XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3", "1405"], | |
| 263 | + | ["org,archive)/", "19980109140106", "http://archive.org:80/", "text/html", "200", "XAHDNHZ5P3GSSSNJ3DMEOJF7BMCCPZR3", "1402"], | |
| 264 | + | [], | |
| 265 | + | ["org%2Carchive%29%2F+19980109140106%21"]] | |
| 266 | + | ``` | |
| 267 | + | ||
| 268 | + | * In a subsequent query, adding **resumeKey=** *<resumption key>* will resume the search from the next result. | |
| 269 | + | No other params from the original query (such as *from=* or *url=*) need to be altered. | |
| 270 | + | To continue from the previous example, the subsequent query would be: | |
| 271 | + | ||
| 272 | + | Ex: http://web.archive.org/cdx/search/cdx?url=archive.org&limit=5&showResumeKey=true&resumeKey=org%2Carchive%29%2F+19980109140106%21 | |
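
The resume key workflow can be sketched as a loop over plain text responses. The code below assumes the layout shown in the plain text example (capture lines, one empty line, then the key) and appends the key to the query verbatim, as in the example URL above. The names `split_resume_key` and `iter_captures` are hypothetical, and `fetch` is an injected callable so that the sketch runs without network access; a real client would pass a function that performs the HTTP GET.

```python
def split_resume_key(body):
    """Split a plain text response into (capture lines, resume key or None)."""
    lines = body.rstrip("\n").split("\n")
    # Per the example above, the key follows an empty line at the very end.
    if len(lines) >= 2 and lines[-2] == "":
        return lines[:-2], lines[-1]
    return [line for line in lines if line], None


def iter_captures(fetch, base_url):
    """Yield every capture line, following resume keys until none is returned.

    `base_url` is expected to already contain limit= and showResumeKey=true.
    """
    url = base_url
    while True:
        records, key = split_resume_key(fetch(url))
        yield from records
        if key is None:
            return
        # The key is already encoded in the response, so it is appended as-is.
        url = base_url + "&resumeKey=" + key


# In-memory stand-in for the server, with made-up records and a made-up key.
base = "http://web.archive.org/cdx/search/cdx?url=archive.org&limit=2&showResumeKey=true"
fake_pages = {
    base: "record-1\nrecord-2\n\nKEY1\n",
    base + "&resumeKey=KEY1": "record-3\n",
}
all_records = list(iter_captures(fake_pages.__getitem__, base))
print(all_records)
```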
| 273 | + | ||
| 274 | + | ### Counters ### | |
| 275 | + | ||
| 276 | + | There is some work on custom counters to enhance the aggregation capabilities of the CDX server. | |
| 277 | + | These features are brand new and should be considered experimental. | |
| 278 | + | ||
| 279 | + | #### Duplicate Counter #### | |
| 280 | + | ||
| 281 | + | While collapsing allows for filtering out adjacent results that are duplicates, it is also possible to track duplicates throughout the cdx | |
| 282 | + | using a special new extension. | |
| 283 | + | By adding **showDupeCount=true**, a new `dupecount` column will be added to the results. | |
| 284 | + | ||
| 285 | + | * The duplicates are determined by tracking rows with the same `digest` field. | |
| 286 | + | ||
| 287 | + | * For rows with a duplicate count > 0, the `warc/revisit` mimetype will automatically be resolved to the mimetype of the original, if found. | |
| 288 | + | ||
| 289 | + | * Using **showDupeCount=true** will only show unique captures: http://web.archive.org/cdx/search/cdx?url=archive.org&showDupeCount=true&output=json&limit=50 | |
| 290 | + | ||
| 291 | + | ||
| 292 | + | #### Skip Counter #### | |
| 293 | + | ||
| 294 | + | It is possible to track how many CDX lines were skipped due to [Filtering](#filtering) and [Collapsing](#collapsing) | |
| 295 | + | by adding the special `skipcount` counter with **showSkipCount=true**. | |
| 296 | + | An optional `endtimestamp` count can also be used to print the timestamp of the last capture by adding **lastSkipTimestamp=true** | |
| 297 | + | ||
| 298 | + | * Ex: Collapse results by year and print number of additional captures skipped and timestamp of last capture: | |
| 299 | + | ||
| 300 | + | http://web.archive.org/cdx/search/cdx?url=archive.org&collapse=timestamp:4&output=json&showSkipCount=true&lastSkipTimestamp=true | |
| 301 | + | ||
| 302 | + | ||
| 303 | + | ### Pagination API ### | |
| 304 | + | ||
| 305 | + | The above resume key allows for sequential querying of CDX data. | |
| 306 | + | However, in some cases where very large queries are needed (for example, a domain query), it may be useful to perform queries | |
| 307 | + | in parallel and also estimate the total size of the query. | |
| 308 | + | ||
| 309 | + | `wayback` and `cdx-server` support secondary loading from a 'zipnum' CDX index. | |
| 310 | + | This index contains CDX lines stored in concatenated GZIP blocks (usually 3,000 lines each) and a secondary index | |
| 311 | + | which provides binary search to the 'zipnum' blocks. | |
| 312 | + | By using the secondary index, it is possible to estimate the total size of a query and also break the query up into smaller pieces. | |
| 313 | + | Using the zipnum format or other secondary index is needed to support pagination. | |
| 314 | + | ||
| 315 | + | However, pagination can only work on a single index at a time; merging input from multiple sources (plain cdx or zipnum) | |
| 316 | + | is not possible. As such, the results from a paginated query may be slightly less up-to-date than | |
| 317 | + | a default non-paginated query. | |
| 318 | + | ||
| 319 | + | * To use pagination, simply add the **page=i** param to the query to return the i-th page. If pagination is not supported, the CDX server will return a 400. | |
| 320 | + | ||
| 321 | + | * Pages are numbered from 0 to *num pages - 1*. If *i<0*, pages are not used. If *i>=num pages*, no results are returned. | |
| 322 | + | ||
| 323 | + | Ex: First page: http://web.archive.org/cdx/search/cdx?url=archive.org&page=0 | |
| 324 | + | ||
| 325 | + | Ex: Next Page: http://web.archive.org/cdx/search/cdx?url=archive.org&page=1 | |
| 326 | + | ||
| 327 | + | ||
| 328 | + | * To determine the number of pages, add the **showNumPages=true** param. This is a special query that will return a single number indicating the number of pages | |
| 329 | + | ||
| 330 | + | Ex: http://web.archive.org/cdx/search/cdx?url=archive.org&showNumPages=true | |
| 331 | + | ||
| 332 | + | * Page size is the number of zipnum blocks scanned per page (so a page size of `1` will contain *up to* 3,000 results per page). This means the number of results on each page will vary, because each block may have a different number of CDX lines matching your query. Page size is configured to an optimal value on the CDX server, and may be similar to max query limit in non-paged mode. The CDX server on archive.org currently has a page size of 50. | |
| 333 | + | ||
| 334 | + | * It is possible to adjust the page size to a smaller value than the default by setting the **pageSize=P** where 1 <= P <= default page size. | |
| 335 | + | ||
| 336 | + | Ex: Get # of pages with smallest page size: http://web.archive.org/cdx/search/cdx?url=archive.org&showNumPages=true&pageSize=1 | |
| 337 | + | ||
| 338 | + | Ex: Get first page with smallest page size: http://web.archive.org/cdx/search/cdx?url=archive.org&page=0&pageSize=1 | |
| 339 | + | ||
| 340 | + | ||
| 341 | + | * If there is only one page, adding the **page=0** param will return the same results as without setting a page. | |
| 342 | + | ||
| 343 | + | * It is also possible to have the CDX server return the raw secondary index, by specifying **showPagedIndex=true**. This query returns the secondary index instead of the CDX results and may be subject to access restrictions. | |
| 344 | + | ||
| 345 | + | * All other params, including the resumeKey= should work in conjunction with pagination. | |
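
As a rough sketch of how a client might plan a paginated query, the following generates one URL per page and computes the upper bound on results per page. The page count is passed in as a plain number here; in practice it would come from a **showNumPages=true** query. The helper names are hypothetical, and the 3,000 lines per block figure is the 'usual' value quoted above, not a guarantee.

```python
def page_urls(base_url, num_pages, page_size=None):
    """Return one query URL per page, numbered from 0 to num_pages - 1."""
    suffix = "" if page_size is None else f"&pageSize={page_size}"
    return [f"{base_url}&page={i}{suffix}" for i in range(num_pages)]


def max_results_per_page(page_size, lines_per_block=3000):
    """Upper bound on results per page: zipnum blocks scanned x lines per block."""
    return page_size * lines_per_block


base = "http://web.archive.org/cdx/search/cdx?url=archive.org"
for url in page_urls(base, 3, page_size=1):
    print(url)

# With the page size of 50 quoted above, a page holds up to 150000 lines.
print(max_results_per_page(50))
```

Because each URL is independent, the pages can be fetched in parallel, which is the main advantage over the sequential resume key approach.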
| 346 | + | ||
| 347 | + | ||
| 348 | + | ||
| 349 | + | ### Access Control ### | |
| 350 | + | ||
| 351 | + | The cdx server is designed to improve access to archived data to a broad audience, but it may be necessary to restrict certain parts of the cdx. | |
| 352 | + | ||
| 353 | + | The cdx server supports granting permission to access restricted data via an API key that is passed in as a cookie. | |
| 354 | + | ||
| 355 | + | Currently two restrictions/permission types are supported: | |
| 356 | + | ||
| 357 | + | * Access to certain urls which are considered private. When restricted, only public urls are included in query results and access to secondary index is restricted. | |
| 358 | + | ||
| 359 | + | * Access to certain fields, such as filename in the CDX. When restricted, the cdx results contain only public fields. | |
| 360 | + | ||
| 361 | + | ||
| 362 | + | To allow access, the API key cookie must be explicitly set on the client, eg: | |
| 363 | + | ||
| 364 | + | ``` | |
| 365 | + | curl -H "Cookie: cdx-auth-token=API-Key-Secret" "http://mycdxserver/search/cdx?url=..." | |
| 366 | + | ``` | |
| 367 | + | ||
| 368 | + | The *API-Key-Secret* can be set in the cdx server configuration. | |
| 369 | + | ||
| 370 | + | ||
| 371 | + | ## CDX Server Configuration ## | |
| 372 | + | ||
| 373 | + | ||
| 374 | + | TODO | |
| 375 | + | ||
| 376 | + | Start by editing the `wayback-cdx-server-servlet.xml` file in the `WEB-INF` directory. Add some valid CDX files to the `cdxUris` list (file names must end with `cdx` or `cdx.gz`). | |
| 377 | + | ||
| 378 | + | ||
    
    
                            