aboutsummaryrefslogtreecommitdiff
path: root/pipeline/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'pipeline/README.md')
-rw-r--r--pipeline/README.md52
1 files changed, 52 insertions, 0 deletions
diff --git a/pipeline/README.md b/pipeline/README.md
new file mode 100644
index 0000000..052ea9d
--- /dev/null
+++ b/pipeline/README.md
@@ -0,0 +1,52 @@
+pipeline
+========
+
+This directory contains tools and scripts for running a cron job that does
+RAPPOR analysis and generates an HTML dashboard.
+
+It works like this:
+
+1. `task_spec.py` generates a text file where each line corresponds to a process
+ to be run (a "task"). The process is `bin/decode-dist` or
+ `bin/decode-assoc`. The line contains the task parameters.
+
+2. `xargs -P` is used to run processes in parallel. Our analysis is generally
+ single-threaded (i.e. because R is single-threaded), so this helps utilize
+ the machine fully. Each task places its output in a different subdirectory.
+
+3. `cook.sh` calls `combine_results.py` to combine analysis results into a time
+ series. It also calls `combine_status.py` to keep track of task data for
+ "meta-analysis". `metric_status.R` generates more summary CSV files.
+
+4. `ui.sh` calls `csv_to_html.py` to generate an HTML fragments from the CSV
+ files.
+
+5. The JavaScript in `ui/ui.js` is loaded from static HTML, and makes AJAX calls
+ to retrieve the HTML fragments. The page is made interactive with
+ `ui/table-lib.js`.
+
+`dist.sh` and `assoc.sh` contain functions which coordinate this process.
+
+`alarm-lib.sh` is used to kill processes that have been running for too long.
+
+Testing
+-------
+
+`pipeline/regtest.sh` contains end-to-end demos of this process. Right now it
+depends on testdata from elsewhere in the tree:
+
+
+ rappor$ ./demo.sh run # prepare dist testdata
+ rappor$ cd bin
+
+ bin$ ./test.sh write-assoc-testdata # prepare assoc testdata
+ bin$ cd ../pipeline
+
+ pipeline$ ./regtest.sh dist
+ pipeline$ ./regtest.sh assoc
+
+ pipeline$ python -m SimpleHTTPServer # start a static web server
+
+ http://localhost:8000/_tmp/
+
+