
Advanced topics

← Go back to Post-processing | ↑ Go to the Table of Content ↑ | Continue to Frequently Asked Questions →

Plugging the QC to an existing DPL workflow

Your existing DPL workflow can simply be considered a publisher. Therefore, replace o2-qc-run-producer with your own workflow.

For example, if TPC wants to monitor the output {"TPC", "CLUSTERS"} of the workflow o2-qc-run-tpcpid, modify the config file to point to the correct data and run:

o2-qc-run-tpcpid | o2-qc --config json://${QUALITYCONTROL_ROOT}/etc/tpcQCPID.json

Multi-node setups

During data taking, Quality Control runs on a distributed computing system. Some QC Tasks are executed on dedicated QC servers, while others run on FLPs and EPNs. In the first case, messages coming from Data Sampling should reach the QC servers, where they are processed. In the second case, locally produced Monitor Objects should be merged on QC servers and then have Checks run on them. By remote QC Tasks we mean those which run on QC servers (remote machines), while local QC Tasks run on FLPs and EPNs (local machines).

While it is the responsibility of the run operators to deploy the processing topologies during data taking, here we show how to achieve such multi-node workflows on development setups, running them just with the DPL driver. Note that for now we support setups with one or more local machines, but only one remote machine.

In our example, we assume having two local processing nodes (localnode1, localnode2) and one QC node (qcnode). There are two types of QC Tasks declared:

  • MultiNodeLocal, which is executed on the local nodes; its results are merged and checked on the QC server.
  • MultiNodeRemote, which runs on the QC server, receiving a small percentage of data from localnode2 only. Mergers are not needed in this case, but there is a process running Checks against the Monitor Objects generated by this Task.

We use the SkeletonTask class for both, but any Task can be used of course. Should a Task be local, all its MonitorObjects need to be mergeable - they should be one of the mergeable ROOT types (histograms, TTrees) or inherit MergeInterface.

These are the steps to follow to get a multinode setup:

  1. Prepare a configuration file.

In this example we will use the Framework/multiNode.json config file. A config file should look almost like the usual one, but with a few additional parameters. In case of a local task, these parameters should be added:

    "tasks": {
      "MultiNodeLocal": {
        "active": "true",
        ...
        "location": "local",
        "localMachines": [
          "localnode1",
          "localnode2"
        ],
        "remoteMachine": "qcnode",
        "remotePort": "30132"
      }
    },

List the local processing machines in the localMachines array. remoteMachine should contain the host name of the QC server and remotePort the port number on which Mergers will listen for incoming MOs. Make sure it is not used by another service. If different QC Tasks run in parallel, use a separate port for each.

In case of a remote task, choosing the "remote" option for the "location" parameter is enough.

    "tasks": {
      ...
      "MultiNodeRemote": {
        "active": "true",
        ...
        "dataSource": {
          "type": "dataSamplingPolicy",
          "name": "rnd-little"
        },
        "taskParameters": {},
        "location": "remote"
      }
    }

In both cases, however, one has to specify the machines where data should be sampled, as below. If data should be published to external machines (for remote tasks), one has to add a local port number. Use a separate port for each Data Sampling Policy.

{
  "dataSamplingPolicies": [
    ...
    {
      "id": "rnd-little",
      "active": "true",
      "machines": [
        "localnode2"
      ],
      "port": "30333"
      ...
    }
  ]
}
  2. Make sure that the firewalls are properly configured. If your machines block incoming/outgoing connections by default, you can add these rules to the firewall (run as sudo). Consider enabling only specific ports or a small range of them.
# localnode1 and localnode2 :
iptables -I INPUT -p tcp -m conntrack --ctstate NEW,ESTABLISHED -s qcnode -j ACCEPT
iptables -I OUTPUT -p tcp -m conntrack --ctstate NEW,ESTABLISHED -d qcnode -j ACCEPT
# qcnode:
iptables -I INPUT -p tcp -m conntrack --ctstate NEW,ESTABLISHED -s localnode1 -j ACCEPT
iptables -I OUTPUT -p tcp -m conntrack --ctstate NEW,ESTABLISHED -d localnode1 -j ACCEPT
iptables -I INPUT -p tcp -m conntrack --ctstate NEW,ESTABLISHED -s localnode2 -j ACCEPT
iptables -I OUTPUT -p tcp -m conntrack --ctstate NEW,ESTABLISHED -d localnode2 -j ACCEPT
  3. Install the same version of the QC software on each of these nodes. We cannot guarantee that different QC versions will talk to each other without problems. Also, make sure the configuration file that you will use is the same everywhere.

  4. Run each part of the workflow. In this example, o2-qc-run-producer represents any DPL workflow; here it is just a process which produces some random data.

# On localnode1:
o2-qc-run-producer | o2-qc --config json://${QUALITYCONTROL_ROOT}/etc/multiNode.json --local --host localnode1 -b
# On localnode2:
o2-qc-run-producer | o2-qc --config json://${QUALITYCONTROL_ROOT}/etc/multiNode.json --local --host localnode2 -b
# On qcnode:
o2-qc --config json://${QUALITYCONTROL_ROOT}/etc/multiNode.json --remote

If there are no problems, on QCG you should see the example histogram updated under the paths qc/TST/MultiNodeLocal and qc/TST/MultiNodeRemote, and corresponding Checks under the path qc/checks/TST/.

Writing a DPL data producer

For your convenience, and although it does not lie within the QC scope, we would like to document how to write a simple data producer in the DPL. The DPL documentation can be found here and for questions please head to the forum.

As an example we take the DataProducerExample that you can find in the QC repository. It produces a number. By default the number is 1, but a different one can be specified with the parameter my-param. It is made of 3 files:

  • runDataProducerExample.cxx : This is an executable with a basic data producer in the Data Processing Layer. There are 2 important functions here :
    • customize(...) to add parameters to the executable. Note that it must be written before the includes for the dataProcessing.
    • defineDataProcessing(...) to define the workflow to be run, in our case the device(s) publishing the number.
  • DataProducerExample.h : The key elements are :
    1. The include #include <Framework/DataProcessorSpec.h>
    2. The function getDataProducerExampleSpec(...) which must return a DataProcessorSpec i.e. the description of a device (name, inputs, outputs, algorithm)
    3. The function getDataProducerExampleAlgorithm which must return an AlgorithmSpec i.e. the actual algorithm that produces the data.
  • DataProducerExample.cxx : This is just the implementation of the header described above. You will probably want to modify getDataProducerExampleSpec and the inner-most block of getDataProducerExampleAlgorithm. You might be taken aback by the look of this function; if you don't know what a lambda is, just ignore it and write your code inside the braces.

You will probably write it in your detector's O2 directory rather than in the QC repository.

Access conditions from the CCDB

The MonitorObjects generated by Quality Control are stored in a dedicated repository based on CCDB. The run conditions, on the other hand, are located in another, separate database. One can access these conditions inside a Task by a dedicated method of the TaskInterface, as below:

TObject* condition = TaskInterface::retrieveCondition("Path/to/condition");
if (condition) {
  LOG(INFO) << "Retrieved " << condition->ClassName();
  delete condition;
}

Make sure to declare a valid URL of CCDB in the config file. Keep in mind that it might be different from the CCDB instance used for storing QC objects.

{
  "qc": {
    "config": {
     ...
      "conditionDB": {
        "url": "ccdb-test.cern.ch:8080"
      }
    },
    ...

Definition and access of task-specific configuration

A task can access custom parameters declared in the configuration file at qc.tasks.<task_name>.taskParameters. They are stored inside a key-value map named mCustomParameters, which is a protected member of TaskInterface.

One can also tell the DPL driver to accept new arguments. This is done using the customize method at the top of your workflow definition (usually called "runXXX" in the QC).

For example, to add two parameters of different types, do:

void customize(std::vector<ConfigParamSpec>& workflowOptions)
{
  workflowOptions.push_back(
    ConfigParamSpec{ "config-path", VariantType::String, "", { "Path to the config file. Overwrite the default paths. Do not use with no-data-sampling." } });
  workflowOptions.push_back(
    ConfigParamSpec{ "no-data-sampling", VariantType::Bool, false, { "Skips data sampling, connects directly the task to the producer." } });
}

Custom QC object metadata

One can add custom metadata on the QC objects produced in a QC task. Simply call ObjectsManager::addMetadata(...), like in

  // add a metadata on histogram mHistogram, key is "custom" and value "34"
  getObjectsManager()->addMetadata(mHistogram->GetName(), "custom", "34");

This metadata will end up in the CCDB.

Data Inspector

This is a GUI to inspect the data coming out of the DataSampling, in particular the Readout.

(screenshot of the Data Inspector GUI)

Prerequisite

If not already done, install GLFW for your platform. On CC7, install glfw-devel from the epel repository: sudo yum install glfw-devel --enablerepo=epel

Compilation

Build the QualityControl as usual.

Execution

To monitor the readout, 3 processes have to be started: the Readout, the Data Sampling and the Data Inspector.

First make sure that the Data Sampling is enabled in the readout :

[consumer-data-sampling]
consumerType=DataSampling
enabled=1

In 3 separate terminals, do respectively

  1. readout.exe file:///absolute/path/to/config.cfg
  2. o2-qc-run-readout-for-data-dump --batch
  3. o2-qc-data-dump --mq-config $QUALITYCONTROL_ROOT/etc/dataDump.json --id dataDump --control static

Configuration

Fraction of data

The Data Sampling tries to take 100% of the events by default. Edit $QUALITYCONTROL_ROOT/etc/readoutForDataDump.json to change it. Look for the parameter fraction that is set to 1.
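For orientation, the relevant fragment of such a Data Sampling configuration might look roughly as below. This is a sketch: the exact nesting and the policy id in readoutForDataDump.json may differ; the key to look for is fraction.

```json
{
  "dataSamplingPolicies": [
    {
      "id": "readout",
      "active": "true",
      "samplingConditions": [
        {
          "condition": "random",
          "fraction": "0.1"
        }
      ]
    }
  ]
}
```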

Port

The Data Sampling sends data to the GUI via the port 26525. If this port is not free, edit the config files $QUALITYCONTROL_ROOT/etc/readoutForDataDump.json and $QUALITYCONTROL_ROOT/etc/dataDump.json.

Details on the data storage format in the CCDB

Each MonitorObject is stored as a TFile in the CCDB. It is therefore possible to easily open it with ROOT once the environment is loaded with alienv. This format also seamlessly supports class schema evolution.

The objects are stored at a path which is enforced by the QC framework: qc/<detector name>/<task name>/<object name>. Note that the name of the object can contain slashes (/) in order to build a sub-tree visible in the GUI; for example, with detector TST and task QcTask, an object named Tracks/pt ends up at qc/TST/QcTask/Tracks/pt. The detector name and the task name are set in the config file:

"tasks": {
  "QcTask": {       <---------- task name
    "active": "true",
    "className": "o2::quality_control_modules::skeleton::SkeletonTask",
    "moduleName": "QcSkeleton",
    "detectorName": "TST",       <---------- detector name

The quality is stored as a CCDB metadata of the object.

Data storage format before v0.14 and ROOT 6.18

Before September 2019, objects were serialized with TMessage and stored as blobs in the CCDB. The main drawback was the loss of the corresponding streamer infos leading to problems when the class evolved or when accessing the data outside the QC framework.

The QC framework is nevertheless backward compatible and can handle the old and the new storage system.

Local CCDB setup

Having a central ccdb for tests (ccdb-test) is handy, but it also means that everyone can access, modify or delete the data. If you prefer to have a local instance of the CCDB, for example in your lab or on your development machine, follow these instructions.

  1. Download the local repository service from http://alimonitor.cern.ch/download/local.jar

  2. The service can simply be run with java -jar local.jar

It will start listening by default on port 8080. This can be changed either with the java parameter "tomcat.port" or with the environment variable "TOMCAT_PORT". Similarly, the default listening address is 127.0.0.1 and it can be changed to something else (for example '*' to listen on all interfaces) with the java parameter "tomcat.address" or with the environment variable "TOMCAT_ADDRESS".

By default the local repository is located in /tmp/QC (or, to be more precise, java.io.tmpdir/QC). You can change this location in a similar way by setting the java parameter "file.repository.location" or the environment variable "FILE_REPOSITORY_LOCATION".
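Putting these together, a hypothetical launch overriding all three settings via the environment variables named above could look like this (the port, address and path are examples only):

```shell
# Listen on all interfaces on port 8083, storing objects under ~/ccdb
# instead of /tmp/QC.
TOMCAT_PORT=8083 TOMCAT_ADDRESS='*' FILE_REPOSITORY_LOCATION="$HOME/ccdb" java -jar local.jar
```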

The address of the CCDB will have to be updated in the Tasks config file.

At the moment, the description of the REST API can be found in this document: https://docs.google.com/presentation/d/1PJ0CVW7QHgnFzi0LELc06V82LFGPgmG3vsmmuurPnUg

Local QCG (QC GUI) setup

To install and run the QCG locally, and its fellow process tobject2json, please follow these instructions : https://github.com/AliceO2Group/WebUi/tree/dev/QualityControl#run-qcg-locally

Developing QC modules on a machine with FLP suite

To load a development library in a setup with FLP suite, specify its full path in the config file (e.g. /etc/flp.d/qc/readout.json):

    "tasks": {
      "QcTask": {
        "active": "true",
        "className": "o2::quality_control_modules::skeleton::SkeletonTask",
        "moduleName": "/home/myuser/alice/sw/BUILD/QualityControl-latest/QualityControl/libQcTstLibrary",
        ...

Make sure that:

  • The name "QcTask" stays the same, as changing it might break the workflow specification for AliECS.
  • The library is compiled with the same QC, O2, ROOT and GCC versions as the ones installed with the FLP suite. In particular, the task and check interfaces have to be identical.

Use MySQL as QC backend

WARNING: MySQL is no longer actively supported as a QC database. The interface might not work as expected.

  1. Install the MySQL/MariaDB development package
     • CC7: sudo yum install mariadb-server
     • Mac (or download the dmg from Oracle): brew install mysql

  2. Rebuild the QualityControl (so that the mysql backend classes are compiled)

  3. Start and populate database :

    sudo systemctl start mariadb # for CC7, check for your specific OS
    alienv enter qcg/latest
    o2-qc-database-setup.sh
    

Configuration files details

The QC requires a number of configuration items. An example config file is provided in the repo under the name example-default.json. This is a quick reference for all the parameters.

Global configuration structure

This is the global structure of the configuration in QC.

{
  "qc": {
    "config": {

    },
    "tasks": {
      
    },
    "externalTasks": {
    
    },
    "checks": {
      
    },
    "postprocessing": {
      
    }
  },
  "dataSamplingPoliciesFile": "json:///path/to/data/sampling/config.json",
  "dataSamplingPolicies": [

  ]
}

There are five QC-related components:

  • "config" - contains the global configuration of QC, which applies to any component. It is required in any configuration file.
  • "tasks" - contains declarations of QC Tasks. It is mandatory for running topologies with Tasks and Checks.
  • "externalTasks" - contains declarations of external devices which send objects to the QC to be checked and stored.
  • "checks" - contains declarations of QC Checks. It is mandatory for running topologies with Tasks and Checks.
  • "postprocessing" - contains declarations of Post-processing Tasks. It is needed only when Post-processing is run.

The configuration file can also include a path to Data Sampling configuration ("dataSamplingPoliciesFile") or the list of Data Sampling Policies. Please refer to the Data Sampling documentation to find more information.

Common configuration

This is what a typical "config" structure looks like. Each configuration element is described with a relevant comment afterwards. The "": "<comment>", formatting is used to keep the JSON structure valid. Please note that these comments should not be present in real configuration files.

{
  "qc": {
    "config": {
      "database": {                       "": "Configuration of a QC database (the place where QC results are stored).",
        "username": "qc_user",            "": "Username to log into a DB. Relevant only to the MySQL implementation.",
        "password": "qc_user",            "": "Password to log into a DB. Relevant only to the MySQL implementation.",
        "name": "quality_control",        "": "Name of a DB. Relevant only to the MySQL implementation.",
        "implementation": "CCDB",         "": "Implementation of a DB. It can be CCDB, or MySQL (deprecated).",
        "host": "ccdb-test.cern.ch:8080", "": "URL of a DB."
      },
      "Activity": {                       "": ["Configuration of a QC Activity (Run). This structure is subject to",
                                               "change or the values might come from other source (e.g. AliECS)." ],
        "number": "42",                   "": "Activity number.",
        "type": "2",                      "": "Arbitrary activity type."
      },
      "monitoring": {                     "": "Configuration of the Monitoring library.",
        "url": "infologger:///debug?qc",  "": ["URI to the Monitoring backend. Refer to the link below for more info:",
                                               "https://github.com/AliceO2Group/Monitoring#monitoring-instance"]
      },
      "consul": {                         "": "Configuration of the Consul library (used for Service Discovery).",
        "url": "http://consul-test.cern.ch:8500", "": "URL of the Consul backend"
      },
      "conditionDB": {                    "": ["Configuration of the Conditions and Calibration DataBase (CCDB).",
                                               "Do not mistake with the CCDB which is used as QC repository."],
        "url": "ccdb-test.cern.ch:8080",  "": "URL of a CCDB"
      }
    }
  }
}

QC Tasks configuration

Below, the full QC Task configuration structure is described. Note that more than one task might be declared inside the "tasks" path.

{
 "qc": {
   "tasks": {
     "QcTaskName": {                       "": "Name of the QC Task. Less than 14 character names are preferred.",
       "active": "true",                   "": "Activation flag. If not \"true\", the Task will not be created.",
       "className": "namespace::of::Task", "": "Class name of the QC Task with full namespace.",
       "moduleName": "QcSkeleton",         "": "Library name. It can be found in CMakeLists of the detector module.",
       "detectorName": "TST",              "": "3-letter code of the detector.",
       "cycleDurationSeconds": "10",       "": "Duration of one cycle (how often MonitorObjects are published).",
       "maxNumberCycles": "-1",            "": "Number of cycles to perform. Use -1 for infinite.",
       "dataSource": {                     "": "Data source of the QC Task.",
         "type": "dataSamplingPolicy",     "": "Type of the data source, \"dataSamplingPolicy\" or \"direct\".",
         "name": "tst-raw",                "": "Name of Data Sampling Policy. Only for \"dataSamplingPolicy\" source.",
         "query" : "raw:TST/RAWDATA/0",    "": "Query of the data source. Only for \"direct\" source."
       },
       "taskParameters": {                 "": "User Task parameters which are then accessible as a key-value map.",
         "myOwnKey": "myOwnValue",         "": "An example of a key and a value. Nested structures are not supported"
       },
       "location": "local",                "": ["Location of the QC Task, it can be local or remote. Needed only for",
                                                "multi-node setups, not respected in standalone development setups."],
       "localMachines": [                  "", "List of local machines where the QC task should run. Required only",
                                           "", "for multi-node setups.",
         "o2flp1",                         "", "Hostname of a local machine.",
         "o2flp2",                         "", "Hostname of a local machine."
       ],
       "remoteMachine": "o2qc1",           "": "Remote QC machine hostname. Required only for multi-node setups.",
       "remotePort": "30432",              "": "Remote QC machine TCP port. Required only for multi-node setups."
     }
   }
 }
}

QC Checks configuration

Below, the full QC Checks configuration structure is described. Note that more than one Check might be declared inside the "checks" path. Please also refer to the Checks documentation for more details.

{
 "qc": {
   "checks": {
     "MeanIsAbove": {                "": "Name of the Check. Less than 12 character names are preferred.",
       "active": "true",             "": "Activation flag. If not \"true\", the Check will not be run.",
       "className": "ns::of::Check", "": "Class name of the QC Check with full namespace.",
       "moduleName": "QcCommon",     "": "Library name. It can be found in CMakeLists of the detector module.",
       "detectorName": "TST",        "": "3-letter code of the detector.",
       "policy": "OnAny",            "": ["Policy which determines when MOs should be checked. See the documentation",
                                          "of Checks for the list of available policies and their behaviour."],
       "dataSource": [{              "": "List of data sources of the Check.",
         "type": "Task",             "": "Type of the data source; currently only \"Task\" is supported.",
         "name": "myTask_1",         "": "Name of the Task",
         "MOs": [ "example" ],       "": ["List of MOs to be checked. Use \"all\" (not as a list) to check each MO ",
                                          "which is produced by the Task"]
       }],
       "checkParameters": {          "": "User Check parameters which are then accessible as a key-value map.",
         "myOwnKey": "myOwnValue",   "": "An example of a key and a value. Nested structures are not supported"
       }
     }
   }
 }
}

QC Post-processing configuration

Below, the full QC Post-processing (PP) configuration structure is described. Note that more than one PP Task might be declared inside the "postprocessing" path. Please also refer to the Post-processing documentation for more details.

{
  "qc": {
    "postprocessing": {
      "ExamplePostprocessing": {              "": "Name of the PP Task.",
        "active": "true",                     "": "Activation flag. If not \"true\", the PP Task will not be run.",
        "className": "namespace::of::PPTask", "": "Class name of the PP Task with full namespace.",
        "moduleName": "QcSkeleton",           "": "Library name. It can be found in CMakeLists of the detector module.",
        "detectorName": "TST",                "": "3-letter code of the detector.",
        "initTrigger": [                      "", "List of initialization triggers",
          "startofrun",                       "", "An example of an init trigger"
        ],
        "updateTrigger": [                    "", "List of update triggers",
          "10min",                            "", "An example of an update trigger"
        ],
        "stopTrigger": [                      "", "List of stop triggers",
          "endofrun",                         "", "An example of a stop trigger"
        ]
      }
    }
  }
}

External tasks configuration

Below, the external task configuration structure is described. Note that more than one external task might be declared inside the "externalTasks" path.

{
  "qc": {
    "externalTasks": {
      "External-1": {                       "": "Name of the task",
        "active": "true",                   "": "Activation flag. If not \"true\", the Task will not be created.",
        "query": "External-1:TST/HISTO/0",  "": "Query specifying where the objects to be checked and stored are coming from. Use the task name as binding."
      }
    }
  }
}

← Go back to Post-processing | ↑ Go to the Table of Content ↑ | Continue to Frequently Asked Questions →