diff --git a/docs/hyperloop/co2eestimates.md b/docs/hyperloop/co2eestimates.md index 47ddd9ea..cc6a0e36 100644 --- a/docs/hyperloop/co2eestimates.md +++ b/docs/hyperloop/co2eestimates.md @@ -6,28 +6,30 @@ title: CO2 equivalent estimates ## CO2 equivalent estimates
- + Co2eq tooltip

* In Hyperloop, an estimate of the CO2eq produced by your trains is shown in order to give an idea of the environmental impact. We need to run analyses to achieve our scientific goals, but we can optimize the code and sometimes work efficiently even with fewer trains. The displayed value should help inform the decision of whether a train is needed. -The value is shown before trains have been run (an estimate based using the wagon test) and then when a train run has finished. +The value is shown before trains have been run (an estimate based on the wagon test) and then when a train run has finished. * The estimate is visible in: * Wagon test view and train submission view (an estimate based on the wagon test) * Train test view (an estimate based on the train test. So, this estimate accounts for all wagons in the train) - * In the 'General' tab of the train view, when a train is in the state 'Done'. This estimate directly uses the CPU usage of the train run. + * In the 'General' tab of the train view, when a train is in the state 'Done'. This estimate directly uses the CPU usage of the train run. ## How is it calculated? Principle: + * There are a few studies for estimating CO2eq of computing, and there are many factors to consider, such as the efficiency of the machines used and the power grid on which the machines run. * Furthermore, grey energy (the energy to produce the machines) and the transfers of the data could be considered. * It would be difficult to do justice to all these details and check for each train where exactly the jobs were running. * Therefore, in Hyperloop, estimates of CO2eq are directly derived from CPU usage using an average conversion factor. Calculation input: -* Power per computing core: We use an optimistic value of 10 W/core for pure power consumption and assume a 50/50 split between the carbon from power and carbon from embodied / embedded emissions, therefore working effectively with 20 W/core. 
Some literature estimates up to 53 W/core for power alone in high-performance computing (see https://www.nature.com/articles/s41550-020-1169-1). -* Electricity emission factors (kgCO2eq per kWh) vary depending how electricity is produced. However, carbon is not everything and we do not want to enter a debate on how energy should be produced here. Therefore we use an average value from https://arxiv.org/pdf/2011.02839.pdf of 0.301 kgCO2eq / kWh, noting that even for the same country the estimates vary significantly depending on the source (e.g. comparing to https://arxiv.org/pdf/2101.02049). -* Hyperloop estimations do not account for the power consumption of data transfer, central infrastructure, or power for storage. The paper 'electricity intensity of internet data transmission', https://onlinelibrary.wiley.com/doi/pdf/10.1111/jiec.12630, estimates 0.06kWh/GB in data transfer. This would mean that a petabyte of data transfer needs 18 tCO2eq. Additionally, the carbon produced by storing the data would not be negligible. For simplicity, we do not account for these aspects, so that our estimates are more directly linked to individual train runs, and not the wider Grid infrastructure. + +* Power per computing core: We use an optimistic value of 10 W/core for pure power consumption and assume a 50/50 split between the carbon from power and carbon from embodied / embedded emissions, therefore working effectively with 20 W/core. Some literature estimates up to 53 W/core for power alone in high-performance computing (see <https://www.nature.com/articles/s41550-020-1169-1>). +* Electricity emission factors (kgCO2eq per kWh) vary depending on how electricity is produced. However, carbon is not everything and we do not want to enter a debate on how energy should be produced here. Therefore we use an average value from <https://arxiv.org/pdf/2011.02839.pdf> of 0.301 kgCO2eq / kWh, noting that even for the same country the estimates vary significantly depending on the source (e.g. comparing to <https://arxiv.org/pdf/2101.02049>). 
+* Hyperloop estimations do not account for the power consumption of data transfer, central infrastructure, or power for storage. The paper 'electricity intensity of internet data transmission', <https://onlinelibrary.wiley.com/doi/pdf/10.1111/jiec.12630>, estimates 0.06 kWh/GB in data transfer. This would mean that a petabyte of data transfer needs 18 tCO2eq. Additionally, the carbon produced by storing the data would not be negligible. For simplicity, we do not account for these aspects, so that our estimates are more directly linked to individual train runs, and not the wider Grid infrastructure. * At 20W per core, and 0.301 kgCO2eq / kWh, this gives us: **6t CO2eq per 1MCPUh** or **1 CPU year = 53.3 kgCO2eq** -* In order to compare these emissions to something we know, we use CO2eq produced by flights based on curb6.com \ No newline at end of file +* In order to compare these emissions to something we know, we use CO2eq produced by flights based on curb6.com diff --git a/docs/hyperloop/hyperlooppolicy.md b/docs/hyperloop/hyperlooppolicy.md index c863af4e..9b9b3b6b 100644 --- a/docs/hyperloop/hyperlooppolicy.md +++ b/docs/hyperloop/hyperlooppolicy.md @@ -7,38 +7,38 @@ title: Fair usage policy The very large amount of data that will be collected in Run 3 represents a challenge for analysis, for both the CPU needs and the data read from storage, and therefore a resource usage policy has been put in place to ensure proper use of computing resources. The policy has been openly discussed in multiple meetings, including ALICE weeks, and is subject to adjustments as necessary and as the collaboration gains experience with the Run 3 analysis. If you have questions or doubts, please first refer to your PWG convener, who will then bring up the case with the analysis coordinator. -The image below summarizes the policy: +The image below summarizes the policy:
- +Screenshot of hyperlooppolicy
-In general, four categories of trains exist: +In general, four categories of trains exist: -* Trains below 30 TB and taking more than 2.0y of CPU time (red shaded area) are very strongly discouraged. In those cases, please resort to very small trains (where throughputs of even 100 KB/s are allowed with autosubmission) to run. +* Trains below 30 TB and taking more than 2.0y of CPU time (red shaded area) are very strongly discouraged. In those cases, please resort to very small trains (where throughputs of even 100 KB/s are allowed with autosubmission) to run. * Trains that are lower than 2y in CPU usage and loop over less than 200 TB are free to execute and can be executed on Hyperloop via autosubmission. In a certain region between 30-200 TB, slightly more than 2y in CPU time is allowed (see sketch). -* Trains that loop over more than 200 TB and less than 800 TB are dealt with as follows: - * if they require less than 10 years of CPU time, they need only PWG convener approval. - * if they require more than 10 years of CPU time but less than 200 years, they need Analysis and Physics Coordinator approval to run. - * if they require over 200 years of CPU, they need excplicit PB approval. -* Heavy trains looping over datasets bigger than 800 TB are dealt with as follows: - * if they require less than 20 years of CPU time, they need only PWG approval. - * if they require between 20 to 200y of CPU, they can be approved offline by Analysis and Physics Coordination. - * if they require over 200 years of CPU, they need explicit PB approval. +* Trains that loop over more than 200 TB and less than 800 TB are dealt with as follows: + * if they require less than 10 years of CPU time, they need only PWG convener approval. + * if they require more than 10 years of CPU time but less than 200 years, they need Analysis and Physics Coordinator approval to run. + * if they require over 200 years of CPU, they need explicit PB approval. 
+* Heavy trains looping over datasets bigger than 800 TB are dealt with as follows: + * if they require less than 20 years of CPU time, they need only PWG approval. + * if they require between 20 to 200y of CPU, they can be approved offline by Analysis and Physics Coordination. + * if they require over 200 years of CPU, they need explicit PB approval. ## Implementation in Hyperloop datasets In practice the chart above is mapped on a number of distinct resource groups which determine the limits assigned to each dataset:
- +Screenshot of resourcetable
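As a rough illustration, the approval tiers described above can be encoded as a small function. This is a simplified sketch, not part of Hyperloop itself; the "slightly more than 2y" CPU slack allowed between 30 and 200 TB is only approximated.

```python
def required_approval(size_tb: float, cpu_years: float) -> str:
    """Sketch of the fair-usage tiers; thresholds taken from the policy text.

    The CPU slack between 30 and 200 TB is not modelled exactly;
    consult the chart for that region.
    """
    if size_tb < 30:
        if cpu_years > 2.0:
            return "strongly discouraged (red area)"
        return "free to execute (autosubmission allowed)"
    if size_tb < 200:
        if cpu_years <= 2.0:
            return "free to execute (autosubmission allowed)"
        return "consult the chart (slack region between 30 and 200 TB)"
    if size_tb <= 800:
        if cpu_years < 10:
            return "PWG convener approval"
        if cpu_years <= 200:
            return "Analysis and Physics Coordinator approval"
        return "explicit PB approval"
    # heavy trains, above 800 TB
    if cpu_years < 20:
        return "PWG approval"
    if cpu_years <= 200:
        return "Analysis and Physics Coordination approval (offline)"
    return "explicit PB approval"

print(required_approval(500, 5))    # PWG convener approval
print(required_approval(900, 300))  # explicit PB approval
```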
The smaller the dataset size, the more often it is automatically submitted per week and the more often you are allowed to run on it per week. Manual requests for datasets above 50 TB are only fulfilled at the defined automatic submission times. This is in order to allow grouping of wagons into large trains. ## Derived data -Derived datasets can be created on Hyperloop which are by construction much smaller than the original datasets. Those are advantagous because steps which are identical in each analysis train run (e.g. event selection and centrality calculation, secondary-vertex finding) are only executed once which saves CPU. Furthermore, as the size is smaller such trains cause less load on the storages. +Derived datasets can be created on Hyperloop which are by construction much smaller than the original datasets. Those are advantageous because steps which are identical in each analysis train run (e.g. event selection and centrality calculation, secondary-vertex finding) are only executed once, which saves CPU. Furthermore, as the size is smaller, such trains cause less load on the storage. As an example, you can imagine that you run a derived data train on a dataset of 500 TB where you need explicit approval. Say you have a reduction factor of 100, then your output derived data is about 5 TB. You will be allowed to run on that dataset much more frequently, see the table above. diff --git a/docs/hyperloop/legoexpert.md b/docs/hyperloop/legoexpert.md index 8e7ded5b..28c1c02b 100644 --- a/docs/hyperloop/legoexpert.md +++ b/docs/hyperloop/legoexpert.md @@ -17,7 +17,7 @@ title: For the Run 2 LEGO train expert. What has changed? * There is a history feature for wagons and datasets. You can access it by clicking on the button `πŸ“œ` available inside of a wagon/dataset view. A detailed view of what has been created/updated/removed from the wagon/dataset is shown, as well as the username and the time when the change was made.
- +Screenshot of dataset history
* There are automated notifications. These notifications are created per user, and display changes made to tools, like _Datasets_, that are being used by the user. They are displayed per _Analysis_ in the _My Analyses_ page, or globally in the button `πŸ””` which can be found on the top menu. @@ -31,5 +31,5 @@ title: For the Run 2 LEGO train expert. What has changed? * **Performance Graphs** page allows the user to upload his own local metrics file, and then generate the test graphs specific to that file. You produce a local _performanceMetrics.json_ by running the o2 workflow with the argument _--resources-monitoring 2_ which, in this example, produces monitoring information every 2 seconds. These are the same type of graphs produced in the _Test Graphs_ tab of the train run. This page can be accessed at: .
- +Screenshot of performance graphs
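Since the _Performance Graphs_ page works on a local _performanceMetrics.json_, simple checks can also be done offline before uploading. The snippet below is only a sketch: the actual schema of the file is defined by O2 and not documented here, so the sample records and the `proc_mem_kb` field are hypothetical stand-ins.

```python
import json

def average_metric(samples, key):
    """Average one numeric field over all monitoring samples that carry it."""
    values = [s[key] for s in samples if key in s]
    return sum(values) / len(values)

# Hypothetical stand-in for a resources-monitoring dump: one record
# per sampling interval (every 2 s with --resources-monitoring 2).
demo = json.loads("""
[
  {"timestamp": 0, "device": "reader", "proc_mem_kb": 120000},
  {"timestamp": 2, "device": "reader", "proc_mem_kb": 125000},
  {"timestamp": 4, "device": "reader", "proc_mem_kb": 130000}
]
""")

print(average_metric(demo, "proc_mem_kb"))  # 125000.0
```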
diff --git a/docs/hyperloop/notifications.md b/docs/hyperloop/notifications.md index 39315bc4..9b667695 100644 --- a/docs/hyperloop/notifications.md +++ b/docs/hyperloop/notifications.md @@ -9,13 +9,13 @@ title: Notifications * The notifications can be seen in the _My Analyses_ page and in the _Notifications_ page, by clicking `πŸ””` in the menu bar.
- + Screenshot of notifications in my analyses
* The user can click the `βœ–οΈ` button to remove a notification. In order to remove all the notifications, go to the Notifications page, and click the `❌`_Dismiss all_ button.
- + Screenshot of all notifications
## Dataset changed @@ -27,7 +27,7 @@ title: Notifications * The automatic composition settings have changed, e.g. the schedule
- + Screenshot of dataset changed
## Dataset activated / deactivated @@ -35,7 +35,7 @@ title: Notifications * Notifies the user when a dataset included in his or her analyses has been successfully activated or deactivated.
- + Screenshot of dataset activation
## Dataset production added or removed @@ -43,17 +43,17 @@ title: Notifications * For RUN 3 data and MC, the user is informed if the production has been successfully added to or removed from the dataset.
- + Screenshot of production added
* For RUN 2 data, the user is notified when a conversion train run has been added to or removed from the dataset.
- + Screenshot of trainrun added
- + Screenshot of trainrun removed
* For derived data, a notification is sent when a Hyperloop train that produced derived data has been added or removed. @@ -63,7 +63,7 @@ title: Notifications * The user is informed when a run has been added to or removed from the DPG runlist. This change is usually done by the DPG experts.
- + Screenshot of runlist updated
## Mergelist updated @@ -71,7 +71,7 @@ title: Notifications * The mergelist defines which runs are merged into one file at the end of the train running. The user is informed when a mergelist has been modified, added to or removed from the dataset production.
- + Screenshot of mergelist update
## Short datasets @@ -89,7 +89,7 @@ Informs the user when a wagon has been disabled in different circumstances: * Local tests are cleaned if the wagons are not submitted in a period of 4 weeks. The user is notified that the respective wagons are automatically disabled.
- + Screenshot of test cleaned
* When a wagon with derived data output is enabled, the test cannot start if the wagon and its dependencies share the same workflow. As a result, the wagon is disabled and the user is notified about the wagons which share the same task. @@ -97,7 +97,7 @@ Informs the user when a wagon has been disabled in different circumstances: * The notification format is: The wagon _"wagon_name"_ was disabled in _"dataset_name"_. There is derived data. The following wagons have the same workflows {_wagon1_, _wagon2_: _common_workflow_},...,{_wagonX_, _wagonY_: _common_workflow_}
- + Screenshot of wagon disabled 1
* If among the wagon and its dependencies there are identical derived data outputs, the test cannot start, and the wagon is disabled. @@ -105,25 +105,23 @@ Informs the user when a wagon has been disabled in different circumstances: * The notification format is: The wagon _"wagon_name"_ was disabled in _"dataset_name"_. The following wagons have the same derived data outputs {_wagon1_, _wagon2_: _common_derived_data_},...,{_wagonX_, _wagonY_: _common_derived_data_}
- + Screenshot of wagon disabled
- + * The wagon is disabled if the workflow name has been changed in the meantime. This is fixed by updating the workflow name in the wagon configuration.
- + Screenshot of notification workflow
* The wagon is disabled if one of the user-defined dependencies of the wagon is considered identical to a service wagon. In order to make the most efficient use of the Grid and the analysis facilities, copies of core services are not permitted, as they prevent combining several users into one train.
- + Screenshot of notification of identical wagon
A service wagon is considered identical to a user wagon if it shares the same activated output tables and the same workflow, and has matching configurables. To fix this error, please use the listed service wagon as a dependency instead of the copy. - - ## Inconsistent parameters * Hyperloop makes a comparison between the wagon configuration and the configuration defined in O2 for the package tag selected for the wagon. If they do not coincide, the user will be informed about the mismatch. The comparison is case sensitive; therefore, a Configurable will not match if its name does not contain the identical lowercase / uppercase combination. @@ -131,13 +129,13 @@ Informs the user when a wagon has been disabled in different circumstances: * The user is notified if there is a configurable present in the wagon configuration that is not defined in O2 for the selected package tag. Likewise, it informs the user when the wagon configuration misses one or more of the Configurables defined in O2 for the specific tag.
- + Screenshot of inconsistent parameters 2
* If the **wagon configuration is old**, and the wagon is enabled with the latest package tag, the user is advised to sync the wagon in order to get the current configuration. Following this, the test will start automatically. Likewise, the test is reset whenever there is a change in the database, such as updating or syncing the wagon configuration or its dependencies.
- + Screenshot of inconsistent parameters
* If the **wagon is enabled with an older tag**, the configuration might not match (hence the notification). If the old tag is needed, then syncing is not an option because this will set the package to the latest one. Therefore, the wagon configuration has to be modified as needed. The user can take as a reference _full_config.json_ in the test output, which shows the configuration the test is being run with, and compare it to the wagon configuration. diff --git a/docs/hyperloop/operatordocumentation.md b/docs/hyperloop/operatordocumentation.md index f1342620..badc8ddc 100644 --- a/docs/hyperloop/operatordocumentation.md +++ b/docs/hyperloop/operatordocumentation.md @@ -9,13 +9,13 @@ title: Operator Documentation * Below, a display of the grid jobs state during the previous week is displayed, for every site.
- +Screenshot of dashboard
* By default, the dashboard displays the last week summary on the lower section of the page. Use the interval selection tool to select the period of time that you are interested in: either select one from the left menu (e.g. last 3 months, last year), or choose the start and end date of the interval. Click **Save** to update the dashboard.
- +Screenshot of dashboard selection
* By clicking the number of wagons waiting to be included in a train, the user can directly open the [_Train Submission_](#trainsubmission). Similarly, a link to the [_Train Runs_](#train-runs) is available by clicking the number of trains to be submitted to the grid, the number of running tests, or the number of finished trains. @@ -24,32 +24,35 @@ title: Operator Documentation * For a user, the **Train Submission** page displays a read-only view of datasets which have enabled wagons. * For a train operator, the _Train Submission_ page displays only datasets which have enabled wagons, and allows train composition, as well as submitting, modifying and killing a train. + ### Train Composition + * Trains are composed per dataset. Only wagons which have a test status of success `🌟` or warning `❗️` can be composed in a train. If a wagon has _Derived data_ tables activated, this is indicated in the _Test status_ column with the icon πŸ—‚οΈ (standard derived data) or with a green-bordered πŸ—‚οΈ (slim derived data). The difference between standard and slim derived data will be explained [below](#deriveddatatypes). * By default, wagons that were enabled at most one week ago are shown. In order to display all enabled wagons, click on `off` in the _Enabled_ column. * In order to compose a train, select wagons by checking `β˜‘οΈ` in the _Compose_ column. The Package `Tag` will be automatically chosen, and other wagons that can be included in the train run are indicated with 🟒, and the ones which are not compatible with πŸ”΄. All wagons that are compatible can be automatically chosen by clicking on `βœ… Select all compatible wagons`, or by selecting them one by one.
- +Screenshot of train composition settings
  There are a number of settings that you can decide on when composing a train: - * `Target`: Sets the facility/cores where the train will be run. - * `Type`: This setting defines the type of train to be composed, and decides if derived data will be stored. The dropdown offers 4 possible options: - * **Analysis train** - this will be a standard analysis train and no derived data will be produced. - * **Standard derived data** - this train will produce derived data to be used for further analysis. The results will not be merged across runs and can be used as input for future train runs. - * **Linked derived data** - this option is for derived data which needs to access its parent file when it is processed. The derived data file produced will remember its parent files, inheriting also their storage location. The results will not be merged across runs and can be used as input for future train runs. Datasets composed from this train need to have parent access level activated. - * **Slim derived data** - similarly to the standard derived data case, this train will produce derived data to be used for further analysis. This is reserved for derived data of small output size. The results will be merged across runs and are not available to use in future train runs. The data will be automatically deleted after a preset period of time. - * `β˜‘οΈ slow train`: If enabled, the express train features are disabled. This means that you may have up to 2% more jobs which finish but the train run may take several days more. - * `β˜‘οΈ automatic submission`: If enabled, the train will be automatically submitted after the test is done and succeeds `🌟`. - * Finally, after defining the configuration, click `Compose πŸš‚`. After composing a train run, the wagons that are part of it cannot be selected for a different train run unless the current one is [decomposed](#decompose). After the train run is [submitted](#submit), the wagons will be disabled. 
+* `Target`: Sets the facility/cores where the train will be run. +* `Type`: This setting defines the type of train to be composed, and decides if derived data will be stored. The dropdown offers 4 possible options: + * **Analysis train** - this will be a standard analysis train and no derived data will be produced. + * **Standard derived data** - this train will produce derived data to be used for further analysis. The results will not be merged across runs and can be used as input for future train runs. + * **Linked derived data** - this option is for derived data which needs to access its parent file when it is processed. The derived data file produced will remember its parent files, inheriting also their storage location. The results will not be merged across runs and can be used as input for future train runs. Datasets composed from this train need to have parent access level activated. + * **Slim derived data** - similarly to the standard derived data case, this train will produce derived data to be used for further analysis. This is reserved for derived data of small output size. The results will be merged across runs and are not available to use in future train runs. The data will be automatically deleted after a preset period of time. +* `β˜‘οΈ slow train`: If enabled, the express train features are disabled. This means that you may have up to 2% more jobs which finish but the train run may take several days more. - * `β˜‘οΈ automatic composition`: The train composition schedule is defined in the dataset settings. If the dataset has a defined schedule, the trains will be automatically composed at the specified times if the tests have finished without a warning and there is no derived data activated. +* `β˜‘οΈ automatic submission`: If enabled, the train will be automatically submitted after the test is done and succeeds `🌟`. +* Finally, after defining the configuration, click `Compose πŸš‚`. 
After composing a train run, the wagons that are part of it cannot be selected for a different train run unless the current one is [decomposed](#decompose). After the train run is [submitted](#submit), the wagons will be disabled. + +* `β˜‘οΈ automatic composition`: The train composition schedule is defined in the dataset settings. If the dataset has a defined schedule, the trains will be automatically composed at the specified times if the tests have finished without a warning and there is no derived data activated.
- +Screenshot of automatic composition
  @@ -57,16 +60,15 @@ There are a number of settings that you can decide on when composing a train: * The train will be automatically tested, and its progress can be followed in the _Train Runs_ table, or in the [**Train Runs**](#train-runs) page by clicking on the TRAIN_ID link. - ### Scheduling of derived data wagons * Wagons with derived data can be scheduled by operators to be automatically composed at the next composition schedule. * This is supported for standard and linked derived data wagons on any dataset with a composition schedule. * Multiple standard derived data wagons can be combined into one train automatically by Hyperloop, but linked derived data wagons are run separately. -* Operators can simply choose to enable or disable the automatic *submission* and *slow train* options. The schedule is automatically determined by Hyperloop (the next scheduled slot in the dataset is used). +* Operators can simply choose to enable or disable the _automatic submission_ and _slow train_ options. The schedule is automatically determined by Hyperloop (the next scheduled slot in the dataset is used).
- +Screenshot of scheduled wagon
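The "next scheduled slot" logic can be pictured as follows. This is a hypothetical sketch, not Hyperloop's actual implementation; in particular, representing the dataset's composition schedule as (weekday, hour) pairs is an assumption made for illustration.

```python
from datetime import datetime, timedelta

def next_slot(now, schedule):
    """Return the next composition slot strictly after `now`.

    `schedule` is a list of (weekday, hour) pairs, Monday = 0; an
    assumed stand-in for the days/times set in the dataset settings.
    """
    candidates = []
    for weekday, hour in schedule:
        days_ahead = (weekday - now.weekday()) % 7
        slot = (now + timedelta(days=days_ahead)).replace(
            hour=hour, minute=0, second=0, microsecond=0)
        if slot <= now:           # today's slot already passed
            slot += timedelta(days=7)
        candidates.append(slot)
    return min(candidates)

# Slots on Monday 06:00 and Thursday 18:00; "now" is a Tuesday noon.
print(next_slot(datetime(2024, 1, 2, 12, 0), [(0, 6), (3, 18)]))
# 2024-01-04 18:00:00
```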
### Staged Submission @@ -78,7 +80,7 @@ There are a number of settings that you can decide on when composing a train: * Approval from the participating analyses PWGs conveners is required in order to submit a long train
- +Screenshot of request long train
## Train Runs @@ -88,13 +90,13 @@ There are a number of settings that you can decide on when composing a train: * To compare two trains, select them in the Compare column and click Compare. This will open a new tab displaying the differences between the two trains.
- + Screenshot of compare trains
* The train run detail can be accessed by clicking on the TRAIN_ID, or with the url .
- + Screenshot of train runs page
* The actions allowed in a train run: @@ -111,60 +113,60 @@ There are a number of settings that you can decide on when composing a train: * The _General_ tab displays the summary of the train's progress, direct links to dataset and participating wagon configuration, as well as direct links to the test output and the speedscope profiling of the task.
- + Screenshot of train result
* The _Test results_ tab shows the performance metrics per device (reader, workflows, writer), along with the expected resources. You can use the interactive graphs (per device) to zoom into the area of interest (click and drag) or zoom out (double-click).
- + Screenshot of test results
* In the _Test Graphs_ tab, you can plot the available metrics for the specific _Train run_. By hovering over the graph, the corresponding values are displayed in a dynamic window, stating the value for each participating wagon.
- + Screenshot of test graphs
* The metric can be selected from the upper-left dropdown, and the graph will change accordingly. * To plot the metric data per device, select the _Per Device_ checkbox near the dropdown.
- + Screenshot of test graphs per device
* To plot the ten graphs with the highest average, click the **Show top 10 largest** checkbox.
- + Screenshot of graph largest
* You can zoom into the graph by clicking and dragging the mouse along the area of interest. For zooming out, double-click on the graph.
- + Screenshot of graph zoom
- + Screenshot of graph zoom 2
* In _Submitted jobs_, you can see the summary of the master jobs, along with links to the **IO Statistics** and **Stack trace**.
- + Screenshot of submitted jobs 1
* Click the **IO Statistics** button to be redirected to the site activity information.
- + Screenshot of submitted jobs 2
* Click the **Stack trace** button to be redirected to the stack trace information in MonALISA. Here you can see a summary of failures of your jobs.
- + Screenshot of submitted jobs 3
* This information is collected from all ERROR_V jobs once the masterjobs have finished. Some information is already available while the train is running, but make sure to check again when the train is in a final state. Common errors are grouped and counted. This allows you to investigate failures and debug them using the provided stack trace. @@ -172,33 +174,33 @@ There are a number of settings that you can decide on when composing a train: * The _Grid statistics_ tab presents a summary of the jobs' performance and plots the Files/Job, CPU time/Job and Wall time/Job statistics.
- + Screenshot of grid stats
* If the train is run as a derived data production and there are activated tables, the Derived data tab will be shown. This displays the tables which are produced by the task and saved to the output.
- + Screenshot of train modal derived
* _Merged output_ displays the jobs status after submitting the train. The mergelists are defined in the dataset settings.
- + Screenshot of merged output
* When the final merge is started manually by the operator, some of the runs may not be merged. You can copy the list of merged runs or the total list of runs by clicking on the (red) number. * Here you can also track the submission process, and debug issues that may have taken place.
- + Screenshot of merged output 1
* You can use the _Clone train_ tab to clone the train. The cloned train will have **the same wagon timestamp** as the original train, with the **current dataset configuration**. This means that if the users have changed the wagon configuration in the meantime, this is not taken into account (this is different from the LEGO trains). * Other settings can be modified: package tag, target facility, slow train option, derived data, automatic submission.
- + Screenshot of clone train
### Request Long Train @@ -207,13 +209,13 @@ There are a number of settings that you can decide on when composing a train: * When requesting a long train, it is possible to request standard derived data from a short train with slim derived data.
- + Screenshot of long train derived type
* Any user who is part of the analysis can request a long train. Approval from the participating analyses PWGs conveners is required in order to submit a long train. Train operators or admins can also approve a long train, but it is usually done by the PWG.
- + Screenshot of request long train
* Once the long train is approved: @@ -221,7 +223,7 @@ There are a number of settings that you can decide on when composing a train: * Otherwise the Submit button is enabled and the operator can submit the train
- + Screenshot of long train approved
## Trains with issues @@ -233,7 +235,7 @@ There are a number of settings that you can decide on when composing a train: * There is a final merge job in final state, but the merging is not declared as _done_ in the database
- + Screenshot of trains with issues
* The operator must analyse these cases and decide whether to resubmit some of the jobs, launch the final merging submission where the errors are not significant, or kill the train when there are too many errors. @@ -245,7 +247,7 @@ There are a number of settings that you can decide on when composing a train: * The user can browse and click on the _Dataset_ they want to add to their analysis.
- +Screenshot of enable dataset datasets page
* Inside of the _Dataset_ view page, click on the button `✚ Add dataset to analysis`. It will display a list of all the analyses you belong to. Select the _Analysis_ you want to add the dataset to, and click on `πŸ’Ύ Save`. @@ -253,7 +255,7 @@ There are a number of settings that you can decide on when composing a train: * By clicking the `πŸ“` button, the operator is able to modify the dataset in the [**Edit Dataset**](#edit-dataset) page.
- + Screenshot of datasets page
* The runlists will be received programmatically from the DPG. @@ -263,7 +265,7 @@ There are a number of settings that you can decide on when composing a train: * Allows the operator to update the dataset properties. Firstly, the operator can update the name and description of the dataset, and activate or deactivate it by clicking the `❌` / `βœ…` button. In order to save the changes you made, click the _Save all changes_ button.
- + Screenshot of edit dataset options
* In the **Options** box, you can add short datasets to the current dataset, which will be used for the [**staged submission**](#stagedsubmission). Enabling _Run final merging over all runs in this dataset_ will merge all the runs of all the productions during the final merging. @@ -273,7 +275,7 @@ There are a number of settings that you can decide on when composing a train: * To unstage the data to a specific target, click the _Unstage_ button. The unstaging process will start once clicking _Save all changes_.
- + Screenshot of edit dataset staging
* In the **Automatic Composition** box, the operator is able to enable the automatic train composition. Choose the composition type, the maximum CPU time that can be consumed and the maximum number of trains that can be composed per week for an analysis. @@ -283,13 +285,13 @@ There are a number of settings that you can decide on when composing a train: * For all these cases, the trains will only be composed if the tests finished without a warning and if they do not store derived data.
- + Screenshot of automatic composition 3
* Choose the days and times at which the trains should be composed.
- + Screenshot of automatic composition 2
### Deciding on data to be processed depends on the dataset type @@ -297,25 +299,25 @@ There are a number of settings that you can decide on when composing a train: * For RUN 2 data, the operator can add or remove a [**RUN 2 conversion train run**](https://alimonitor.cern.ch/trains/train.jsp?train_id=132#runs).
- + Screenshot of add train run
* For RUN 3 data and MC, the operator can add or remove a production. In order to create a new production, click on the _+Production_ button. After choosing the collision type, anchor and MC Tag, select the runlist defined by the DPG and click _+Add_. If no runlist is available, contact the DPG specialists for creating one.
- + Screenshot of add dataset production
* For derived data, you can add or remove a production. Create a production by selecting _Data_, choose the desired _Period_ and select the required _Derived train_ from the dropdown list.
- + Screenshot of dataset derived data
* Within the dataset production you can update the list of runs to be excluded.
- + Screenshot of change dataset production
* The mergelist defines which runs are merged into one file at the end of the train running. The operator can add, update, activate or deactivate a mergelist in the dataset. @@ -325,11 +327,11 @@ There are a number of settings that you can decide on when composing a train: * Accessed from the **Datasets** view, this page summarizes the derived data available in Hyperloop. The information displayed can be grouped by Dataset, Analysis or PWG (use the upper buttons to switch between the views).
- + Screenshot of derived data access
- + Screenshot of derived data grouping
* Make use of the available filters of the table to search for the derived data of interest. Expand or collapse groups to focus on a specific derived data or use the Expand all/Collapse all button to expand/collapse all groups. @@ -337,29 +339,29 @@ There are a number of settings that you can decide on when composing a train: * Clicking on the derived data train number will open the Train result view (the same one accessed from pages such as Train runs or Trains with issues). You can schedule derived data for deletion by clicking on the **Delete** button in the train view or in the Delete column of the table. The deletion will only be available if the derived data is not used in any datasets or if the datasets using this derived data are not activated. In case these conditions are not met, you can ask the analyzers whether the derived data is still needed for the activated datasets or whether it can be removed.
- + Screenshot of derived data delete
- + Screenshot of derived data no delete
* To see all the datasets in which a derived data is used, click the **See dependent datasets** button in the Train result view (right next to the Delete button). This will redirect you to a new tab displaying the **Datasets** page, filtered to show all the datasets (activated or not) which are using the derived data. To see specifically only the activated or deactivated datasets dependent on this derived data, use the activated / deactivated buttons inside the In datasets column of the table. This will open the same Datasets page, but filtered depending on the datasets' activated state.
- + Screenshot of derive data dependent
* Click on the name within the Analysis column to be redirected to a new tab showing a read-only view of the analysis within which the derived data was created.
- + Screenshot of derived data analysis
* The total size of the derived data in Hyperloop is displayed below the table, on the right side. Keep in mind that this is the total size of all derived data and it is not affected by the filtering of the table.
- + Screenshot of derived data size
## Staging status @@ -367,13 +369,13 @@ There are a number of settings that you can decide on when composing a train: * Accessed from the **Datasets** view, this page displays the staging status of all the datasets in Hyperloop for which a staging process was initiated. Use it to follow up the progress and check if the staging is completed, ongoing, or if there are any issues.
- + Screenshot of staging status page
* Click on the staging percentage in the right-most column to view the detailed staging progress in a new tab: this shows the status of each transfer request.
- + Screenshot of staging process
## DPG Runlists @@ -381,17 +383,17 @@ There are a number of settings that you can decide on when composing a train: * The [**DPG Runlists**](https://alimonitor.cern.ch/hyperloop/runlists) page is dedicated to the DPG experts and displays all the DPG runlists created for the datasets. The DPG expert can add, edit or remove a runlist.
- + Screenshot of dpgrunlists
* Clicking on the `πŸ“` button will lead to the edit view, where the DPG expert can change the list of runs.
- + Screenshot of edit runlist
* DPG experts can create a new runlist by clicking the **+Add runlist** button. In order to create the list of runs, the correct data type, anchor, tag and production must be selected.
- + Screenshot of add runlist
diff --git a/docs/hyperloop/userdocumentation.md b/docs/hyperloop/userdocumentation.md index be2d15ab..6a9f3409 100644 --- a/docs/hyperloop/userdocumentation.md +++ b/docs/hyperloop/userdocumentation.md @@ -7,10 +7,10 @@ title: User Documentation When opening a page in Hyperloop which has not been visited before, a guided tour will explain key concepts. These tours provide an interactive learning experience for Hyperloop, easily activated with a single click. They are ideal for beginners and for refreshing knowledge. -Where appropriate, when one tour ends, the next will begin to explain the next section of Hyperloop. Tours can be exited at any time. Once closed, they will not automatically begin on future page visits. +Where appropriate, when one tour ends, the next will begin to explain the next section of Hyperloop. Tours can be exited at any time. Once closed, they will not automatically begin on future page visits.
- +Screenshot of joyride welcome
### Tour Elements @@ -18,13 +18,13 @@ Where appropriate, when one tour ends, the next will begin to explain the next s * Each element of Hyperloop with a tour includes a tour 🚌 button. Clicking this button initiates the tour.
- +Screenshot of joyride tour icon
* Each tour step includes a _Next_ button to access the next step of the tour. The page will automatically scroll to and highlight the next element to be explained. Also displayed is the current step number and total number of steps in the tour.
- +Screenshot of joyride next button
* Each tour step additionally includes an exit button. Clicking this closes the tour. After clicking this, the tour of the given section will not automatically open on future visits to the section of Hyperloop. To access the tour of the section again, the relevant tour 🚌 button must be clicked. @@ -53,7 +53,7 @@ The _Service wagons_ are wagons which are dependencies to other wagons. They are Using the _My Analyses_ page, inside of the _Analysis_ you want to add the wagon to, click on `✚ Add new wagon`.
- +Screenshot of new wagon
There are 2 parameters required to create a new wagon: @@ -68,7 +68,7 @@ By clicking on `πŸ’Ύ Save` the wagon will be added, and you will be redirected t * Using the _My Analyses_ page, click on the button `🧬` to clone a wagon.
- +Screenshot of clone wagon
A list of _Analyses_ you belong to will be displayed. You have to select the _Analysis_ where you want to clone the _Wagon_ to, and name the _NewWagon_ (the wagon name has to be unique within _Analysis_). By clicking on `🧬 Clone`, a new wagon will be added with the same configuration as the _Wagon_ including subwagons and derived data. @@ -83,24 +83,24 @@ You can get to the _All Analyses_ page by using the main menu, or by the link in * By clicking on the top-left corner, you will be redirected to a read-only view of the wagon, that can be shared with colleagues and support. The top right corner history symbol leads to the [_Wagon **History**_](#wagonhistory) page, which will display the state evolution of the wagon.
- +Screenshot of wagon shortcuts
- + ##
Wagon Settings * In _Wagon settings_ you can modify the wagon name, workflow name, and select the wagon's dependencies. The dependencies offered are wagons from the same _Analysis_ or from [_Service wagons_](#servicewagons).
- +Screenshot of wagon settings
- + ## Wagon Configuration * In _Configuration_ the wagon configuration corresponding to the workflow will be available in the _Base_. The configuration is divided per _Task_, hence if you need to add a new parameter, you will need to add it in the following order: task, parameter and value. * The wagon configuration supports a variety of parameter types defined in the task as _Configurable_ including: primitive type parameters, fixed-length arrays, variable-length arrays, matrices, labelled matrices and histogram binning.
- +Screenshot of detailed configuration
* The _Variable-length arrays_ allow the user to add/remove elements in the _Base_ wagon, and the change will be propagated in all the subwagons. @@ -115,62 +115,60 @@ You can get to the _All Analyses_ page by using the main menu, or by the link in * The subwagons added to the wagon will be represented with an automatically assigned suffix in _AnalysisResults.root_. Here you can see an example where we have added two subwagons called _smalleta_ and _verysmalleta_.
- +Screenshot of subwagon suffix
* In order to update the base and subwagon configuration with the latest version of the workflow, click on the button `↻ sync` in _Configuration_. By synchronizing the configuration, the parameters which no longer belong to the workflow will be removed, and the values of the wagon's _Base_ will be updated as well if they have not been modified by the user. ### Upload Wagon Configuration via JSON -* The wagon configuration may be adjusted via a JSON file. Any values in the wagon will be updated to the values in the JSON file. +* The wagon configuration may be adjusted via a JSON file. Any values in the wagon will be updated to the values in the JSON file.
- +Screenshot of update via json button
* The required format for Hyperloop to ingest the JSON is the exact format given when downloading the configuration JSON file from Hyperloop (from the download button above). This download includes workflows from dependencies. When uploading, it is not necessary to remove dependencies - any workflows not directly from the wagon will be ignored during the upload.
- +Screenshot of update via json page one
* Only configurables and subwagons which already exist in the wagon may be edited - any new subwagons or configurables in the JSON will be ignored. To add new subwagons, first add them to the existing wagon. When a file is chosen, each changed value is shown in the 'overview'. All values in 'base' are listed first, with subwagons listed below. In the example below, there are two subwagons, 'Pos' and 'Neg'. Hovering over any value will display the change in a tooltip.
- +Screenshot of update via json overview
-* There is validation to check for invalid values. It also ensures that there are no identical subwagons. +* There is validation to check for invalid values. It also ensures that there are no identical subwagons.
- +Screenshot of update via json validation
* A full diff between the current and uploaded configuration is also available. Every difference between the current and uploaded configuration is shown.
- +Screenshot of update via json diff
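The overview and diff described above come down to comparing two nested configurations value by value. A minimal sketch of such a comparison (illustrative only — the structure and names used here are assumptions, not Hyperloop's actual JSON schema or implementation):

```python
# Illustrative sketch of diffing two nested wagon configurations
# (hypothetical {task: {parameter: value}} structure; not Hyperloop's code).

def flatten(config, prefix=""):
    """Flatten nested {task: {parameter: value}} maps into dotted keys."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def diff_configs(current, uploaded):
    """List (key, old, new) for every value that would change.
    Keys present only in the upload are ignored, mirroring the rule
    that new configurables/subwagons in the JSON are skipped."""
    cur, new = flatten(current), flatten(uploaded)
    return [(k, cur[k], new[k]) for k in cur if k in new and cur[k] != new[k]]

current = {"task-a": {"ptMin": "0.15", "etaMax": "0.9"}}
uploaded = {"task-a": {"ptMin": "0.20", "etaMax": "0.9"}, "task-b": {"x": "1"}}
print(diff_configs(current, uploaded))  # β†’ [('task-a.ptMin', '0.15', '0.20')]
```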
* Once 'Apply Changes' is pressed, any altered values will be highlighted in yellow. No changes are saved until the 'save' button is pressed, so it is possible to apply the changes to view them without losing the current configuration.
- +Screenshot of update via json highlight
+## Derived data +* In _Derived Data_ the tables which are produced by the task are displayed. If activated, these are saved to the output if the train is run as a derived data production. The produced derived data can be made available by the operators and serve as input for subsequent trains. - -## Derived data - -* In _Derived Data_ the tables which are produced by the task are displayed. If activated, these are saved to the output if the train is run as a derived data production. The produced derived data can be made available by the operators and serve as input for subsequent trains. - ### Derived data types + * There are three types of derived data specifications: * Standard derived data (marked with πŸ—‚οΈ) - if the wagon is used in a train, this will produce derived data to be used for further analysis. The results will not be merged across runs and can be used as input for future train runs. Note that standard derived data trains do not submit automatically and may need additional approval. If in doubt, please seek advice before enabling derived data tables in your wagon configuration. * Slim derived data (marked with green bordered πŸ—‚οΈ) - similarly to the standard derived data case, if used in a train, this will produce derived data to be used for further analysis. This is reserved for derived data of small output size. The results will be merged across runs and are not available to use in future train runs. The data will be automatically deleted after a preset period of time. You can mark a wagon for running as slim derived data by checking `Ready for slim derived data`. * Linked derived data (marked with red bordered πŸ—‚οΈ) - linked derived data trains will also produce derived data to be used for further analysis. Linked derived data has access to the parent AO2D - this is not the case for other derived data types. Like standard derived data, results are not merged across runs.
- + * For wagons set as ready for slim derived data, two more fields need to be correctly set: * Max DF size - This sets the maximal dataframe size in the merging step. Has to be 0 for not-self contained derived data (which need parent file access). * Max derived file size - Sets the size limit for the output file size of the derived data file. This is an expert parameter which usually does not have to be changed. Only change this value if the processing in subsequent trains takes so long that the jobs fail. If set to 0 a good value will be automatically determined. @@ -182,33 +180,33 @@ When enabling `Ready for slim derived data` the option has to be selected for th * In order to update the derived data configuration with the latest version of the workflow, click on the button `↻ sync` in _Derived data_. By synchronizing the derived data, the tables which no longer belong to the workflow will be removed, and the values of the tables will be updated.
- +Screenshot of derived data example
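The hard rule for the two slim-derived-data fields above can be summarized in a small validation sketch (illustrative only; the function and parameter names are paraphrased from the fields described here, not Hyperloop's actual validation code):

```python
# Illustrative check of the slim-derived-data settings described above
# (hypothetical helper; not Hyperloop's actual validation logic).

def check_slim_settings(max_df_size, self_contained):
    """Flag the one hard rule stated in the docs: not-self-contained
    derived data (which needs parent file access) requires Max DF size == 0.
    A Max derived file size of 0 simply means 'determine automatically'
    and is not an error."""
    if not self_contained and max_df_size != 0:
        return ["Max DF size must be 0 for not-self-contained derived data"]
    return []

print(check_slim_settings(max_df_size=500, self_contained=False))
print(check_slim_settings(max_df_size=0, self_contained=False))  # β†’ []
```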
-## Test Statistics +## Test Statistics * _Test Statistics_ contains three graphs that display different metrics following the tests this wagon was part of. The first graph plots the _PSS Memory_ corresponding to each test run. The second one displays the _CPU Time_, _Wall time_ and _Throughput_ along the test runs for this wagon. Finally, the third graph shows the _Output size_ at each test run.
- +Screenshot of test statistics
* Depending on the datasets this wagon is using, the user is able to choose the _Dataset_ on which the metrics mentioned above are plotted, from the upper-left dropdown. In case no tests were run on the chosen dataset, a message will appear stating this.
- +Screenshot of dataset dropdown
* By clicking on the bullets representing the metric value at any of the test runs plotted, the user will open a new tab displaying a read-only view of the wagon test output.
- +Screenshot of test stats graphs
* In order to zoom into the graph, the user needs to click and drag over the area of interest, which will automatically show the zoomed-in graph portion. By double-clicking, it will zoom out and show the entire graph.
- +Screenshot of zooming
### 4. Wagon History @@ -216,7 +214,7 @@ When enabling `Ready for slim derived data` the option has to be selected for th * In the _Wagon History_ page, there is a summary of the wagon's state from the creation until the last update. By clicking on the _+_ symbol, one can expand the information, showing the subwagons' details and the derived data at each timestamp.
- +Screenshot of wagon history
* On the right side of the page, the user can select two timestamps in order to compare the state of the wagon between the two by clicking _Compare_. This will lead to [_Compare Wagons_](#compare-wagons) page. You can cancel your current selection by clicking _Unselect all_. @@ -228,7 +226,7 @@ When enabling `Ready for slim derived data` the option has to be selected for th * The Derived data tab reflects the differences concerning the derived data at the two timestamps.
- +Screenshot of compare wagons
### 6. Compare individual wagons @@ -237,7 +235,7 @@ When enabling `Ready for slim derived data` the option has to be selected for th * Using the _My Analyses_ page, click on the button `πŸ†š` to compare the wagon. A list of all your wagons will be displayed. Once you select the desired wagon, this will open the comparison view in a new tab. This has a similar structure to the different timestamps comparison.
- +Screenshot of compare user wagons
## Creating or joining an analysis @@ -256,7 +254,7 @@ The rest of the parameters are not relevant for the Hyperloop train system. After all the parameters have been set, click on `Create` and your _Analysis_ will be available in the _My Analyses_ and _All Analyses_ page. - +Screenshot of comparison with jira The synchronization from JIRA to the Hyperloop train system can take up to 30 minutes. @@ -267,7 +265,7 @@ The synchronization from JIRA to the Hyperloop train system can take up to 30 mi * Inside of an analysis, click on the button `Datasets and Settings πŸ“`.
- +Screenshot of dataset and settings
* There is a list of _Enabled datasets in **Analysis**_. You can disable a dataset for that analysis by clicking on the button `❌`. @@ -279,7 +277,7 @@ The synchronization from JIRA to the Hyperloop train system can take up to 30 mi * You can browse and click on the _Dataset_ you want to add to your analysis.
- +Screenshot of enable dataset datasets page
* Inside of the _Dataset_ view page, click on the button `✚ Add dataset to analysis`. It will display a list of all the analyses you belong to. Select the _Analysis_ you want to add the dataset to, and click on `πŸ’Ύ Save`. @@ -293,7 +291,7 @@ You can enable a wagon in the _My Analyses_ page. Inside of the _Analysis_ there 3. Pull request: Select the option `β˜‘οΈ Future tag based on pull request`. There will be a list of the latest merged/unmerger pull requests available with their corresponding description. By choosing a pull request, your wagon will be tested as soon as the pull request is merged in a package tag. Then your wagon will be composed in a train with the latest package tag available.
- +Screenshot of enable wagon
After choosing the package tag to be used, click on the button `❌` to enable your wagon in a dataset; the icon will change from `❌` to `βœ…`. If you hover over `βœ…` you can see the information about the enabled wagon: package tag, time and username. If you need to disable a wagon in a dataset, click on the button `βœ…`. Once enabled, the wagon will be automatically tested and you can follow the progress of the [test](#wagon-test) on the button next to `βœ…`: `βŒ›οΈ` queued,`⏳` ongoing,`🌟` done, `❗️` warning and `πŸ’£` failed. @@ -310,7 +308,7 @@ You can enable a wagon in the _My Analyses_ page. Inside of the _Analysis_ there * If a wagon test has failed, one can study the failure source by clicking the test output button. This will open in a new tab the list of files that can be used to track the possible issues that led to the failure.
- + Screenshot of debug failed test
* You can analyse: @@ -324,17 +322,17 @@ You can enable a wagon in the _My Analyses_ page. Inside of the _Analysis_ there * The Test results tab shows the performance metrics per device (reader, workflows, writer), along with the expected resources. You can use the interactive graphs (per device) to zoom into the area of interest (click and drag) or zoom out (double-click).
- + Screenshot of wagon test results
* The Test Graphs tab, plots the available metrics for the specific wagon test. You can choose the metric of interest from the dropdown, zoom into the graph (click and drag) and zoom out (double-click).
- + Screenshot of test graphs
* If you only want to see the top 10 graph with the highest average, check the Show top 10 largest box. - + * To produce this type of performance graph for a local O2 execution, follow the instructions [here](#producing-performance-graphs-for-a-local-o2-execution). * Whenever a wagon configuration is changed, if there are enabled wagons (including wagons that depend on it), then the test is automatically reset and a new test is launched. However, if the enabled wagon was already composed in a train, the train will run with the wagons and dataset configuration of the time at which the train was created. @@ -345,101 +343,98 @@ When creating or enabling wagons, you can use a pull request instead of a packag 1. [Adding a new wagon](#addwagon): You can create a wagon with your unmerged or unreleased workflow. If the workflow is not available, manually add the configuration of the wagon, and subwagons if needed. You can synchronize the wagon's configuration once the package tag that includes your pull request has been released. 2. [Enabling a wagon in a dataset](#enabling-a-wagon): If you need to enable your wagon with a workflow that is unmerged or unreleased, use a `Future tag based on pull request`. There is a list of the latest merged and unmerged pull requests available in the system; you can see the pull request number and description. Select the _pull request tag_ and enable the wagon in a dataset. By doing this, the wagon will be queued for testing, and the test will begin once the _pull request_ has been merged into a package tag, and the package tag is released. Then, if the test is successful, it will be composed in a train with the latest package tag available. - + ## Warnings - + When a wagon test finishes in warning, this means that the wagon will not be included in the automatic composition schedule. Therefore, train composition can be requested in the Operation channel, where an operator will take care of the request.
Before doing so, please check whether you can fix the cause of the warning yourself. Depending on the nature of the warning and the degree to which specific constraints are exceeded, the operator will either compose your train or advise you to review and improve certain parts before requesting a train again. In the latter case, the user can analyze the test and review the logs, searching for ways of improving resource usage or other elements that caused the exceptions. - + There are a number of warnings, which will require different courses of action: - + ### 1. Memory consumption too large for automatic train submission - +
- + Screenshot of warning for memory
- - * The memory consumption is larger than the limit. In wagon tests, the limit is the memory allowance of a two core target minus a small buffer, which is ~ 3.6GB. - * In the train test, the limit is the memory allowance of the train target. For Grid - Single core and 2 core, trains may be submitted even with the warning: If the average PSS memory is <= 3.2 GB, then operators will compose your train on Grid - Single core. Otherwise, if it is > 3.2 GB and <= 4 GB, the operators will compose the train on request on Grid - 2 core. If larger than 4 GB, then the train cannot be composed. The user should check for ways of improving memory consumption. - * For the other target queues, trains can only be composed if the memory consumption is within the target limits. - * For the cases when the train cannot be composed due to high memory consumption, the user can review the test. One can check the logs and look for any possible improvements that can be done for a lower memory consumption. - +* The memory consumption is larger than the limit. In wagon tests, the limit is the memory allowance of a two core target minus a small buffer, which is ~ 3.6GB. +* In the train test, the limit is the memory allowance of the train target. For Grid - Single core and 2 core, trains may be submitted even with the warning: If the average PSS memory is <= 3.2 GB, then operators will compose your train on Grid - Single core. Otherwise, if it is > 3.2 GB and <= 4 GB, the operators will compose the train on request on Grid - 2 core. If larger than 4 GB, then the train cannot be composed. The user should check for ways of improving memory consumption. + +* For the other target queues, trains can only be composed if the memory consumption is within the target limits. +* For the cases when the train cannot be composed due to high memory consumption, the user can review the test. One can check the logs and look for any possible improvements that can be done for a lower memory consumption. 
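The Grid submission rules above can be sketched as a simple threshold check (illustrative only — the actual decision is made by the operators; the function name is hypothetical, and the 3.2 GB and 4 GB thresholds are taken from this page):

```python
# Sketch of the Grid submission rules for the memory warning described above
# (hypothetical helper; the real decision is made by Hyperloop operators).

def memory_target(avg_pss_gb):
    """Map average PSS memory (GB) to the submission outcome."""
    if avg_pss_gb <= 3.2:
        return "Grid - Single core"
    if avg_pss_gb <= 4.0:
        return "Grid - 2 core (on request)"
    return "cannot be composed: reduce memory consumption"

print(memory_target(3.0))  # β†’ Grid - Single core
print(memory_target(3.7))  # β†’ Grid - 2 core (on request)
print(memory_target(4.5))  # β†’ cannot be composed: reduce memory consumption
```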
+ ### 2. Maximal PSS more than 30% larger than average PSS - +
- + Screenshot of warning pss
- - * The maximum PSS memory consumption is more than 30% larger than the average PSS, therefore the train cannot be automatically composed. This warning means that a memory leak is possible, so it must be checked by an operator. If there is no memory leak, the train can be composed. Otherwise, the operator will advise the user to check for possible causes and improvements before requesting again. + +* The maximum PSS memory consumption is more than 30% larger than the average PSS, therefore the train cannot be automatically composed. This warning means that a memory leak is possible, so it must be checked by an operator. If there is no memory leak, the train can be composed. Otherwise, the operator will advise the user to check for possible causes and improvements before requesting again. ### 3. CPU usage too large - +
- + Screenshot of warning cpu
- - * The CPU usage limit is set per dataset and all trains running on a specific dataset must respect this constraint. If the limit is not respected, the train cannot be composed without PWG approval. Therefore, the user should discuss the details and requirements for this train with the PWG before requesting again. Depending on the amount of total resources, an approval in the Physics Board (PB) may also be needed. The CPU limit of a dataset may be viewed on the dataset page. - * It is possible for a train to have a CPU warning when composed despite the wagon test not having a CPU warning. This usually happens in a situation where the wagon test (which runs on a single core) uses so much memory that it doesn't fit a single core job on the grid and therefore needs two cores for the train (more cores means a higher memory allowance). But if the devices in the wagon cannot be parallelised well over multiple cores, this leads to more wall time and a higher CPU usage as the cores will be underutilised. In this situation, one can either reduce the wagon memory consumption to fit into a single core or reduce the CPU consumption to fit the dataset. - + +* The CPU usage limit is set per dataset and all trains running on a specific dataset must respect this constraint. If the limit is not respected, the train cannot be composed without PWG approval. Therefore, the user should discuss the details and requirements for this train with the PWG before requesting again. Depending on the amount of total resources, an approval in the Physics Board (PB) may also be needed. The CPU limit of a dataset may be viewed on the dataset page. +* It is possible for a train to have a CPU warning when composed despite the wagon test not having a CPU warning. 
This usually happens in a situation where the wagon test (which runs on a single core) uses so much memory that it doesn't fit a single core job on the grid and therefore needs two cores for the train (more cores means a higher memory allowance). But if the devices in the wagon cannot be parallelised well over multiple cores, this leads to more wall time and a higher CPU usage as the cores will be underutilised. In this situation, one can either reduce the wagon memory consumption to fit into a single core or reduce the CPU consumption to fit the dataset. + ### 4. Too many CCDB calls - +
- + Screenshot of warning ccdb
- - * Too many calls to the CCDB, therefore the train cannot be composed, and the cause of a high number of calls should be checked. + +* There are too many calls to the CCDB, therefore the train cannot be composed, and the cause of the high number of calls should be checked. ### 5. Reduction factor too small - +
- + Screenshot of warning reduction factor
- - * This occurs when the reduction factor is lower than 50. If the expected output size is below 50 GB, the operator can compose the train on request. If larger, the train cannot be composed. - + +* This occurs when the reduction factor is lower than 50. If the expected output size is below 50 GB, the operator can compose the train on request. If larger, the train cannot be composed. + ### 6. Log output too large - +
- + Screenshot of warning log output
- - * The log file is too large, therefore the train cannot be composed, and the user should check for factors leading to this. - + +* The log file is too large, therefore the train cannot be composed, and the user should check for factors leading to this. + ### 7. Derived output too large for slim train - +
- + Screenshot of warning derived output
- - * This is specific to tests with wagons set as ready for slim derived data. As the entire output is merged into one single file, there is a limit of 4000 MB for this. If exceeded, the user is advised to switch to standard derived data by unchecking the option β€œReady for slim derived data” in the wagon edit view. Then a request for standard derived data train can be made. + +* This is specific to tests with wagons set as ready for slim derived data. As the entire output is merged into one single file, there is a limit of 4000 MB for this. If exceeded, the user is advised to switch to standard derived data by unchecking the option β€œReady for slim derived data” in the wagon edit view. Then a request for standard derived data train can be made. ### 8. Unbound indices detected in AO2D merging - +
- + Screenshot of warning unbound indices
- - * For derived data trains, it notifies the detection of unbound columns during AO2D merging. This means that one of the output tables which has been asked to be stored has index columns to tables which are not within the output. This usually points to a bad or broken data model definition and should be fixed. The only case where this is expected and not worrisome is linked derived data. For both slim derived data and standard derived data, the data model should be fixed. +* For derived data trains, this warning notifies that unbound columns were detected during AO2D merging. This means that one of the output tables selected for storage has index columns pointing to tables which are not within the output. This usually points to a bad or broken data model definition and should be fixed. The only case where this is expected and not worrisome is linked derived data. For both slim derived data and standard derived data, the data model should be fixed. ### 9. Too many input files expected to go to derived output - +
- + Screenshot of linked files derived output
- - * This warning only appears for linked derived data. The maximum number of input files which can go to derived output is 25. The warning will display how many are expected. If this warning appears, the train cannot be submitted. + +* This warning only appears for linked derived data. The maximum number of input files which can go to derived output is 25. The warning will display how many are expected. If this warning appears, the train cannot be submitted. ### Multiple warnings - + It is possible that a wagon test or train test will produce multiple warnings. In that case, the checks above will be done for each warning present, and the decision making regarding train submission will be done considering all the exceptions. - - +
- + Screenshot of multiple warnings
- - + ## All Analyses * [**All Analyses**](https://alimonitor.cern.ch/hyperloop/all-analyses) is a read-only view of all analyses available in the system. Click on the analysis name to be redirected to a read-only view of the analysis. @@ -452,13 +447,13 @@ It is possible that a wagon test or train test will produce multiple warnings. I * To compare two trains, select them in the Compare column and click Compare. This will open a new tab displaying the differences between the two trains.
- + Screenshot of compare trains
* The train run result can be accessed by clicking on the TRAIN_ID, or with the url .
- + Screenshot of train runs page
## Train Run Result @@ -466,62 +461,62 @@ It is possible that a wagon test or train test will produce multiple warnings. I * The _General_ tab displays the summary of the train's progress, direct links to dataset and participating wagon configuration, as well as direct links to the test output and the [speedscope](https://johnysswlab.com/speedscope-visualize-what-your-program-is-doing-and-where-it-is-spending-time/) profiling of the task.
- + Screenshot of train result
* The _Test results_ tab shows the performance metrics per device (reader, workflows, writer), along with the expected resources. You can use the interactive graphs (per device) to zoom into the area of interest (click and drag) or zoom out (double-click).
- + Screenshot of test results
* In the _Test Graphs_ tab, the user can plot the available metrics for the specific _Train run_. Hovering over the graph displays the corresponding values in a dynamic window, listing the value for each participating wagon.
- + Screenshot of test graphs
* The metric can be selected from the upper-left dropdown, and the graph will change accordingly. * The user can choose to plot the metric data per device by checking the _Per Device_ box near the dropdown.
- + Screenshot of test graphs per device
* To plot the ten graphs with the highest averages, the user can click the **Show top 10 largest** checkbox.
- + Screenshot of graph largest
* The user can zoom into the graph by clicking and dragging the mouse along the area of interest. For zooming out, the user must double-click on the graph.
- + Screenshot of graph zoom
- + Screenshot of graph zoom 2
- + * To produce this type of performance graph for a local O2 execution, follow the instructions [here](#producing-performance-graphs-for-a-local-o2-execution). * In _Submitted jobs_, you can see the summary of the master jobs, along with links to the **IO Statistics** and **Stack trace**.
- + Screenshot of submitted jobs 1
* Click the **IO Statistics** button to be redirected to the site activity information.
- + Screenshot of submitted jobs 2
* Click the **Stack trace** button to be redirected to the stack trace information in MonALISA. Here you can see a summary of failures of your jobs.
- + Screenshot of submitted jobs 3
* This information is collected from all ERROR_V jobs once the masterjobs have finished. Some information is already available while the train is running, but make sure to check again when the train is in a final state. Common errors are grouped and counted, allowing you to investigate failures and debug them using the provided stack trace. @@ -529,41 +524,41 @@ It is possible that a wagon test or train test will produce multiple warnings. I * The _Grid statistics_ tab presents a summary of the jobs' performance and plots the Files/Job, CPU time/Job and Wall time/Job statistics.
- + Screenshot of grid stats
* If the train is run as a derived data production and there are activated tables, the Derived data tab will be shown. This displays the tables which are produced by the task and saved to the output.
- + Screenshot of train modal derived
-* _Merged output_ displays the merging jobs and the output directories. A merged output is created for every mergelist and final mergelist in the dataset, along with the full train merge. The mergelists and final mergelists are defined in the dataset settings. Mergelists contain lists of runs from a single runlist, while final mergelists are used to combine mergelists across productions. +* _Merged output_ displays the merging jobs and the output directories. A merged output is created for every mergelist and final mergelist in the dataset, along with the full train merge. The mergelists and final mergelists are defined in the dataset settings. Mergelists contain lists of runs from a single runlist, while final mergelists are used to combine mergelists across productions.
- + Screenshot of merged output
* When the final merge is started manually by the operator, some of the runs may not be merged. You can copy the list of merged runs or the total list of runs by clicking on the (red) number. * Here you can also track the submission process, and debug issues that may have taken place.
- + Screenshot of merged output 1
- ### Request Long Train * The _Request long train_ tab allows users to request a long train after the train has run on a short dataset. Short datasets are subsets of a big dataset (_set up in the Dataset settings_). First, a train run needs to be **Done on a smaller short dataset** before being run on a bigger dataset. * Any user who is part of the analysis can request a long train. Approval from the conveners of the participating analyses' PWGs is required in order to submit a long train. +
- + Screenshot of long train derived type
* When requesting a long train, it is possible to request standard derived data from a short train with slim derived data by changing the derived data setting as shown above.
- + Screenshot of request long train
* Once the long train is approved: @@ -571,55 +566,61 @@ It is possible that a wagon test or train test will produce multiple warnings. I * Otherwise the Submit button is enabled and the operator can submit the train
- + Screenshot of long train approved
- + ## Producing performance graphs for a local O2 execution - + The **Performance Graphs** page allows the user to upload their own local metrics file and then generate the test graphs specific to that file. You produce a local _performanceMetrics.json_ by running the O2 workflow with the argument _--resources-monitoring 2_, which, in this example, produces monitoring information every 2 seconds. These are the same type of graphs produced in the _Test Graphs_ tab of the train run. This page can be accessed at: .
- +Screenshot of performance graphs
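For illustration, an invocation with this flag might look as follows; `my-o2-workflow` is a placeholder name for your actual O2 analysis executable, not a real command:

```shell
# "my-o2-workflow" is a placeholder; substitute your actual O2 analysis
# executable. --resources-monitoring 2 samples resource metrics every
# 2 seconds; the run then leaves a performanceMetrics.json to upload here.
cmd='my-o2-workflow --aod-file AO2D.root --resources-monitoring 2'
echo "example invocation: $cmd"
```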
## Reproducing a train run on a local machine + A train test or a Grid train run can be redone on a local machine. This is useful to understand a problem better or simply to reproduce some settings of a previous train. In order to do so, you need a few general prerequisites: - * Download the `run_train.sh` script from [here](https://alimonitor.cern.ch/train-workdir/run_train.sh). Say this is placed in a folder `/my/path/`. - * Make sure `jq` is installed on your system. Type `jq` on the command prompt. If you get an error that the command was not found, you have to install it. This package is a system package on most systems (it has nothing to do with ALICE). Use Google if you need instructions for your specific operation system. - * Now the following command should work and give reasonable output: + +* Download the `run_train.sh` script from [here](https://alimonitor.cern.ch/train-workdir/run_train.sh). Say this is placed in a folder `/my/path/`. +* Make sure `jq` is installed on your system. Type `jq` at the command prompt. If you get an error that the command was not found, you have to install it. This package is a system package on most systems (it has nothing to do with ALICE). Use Google if you need instructions for your specific operating system. +* Now the following command should work and give reasonable output: + ```bash /my/path/run_train.sh --help ``` + To now run a specific train test or Grid run, you need to create a folder and place two files in it: - * Download the `full_config.json` from the train test or Grid run - * Create a file `input_data.txt` in which you put the file paths of the data you want to process. You can either put the paths to files on AliEn or download the data locally and point to the local paths. Each line should contain one file. In order to take the same data as from a train test, you can check at the top of the `stdout.log` of a train test where you have the AliEn paths and also paths to download the files to your local machine.
You then run: + +* Download the `full_config.json` from the train test or Grid run +* Create a file `input_data.txt` in which you put the file paths of the data you want to process. You can either put the paths to files on AliEn or download the data locally and point to the local paths. Each line should contain one file. To use the same data as a train test, check the top of the train test's `stdout.log`, which lists the AliEn paths as well as the paths to download the files to your local machine. You then run: + ```bash /my/path/run_train.sh --skip-perf ``` - ## Train slots per week -For a given analysis, every dataset has a train slots per week limit. This limit is shown in the dataset under 'Maximal train slots per analysis per week'. This limit is to ensure fair usage of resources, and is calculated on a rolling basis. You may view how many slots have been used here: +For a given analysis, every dataset has a train slots per week limit. This limit is shown in the dataset under 'Maximal train slots per analysis per week'. If an analyzer uses the same dataset across multiple analyses, their count is the sum of the slots used across those analyses. + +These limits ensure fair usage of resources, and are calculated on a rolling basis. You may view how many slots have been used for a dataset from the wagon table in My Analyses:
- +Screenshot of train slots per user per analysis
- -Trains may use more than one slot. The number of slots is calculated as the number of wagons from the analysis in the train, capped by the number of cores that the train runs with. The slots used per analysis may be viewed in the train 'Test - Full Test' tab: +Trains may use more than one slot. The number of slots is calculated as the number of wagons from the analysis in the train, capped by the number of cores that the train runs with. The slots used per analysis may be viewed in the train 'Test - Full Test' tab, which also shows when the train will stop counting towards the quota:
- +Screenshot of weekly slots
If a single user wagon needs more memory than is available in a single-core queue, it can still be composed by Hyperloop into the two-core queue, but it will then count as a **heavy wagon**. Heavy wagons count as two slots. These wagons are listed in red in the train 'Test - Per Wagon' tab:
- +Screenshot of heavy wagon
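As a hedged sketch of the slot accounting described above (an assumed reading of the rules, not Hyperloop's actual implementation), the count could be computed as:

```shell
# Assumed formula: each heavy wagon counts as two ordinary wagons, and the
# total is capped by the number of cores the train runs with.
wagons=3          # ordinary wagons from the analysis in this train
heavy_wagons=1    # each heavy wagon counts as two slots
cores=8           # cores the train runs with
effective=$(( wagons + 2 * heavy_wagons ))
slots=$(( effective < cores ? effective : cores ))
echo "slots used: $slots"
```

Whether the heavy-wagon doubling is applied before or after the core cap is an assumption here; the authoritative numbers are those shown in the 'Test - Full Test' tab.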
- ## Local merging scripts + [Here](https://github.com/romainschotter/HYRunByRunMerging/tree/main) is a repository containing scripts to download all output files from a Hyperloop train run-by-run, and to merge locally only the files associated with a given run list. diff --git a/docs/images/trainSlots.png b/docs/images/trainSlots.png deleted file mode 100644 index 710cfcd7..00000000 Binary files a/docs/images/trainSlots.png and /dev/null differ diff --git a/docs/images/trainSlotsPerUserPerAnalysis.png b/docs/images/trainSlotsPerUserPerAnalysis.png new file mode 100644 index 00000000..69c34a4b Binary files /dev/null and b/docs/images/trainSlotsPerUserPerAnalysis.png differ