## <a name="estimates"></a>CO2 equivalent estimates

<div align="center">
<img src="../images/co2eqtooltip.png" width="35%" alt="Co2eq tooltip" />
</div>
<br>

* In Hyperloop, an estimate of the CO2eq produced by your trains is shown to give an idea of the environmental impact. We need to run analyses to achieve our scientific goals, but we can optimize the code and sometimes work efficiently even with fewer trains. The displayed value should help inform the decision of whether a train is needed.
The value is shown before trains have been run (an estimate based on the wagon test) and then when a train run has finished.
* The estimate is visible in:
* Wagon test view and train submission view (an estimate based on the wagon test)
* Train test view (an estimate based on the train test, which accounts for all wagons in the train)
* In the 'General' tab of the train view, when a train is in the state 'Done'. This estimate directly uses the CPU usage of the train run.

## <a name="calculations"></a> How is it calculated?

Principle:

* There are a few studies estimating the CO2eq of computing, and there are many factors to consider, such as the efficiency of the machines used and the power grid the machines run on.
* Furthermore, grey energy (the energy needed to produce the machines) and the transfer of the data could be considered.
* It would be difficult to do justice to all these details and to check for each train where exactly the jobs were running.
* Therefore, in Hyperloop, estimates of CO2eq are directly derived from CPU usage using an average conversion factor.

Calculation input:
* Power per computing core: We use an optimistic value of 10 W/core for pure power consumption and assume a 50/50 split between the carbon from power and carbon from embodied / embedded emissions, therefore working effectively with 20 W/core. Some literature estimates up to 53 W/core for power alone in high-performance computing (see <https://www.nature.com/articles/s41550-020-1169-1>).
* Electricity emission factors (kgCO2eq per kWh) vary depending on how electricity is produced. However, carbon is not everything and we do not want to enter a debate here on how energy should be produced. Therefore we use an average value from <https://arxiv.org/pdf/2011.02839.pdf> of 0.301 kgCO2eq / kWh, noting that even for the same country the estimates vary significantly depending on the source (e.g. comparing to <https://arxiv.org/pdf/2101.02049>).
* Hyperloop estimations do not account for the power consumption of data transfer, central infrastructure, or storage. The paper 'Electricity intensity of internet data transmission', <https://onlinelibrary.wiley.com/doi/pdf/10.1111/jiec.12630>, estimates 0.06 kWh/GB for data transfer. This would mean that transferring a petabyte of data produces 18 tCO2eq. Additionally, the carbon produced by storing the data would not be negligible. For simplicity, we do not account for these aspects, so that our estimates are more directly linked to individual train runs, and not the wider Grid infrastructure.
* At 20W per core, and 0.301 kgCO2eq / kWh, this gives us: **6t CO2eq per 1MCPUh** or **1 CPU year = 53.3 kgCO2eq**
* In order to compare these emissions to something we know, we use CO2eq produced by flights based on curb6.com
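
As a sanity check, the conversion factors above can be reproduced in a few lines. This is only an illustrative sketch: the constants (20 W/core, 0.301 kgCO2eq/kWh, 0.06 kWh/GB) come from the text above, while the function names are made up here.

```python
# Reproduce the CO2eq conversion factors quoted above.
POWER_W_PER_CORE = 20.0       # 10 W/core for power, doubled for embodied emissions
EMISSION_KG_PER_KWH = 0.301   # average electricity emission factor
TRANSFER_KWH_PER_GB = 0.06    # data-transfer intensity (not counted by Hyperloop)

def cpu_hours_to_co2eq_kg(cpu_hours: float) -> float:
    """CO2eq in kg for a given number of CPU hours at 20 W/core."""
    kwh = cpu_hours * POWER_W_PER_CORE / 1000.0
    return kwh * EMISSION_KG_PER_KWH

def transfer_co2eq_kg(gigabytes: float) -> float:
    """CO2eq in kg for transferring the given amount of data."""
    return gigabytes * TRANSFER_KWH_PER_GB * EMISSION_KG_PER_KWH

print(cpu_hours_to_co2eq_kg(1e6))  # 1 MCPUh -> 6020.0 kg, i.e. ~6 t CO2eq
print(transfer_co2eq_kg(1e6))      # 1 PB    -> 18060.0 kg, i.e. ~18 t CO2eq
```

Running this reproduces the **6t CO2eq per 1MCPUh** figure and the 18 tCO2eq per petabyte of data transfer mentioned above.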

# Fair usage policy

The very large amount of data that will be collected in Run 3 represents a challenge for analysis, both in CPU needs and in data read from storage. Therefore, a resource usage policy has been put in place to ensure proper use of computing resources. The policy has been openly discussed in multiple meetings, including ALICE weeks, and is subject to adjustments as necessary and as the collaboration gains experience with the Run 3 analysis. If you have questions or doubts, please first refer to your PWG convener, who will then bring up the case with the analysis coordinator.

The image below summarizes the policy:

<div align="center">
<img src="../images/hyperlooppolicy.png" width="80%" alt="Screenshot of hyperlooppolicy">
</div>

In general, four categories of trains exist:

* Trains below 30 TB and taking more than 2.0y of CPU time (red shaded area) are very strongly discouraged. In those cases, please resort to very small trains (where throughputs of even 100 KB/s are allowed with autosubmission) to run.
* Trains that use less than 2y of CPU time and loop over less than 200 TB are free to execute and can be run on Hyperloop via autosubmission. In the region between 30 and 200 TB, slightly more than 2y of CPU time is allowed (see sketch).
* Trains that loop over more than 200 TB and less than 800 TB are dealt with as follows:
* if they require less than 10 years of CPU time, they need only PWG convener approval.
* if they require more than 10 years of CPU time but less than 200 years, they need Analysis and Physics Coordinator approval to run.
* if they require over 200 years of CPU, they need explicit PB approval.
* Heavy trains looping over datasets bigger than 800 TB are dealt with as follows:
* if they require less than 20 years of CPU time, they need only PWG approval.
* if they require between 20 and 200 years of CPU, they can be approved offline by Analysis and Physics Coordination.
* if they require over 200 years of CPU, they need explicit PB approval.
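
The approval tiers above can be summarized as a small decision function. This is a sketch only: the function name and return strings are illustrative, and the sliding CPU allowance between 30 and 200 TB is simplified to the flat 2-year limit.

```python
def approval_tier(dataset_tb: float, cpu_years: float) -> str:
    """Return the approval tier for a train, per the policy bullets above."""
    if dataset_tb < 30 and cpu_years > 2.0:
        return "strongly discouraged (use very small trains instead)"
    if dataset_tb <= 200:
        # Simplification: the sketch allows slightly more than 2y for 30-200 TB.
        if cpu_years <= 2.0:
            return "autosubmission (no approval needed)"
        return "above the CPU limit (see sketch)"
    if dataset_tb <= 800:
        if cpu_years < 10:
            return "PWG convener approval"
        if cpu_years < 200:
            return "Analysis and Physics Coordinator approval"
        return "explicit PB approval"
    # Heavy trains: datasets bigger than 800 TB
    if cpu_years < 20:
        return "PWG approval"
    if cpu_years <= 200:
        return "Analysis and Physics Coordination approval (offline)"
    return "explicit PB approval"

print(approval_tier(100, 1.5))  # autosubmission (no approval needed)
print(approval_tier(500, 50))   # Analysis and Physics Coordinator approval
print(approval_tier(900, 250))  # explicit PB approval
```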

## <a name="implementation"></a>Implementation in Hyperloop datasets

In practice, the chart above is mapped onto a number of distinct resource groups which determine the limits assigned to each dataset:

<div align="center">
<img src="../images/resourcetable.png" width="800" alt="Screenshot of resourcetable">
</div>

The smaller the dataset size, the more often it is automatically submitted per week and the more often you are allowed to run on it per week. Manual requests for datasets above 50 TB are only fulfilled at the defined automatic submission times. This allows grouping of wagons into large trains.

## <a name="deriveddata"></a>Derived data

Derived datasets can be created on Hyperloop which are by construction much smaller than the original datasets. Those are advantageous because steps which are identical in each analysis train run (e.g. event selection and centrality calculation, secondary-vertex finding) are only executed once, which saves CPU. Furthermore, as the size is smaller, such trains cause less load on the storage.

As an example, imagine that you run a derived-data train on a dataset of 500 TB, where you need explicit approval. Say you have a reduction factor of 100; then your output derived data is about 5 TB. You will be allowed to run on that dataset much more frequently, see the table above.
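
The arithmetic in this example is simply the original size divided by the reduction factor (numbers taken from the example above, purely illustrative):

```python
# Hypothetical numbers from the example above.
original_tb = 500        # dataset size requiring explicit approval
reduction_factor = 100   # derived-data reduction
derived_tb = original_tb / reduction_factor
print(derived_tb)        # 5.0 -- falls into a much smaller resource group
```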

# For the Run 2 LEGO train expert. What has changed?
* There is a history feature for wagons and datasets. You can access it by clicking on the button `📜` available inside of a wagon/dataset view. A detailed view of what has been created/updated/removed from the wagon/dataset is shown, as well as the username and the time when the change was made.

<div align="center">
<img src="../images/datasetHistory.png" width="100%" alt="Screenshot of dataset history">
</div>

* There are automated notifications. These notifications are created per user, and display changes made to tools, like _Datasets_, that are being used by the user. They are displayed per _Analysis_ in the _My Analyses_ page, or globally in the button `🔔` which can be found on the top menu.
* **Performance Graphs** page allows the user to upload their own local metrics file, and then generate the test graphs specific to that file. You produce a local _performanceMetrics.json_ by running the o2 workflow with the argument _--resources-monitoring 2_ which, in this example, produces monitoring information every 2 seconds. These are the same type of graphs produced in the _Test Graphs_ tab of the train run. This page can be accessed at: <https://alimonitor.cern.ch/hyperloop/performance-graphs>.

<div align="center">
<img src="../images/performanceGraphs.png" width="100%" alt="Screenshot of performance graphs">
</div>

# Notifications
* The notifications can be seen in the _My Analyses_ page and in the _Notifications_ page, by clicking `🔔` in the menu bar.

<div align="center">
<img src="../images/notificationsMyAnalyses.png" width="90%" alt="Screenshot of notifications in my analyses">
</div>

* The user can click the `✖️` button to remove a notification. In order to remove all the notifications, go to the Notifications page, and click the `❌`_Dismiss all_ button.

<div align="center">
<img src="../images/allNotifications.png" width="90%" alt="Screenshot of all notifications">
</div>

## <a name="datasetChanged"></a>Dataset changed
* The automatic composition settings have changed, e.g. the schedule

<div align="center">
<img src="../images/datasetChanged.png" width="90%" alt="Screenshot of dataset changed">
</div>

## <a name="datasetActivated"></a>Dataset activated / deactivated

* Notifies the user when a dataset included in their analyses has been successfully activated or deactivated.

<div align="center">
<img src="../images/datasetActivation.png" width="90%" alt="Screenshot of dataset activation">
</div>

## <a name="productionAdded"></a>Dataset production added or removed

* For RUN 3 data and MC, the user is informed if the production has been successfully added to or removed from the dataset.

<div align="center">
<img src="../images/productionAdded.png" width="90%" alt="Screenshot of production added">
</div>

* For RUN 2 data, the user is notified when a conversion train run has been added to or removed from the dataset.

<div align="center">
<img src="../images/trainrunAdded.png" width="90%" alt="Screenshot of trainrun added">
</div>

<div align="center">
<img src="../images/trainrunRemoved.png" width="90%" alt="Screenshot of trainrun removed">
</div>

* For derived data, a notification is sent when a Hyperloop train that produced derived data has been added or removed.

## <a name="runlistUpdated"></a>Runlist updated
* The user is informed when a run has been added to or removed from the DPG runlist. This change is usually done by the DPG experts.

<div align="center">
<img src="../images/runlistUpdated.png" width="90%" alt="Screenshot of runlist updated">
</div>

## <a name="mergelistUpdate"></a>Mergelist updated

* The mergelist defines which runs are merged into one file at the end of the train running. The user is informed when a mergelist has been modified, added to or removed from the dataset production.

<div align="center">
<img src="../images/mergelistUpdate.png" width="90%" alt="Screenshot of mergelist update">
</div>

## <a name="linkedDataset"></a>Short datasets

## <a name="wagonDisabled"></a>Wagon disabled

Informs the user when a wagon has been disabled in different circumstances:
* Local tests are cleaned if the wagons are not submitted within a period of 4 weeks. The user is notified that the respective wagons are automatically disabled.

<div align="center">
<img src="../images/testCleaned.png" width="90%" alt="Screenshot of test cleaned">
</div>

* When a wagon with derived data output is enabled, the test cannot start if the wagon and its dependencies share the same workflow. As a result, the wagon is disabled and the user is notified about the wagons which share the same task.

* The notification format is: The wagon _"wagon_name"_ was disabled in _"dataset_name"_. There is derived data. The following wagons have the same workflows {_wagon1_, _wagon2_: _common_workflow_},...,{_wagonX_, _wagonY_: _common_workflow_}

<div align="center">
<img src="../images/wagonDisabled1.png" width="90%" alt="Screenshot of wagon disabled 1">
</div>

* If among the wagon and its dependencies there are identical derived data outputs, the test cannot start, and the wagon is disabled.

* The notification format is: The wagon _"wagon_name"_ was disabled in _"dataset_name"_. The following wagons have the same derived data outputs {_wagon1_, _wagon2_: _common_derived_data_},...,{_wagonX_, _wagonY_: _common_derived_data_}

<div align="center">
<img src="../images/wagonDisabled.png" width="90%" alt="Screenshot of wagon disabled">
</div>

* The wagon is disabled if the workflow name has been changed in the meantime. This is fixed by updating the workflow name in the wagon configuration.

<div align="center">
<img src="../images/notificationWorkflow.png" width="90%" alt="Screenshot of notification workflow">
</div>

* The wagon is disabled if one of the user-defined dependencies of the wagon is considered identical to a service wagon. In order to make the most efficient use of the Grid and the analysis facilities, copies of core service wagons are not permitted, as they prevent combining several users into one train.

<div align="center">
<img src="../images/notificationIdenticalWagon.png" width="90%" alt="Screenshot of notification of identical wagon">
</div>

A service wagon is considered identical to a user wagon if it shares the same activated output tables, the same workflow, and it has matching configurables. To fix this error, please use the listed service wagon as a dependency instead of the copy.



## <a name="inconsistentParameters"></a>Inconsistent parameters

* Hyperloop compares the wagon configuration with the configuration defined in O2 for the package tag selected for the wagon. If they do not coincide, the user is informed about the mismatch. The comparison is case-sensitive, therefore a Configurable will not match if its name does not use the identical lowercase/uppercase combination.

* The user is notified if there is a configurable present in the wagon configuration that is not defined in O2 for the selected package tag. Likewise, the user is informed when the wagon configuration misses one or more of the Configurables defined in O2 for the specific tag.

<div align="center">
<img src="../images/inconsistentParameters2.png" width="90%" alt="Screenshot of inconsistent parameters 2">
</div>

* If the **wagon configuration is old**, and the wagon is enabled with the latest package tag, the user is advised to sync the wagon in order to get the present configuration. Following this, the test will start automatically. Likewise, the test is reset whenever there is a change in the database, such as updating or syncing the wagon configuration or its dependencies.

<div align="center">
<img src="../images/inconsistentParameters.png" width="90%" alt="Screenshot of inconsistent parameters">
</div>

* If the **wagon is enabled with an older tag**, the configuration might not match (hence the notification). If the old tag is needed, then syncing is not an option because this will set the package to the latest one. Therefore, the wagon configuration has to be modified as needed. The user can take as a reference _full_config.json_ in the test output, which shows the configuration the test is being run with, and compare it to the wagon configuration.