
Could PI play a part in the LHC experiment?
RJK Solutions
#1 Posted: Wednesday, September 10, 2008 11:34:21 AM
Rank: Administration

Groups: Administration

Joined: 6/20/2008
Posts: 612
Location: Cheshire, United Kingdom.

Now that the LHC experiment has sent its first beam of protons around its 27 km tunnel, my mind began to wander to the specifics of the data being collected and analysed. Being such a PI enthusiast, I then began wondering how PI could be used for analysis and data storage, and even whether PI could cope with the sheer volume of data being generated at such a fast rate.

CERN (the European Organization for Nuclear Research), which is running the experiment, has developed its own open-source analysis software called ROOT (http://root.cern.ch/root/Mission.html). However, provided the data is transported to a series of PI systems, there is no obvious reason why it could not be analysed with PI's existing tool set or with custom analysis "bolt-ons".

To put the amount of data being collected into context:
Every second, 600 million particle collisions are measured, and LHC scientists will keep only the thousand or so that are interesting. The electronic "photo" of each event requires 1 to 2 MB of storage. When running at full capacity, the LHC will run between 150 and 200 days per year, leading to a rough range of 10 to 20 petabytes (1 petabyte = 1 million gigabytes) of data per year. That is a lot of data. Could a PI system store this much?
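As a rough sanity check, the figures above can be multiplied out. This is a minimal sketch, assuming decimal units (as in "1 petabyte = 1 million gigabytes") and, by default, round-the-clock recording on every running day; the function names and the `duty_cycle` parameter are hypothetical, introduced only for illustration.

```python
# Back-of-envelope estimate of LHC storage volume from the figures in the post.
# Assumption: with duty_cycle=1.0, data is recorded 24 h/day on every running
# day, so the result is an upper bound for each parameter combination.

SECONDS_PER_DAY = 86_400
BYTES_PER_MB = 10**6    # decimal units, matching "1 petabyte = 1 million gigabytes"
BYTES_PER_PB = 10**15


def write_rate_gb_per_s(events_per_second: float, mb_per_event: float) -> float:
    """Sustained write rate, in GB/s, for the events that are kept."""
    return events_per_second * mb_per_event * BYTES_PER_MB / 10**9


def annual_volume_pb(events_per_second: float, mb_per_event: float,
                     running_days: float, duty_cycle: float = 1.0) -> float:
    """Petabytes written per year.

    duty_cycle (hypothetical) is the fraction of each running day during
    which data is actually recorded; 1.0 means round-the-clock.
    """
    bytes_per_day = (events_per_second * mb_per_event * BYTES_PER_MB
                     * SECONDS_PER_DAY * duty_cycle)
    return bytes_per_day * running_days / BYTES_PER_PB


if __name__ == "__main__":
    for mb, days in [(1, 150), (2, 200)]:
        print(f"{mb} MB/event, {days} days: "
              f"{write_rate_gb_per_s(1000, mb):.1f} GB/s, "
              f"{annual_volume_pb(1000, mb, days):.2f} PB/year")
```

At 1,000 events per second this works out to a sustained 1 to 2 GB/s, which is the rate any storage layer (PI or otherwise) would have to absorb. The round-the-clock totals come out at roughly 13 to 35 PB per year, somewhat above the quoted 10 to 20 PB, which suggests data is not recorded continuously on every running day; lowering `duty_cycle` lets you explore that.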

To get a clearer picture of how PI could be used, we need to understand how the LHC Computing Grid (LCG), a grid computing system, works.
Quote:
Because of these limits, CERN cannot cope with all the data coming from the experiments. Thus, it has set up the LCG (LHC Computing Grid) Project to build and maintain a data-storage and analysis infrastructure for the entire high-energy physics community who work with the LHC. It will give roughly 15,000 scientists in some 500 research institutes and universities worldwide access to experimental data. Further, data must be available over the 15-year estimated lifetime of the LHC. The analysis of the data, including comparison with theoretical simulations, requires of the order of 100,000 CPUs at 2006 measures of processing power.

CERN chose a data grid model because it provides several key benefits. For one, the significant costs of maintaining and upgrading the necessary resources for such a computing challenge are more easily handled in a distributed environment. Also, there are fewer single points of failure.

A distributed system also presents significant challenges which include ensuring adequate levels of network bandwidth, maintaining coherence of software versions at various locations, coping with heterogeneous hardware, managing and protecting data so it is not lost or corrupted, and providing accounting mechanisms so that different groups have fair access.

CERN has dubbed itself as the Tier0 computing facility, and it sends all the data to Tier1 facilities, of which there are roughly a dozen such as Fermilab and the Grid Computing Center in Karlsruhe, Germany. A requirement for the Tier0 and all Tier1 facilities is that they must be available 24 hours a day, seven days a week. A full copy of all the experimental data is spread across all the Tier1 facilities.

Tier1 centres make subsets of data available to Tier2 centres, each consisting of one or several collaborating computing facilities that can store sufficient data and provide adequate computing power for specific analysis tasks. There are approximately 200 to 300 Tier2 centres.

Finally, individual scientists can access even smaller subsets of data from particular experiments through Tier3 computing resources, which can consist of local clusters in a university computer centre or even individual PCs. The Tier3 centres are just now being set up, so there are no hard numbers of the users. But a rough guess of all the computing nodes that will be used from Tier0 through Tier3 is in the order of 30,000.
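The tiered layout in the quote can be illustrated with a toy model. This is only a sketch under stated assumptions: the data is treated as numbered chunks, chunks are dealt round-robin across Tier1 sites (a swapped-in simplification, not how the LCG actually places data), and "Tier1-C" is a hypothetical placeholder alongside Fermilab and Karlsruhe. The function names are likewise hypothetical.

```python
# Toy model of the Tier0 -> Tier1 -> Tier2 distribution described above.
# Hypothetical: data is split into numbered chunks and placed round-robin.
from collections import defaultdict


def spread_over_tier1(chunk_ids, tier1_sites):
    """Deal chunks across Tier1 sites so that, taken together, the Tier1
    sites hold exactly one full copy of the data."""
    placement = defaultdict(list)
    for i, chunk in enumerate(chunk_ids):
        placement[tier1_sites[i % len(tier1_sites)]].append(chunk)
    return dict(placement)


def tier2_subset(tier1_holdings, wanted):
    """A Tier2 centre receives only the chunks its analysis needs; map each
    wanted chunk to the Tier1 site it would be fetched from."""
    subset = {}
    for site, chunks in tier1_holdings.items():
        for chunk in chunks:
            if chunk in wanted:
                subset[chunk] = site
    return subset


if __name__ == "__main__":
    chunks = list(range(10))
    tier1 = spread_over_tier1(chunks, ["Fermilab", "Karlsruhe", "Tier1-C"])
    # The union of all Tier1 holdings is the full data set.
    assert sorted(c for cs in tier1.values() for c in cs) == chunks
    print(tier1)
    print(tier2_subset(tier1, {2, 5, 7}))
```

In this picture, a PI system at a Tier2 or Tier3 centre would only ever see the small `wanted` subset, not the full multi-petabyte data set.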


A series of PI systems sitting at a Tier2 or Tier3 centre could provide data storage for specific analysis tasks or for visualisation of the data. Standard tools could be used for basic analysis (trends in ProcessBook, custom analysis rules in PI-AF, etc.), including by students. More advanced analysis software could use the data from PI for further analysis, or even to create simulations based on the time-series data that the PI systems have collected. Imagine a simulator plug-in for ProcessBook that simulates a model, in 2D or 3D, over the time range of a display.
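As an illustration of the kind of "basic analysis" meant here, the sketch computes a time-weighted average of a tag's archived values over a display's time range. It is a generic, self-contained Python sketch, not the PI SDK, PI-AF or ProcessBook API; the function names are hypothetical, and treating the signal as linear between archived samples is an assumption of the sketch.

```python
# Generic sketch, NOT the PI SDK / PI-AF / ProcessBook API.
# Assumptions: samples are (timestamp_seconds, value) pairs sorted by
# strictly increasing time, and the signal is linear between samples.

def interpolate(samples, t):
    """Value at time t, clamped to the first/last sample outside the data."""
    if t <= samples[0][0]:
        return samples[0][1]
    if t >= samples[-1][0]:
        return samples[-1][1]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("samples must be sorted by time")


def time_weighted_average(samples, start, end):
    """Time-weighted mean over [start, end] via trapezoidal integration,
    which is exact for a piecewise-linear signal."""
    if end <= start:
        raise ValueError("end must be after start")
    # Range boundaries plus every archived sample strictly inside the range.
    times = [start] + [t for t, _ in samples if start < t < end] + [end]
    area = 0.0
    for a, b in zip(times, times[1:]):
        area += (interpolate(samples, a) + interpolate(samples, b)) / 2 * (b - a)
    return area / (end - start)


if __name__ == "__main__":
    trend = [(0, 0.0), (10, 10.0), (20, 0.0)]
    print(time_weighted_average(trend, 5, 15))  # prints 7.5
```

Time weighting matters because archived values are often unevenly spaced: a plain arithmetic mean of the stored points would overweight periods in which many values happened to be archived.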

I look forward to the day when PI is used in such an experiment, or in any system that collects huge amounts of data, with in-depth analysis, visualisation and simulation performed directly from PI.

So what are your thoughts on this topic?
Do you think the PI system could play a part?
How would you use PI in this scenario?


Principal Consultant
Real-Time Data Management @ Wipro Technologies