Re: [thredds] TDS as a big data platform

  • To: Guan Wang <gwang@xxxxxxx>
  • Subject: Re: [thredds] TDS as a big data platform
  • From: "Antonio S. Cofino" <cofinoa@xxxxxxxxx>
  • Date: Fri, 26 Feb 2016 18:52:17 +0100
Hi Guan,

The 10G Ethernet link provides access to the rest of the nodes, which only 
have Gigabit Ethernet connections. 

The IB fabric is configured as IPoIB (IP over InfiniBand).

The current number of Tomcat instances is 4, but when a tutorial or workshop 
is held the number is increased to 8. The basic idea is to have some 
load balancing plus fallback workers, the latter because a TDS instance can 
occasionally get stuck or stop working. 
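For what it's worth, the load-balancing side is plain mod_jk. A minimal workers.properties sketch for this kind of setup (worker names, hostnames and ports here are hypothetical, not our actual config):

```properties
# Only the load-balancer worker is exposed to httpd
worker.list=tdsbalancer

# One AJP worker per Tomcat/TDS instance (hostnames/ports illustrative)
worker.tds1.type=ajp13
worker.tds1.host=tds-node1
worker.tds1.port=8009

worker.tds2.type=ajp13
worker.tds2.host=tds-node2
worker.tds2.port=8009

# Balance over the workers; sticky sessions keep a client on one instance,
# and a worker that stops responding is put in error state and skipped
# (the fallback behaviour mentioned above)
worker.tdsbalancer.type=lb
worker.tdsbalancer.balance_workers=tds1,tds2
worker.tdsbalancer.sticky_session=1
```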

The hard part is monitoring and managing the whole swarm of components. 

Antonio


> On 26 Feb 2016, at 17:39, Guan Wang <gwang@xxxxxxx> wrote:
> 
> Hi Antonio,
> 
> Thank you for sharing! This is really helpful! I have to say this was fancy 
> hardware back in 2010 :)
> 
> Two questions:
> 
> 1. Since you have IB throughout the fabric, why is 10G Ethernet still 
> necessary on the storage server?
> 
> 2. How many tomcat worker servers/VMs do you have?
> 
> Guan
> 
> ----- Original Message -----
> From: "Antonio S. Cofiño" <cofinoa@xxxxxxxxx>
> To: "Guan Wang" <gwang@xxxxxxx>
> Cc: "y kudo" <y_kudo@xxxxxxxxxxxx>, thredds@xxxxxxxxxxxxxxxx
> Sent: Friday, February 26, 2016 10:51:17 AM
> Subject: Re: [thredds] TDS as a big data platform
> 
> The current configuration is based on very old hardware (2010) and features: 
> 
> 
>    * Web frontend: Apache httpd 2.2 (reverse proxy + SSL); virtual server 
>    * Web backend: load-balanced with mod_jk (AJP) + Apache httpd 2.2; 
> virtual server 
>    * Tomcat 7 workers (TDS deployment): 
>        * CPU: 2x Intel Xeon E5620 (4 cores, 12M cache, 2.40 GHz, 5.86 GT/s 
> Intel QPI) 
>        * RAM: 16 GB 
>        * OS: CentOS 6 
>        * Storage Area Network: InfiniBand DDR (20 Gb/s) 
>    * Storage server (2010): 
>        * CPU: 2x Intel Xeon E5520 (8M cache, 2.26 GHz, 5.86 GT/s Intel QPI) 
>        * RAM: 24 GB 
>        * Hard disks: 190 HDDs (2 TB and 3 TB drives, SATA 6 Gb/s) 
>        * HBA: 4x LSI 9200-8e SAS HBA 
>        * Network: 10G Ethernet + InfiniBand DDR (20 Gb/s) 
>        * OS: OpenIndiana oi_151a6 
>        * Filesystem: ZFS (raidz2 vdevs, 10+2) 
>        * Raw storage pool: 402 TB 
> 
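> As a rough sketch (hostnames and paths here are hypothetical, not our actual config), the httpd side of that frontend/backend split looks something like:

```apache
# Frontend virtual server: SSL termination + reverse proxy to the backend
<VirtualHost *:443>
    ServerName tds.example.org
    SSLEngine on
    ProxyPass        /thredds http://backend.example.org/thredds
    ProxyPassReverse /thredds http://backend.example.org/thredds
</VirtualHost>

# Backend virtual server: mod_jk forwards /thredds to the Tomcat
# load-balancer worker defined in workers.properties
JkWorkersFile /etc/httpd/conf/workers.properties
JkMount /thredds/* tdsbalancer
```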
> Please let me know if you want more details. 
> 
> Antonio 
> 
> On 26/02/2016 at 16:13, Guan Wang wrote: 
> 
> Hi Antonio,
> 
> Thank you for sharing! Do you mind also sharing the server config (CPU, RAM, 
> etc.) that runs TDS?
> 
> Guan
> 
> ----- Original Message -----
> From: "Antonio S. Cofiño" <cofinoa@xxxxxxxxx>
> To: thredds@xxxxxxxxxxxxxxxx
> Cc: "y kudo" <y_kudo@xxxxxxxxxxxx>
> Sent: Friday, February 26, 2016 8:52:33 AM
> Subject: Re: [thredds] TDS as a big data platform
> 
> 
> Yoshi,
> 
> Below is my experience with TDS (v4.3 and v4.6).
> 
> On 19/02/2016 at 7:26, Yoshiyuki Kudo wrote: 
> 
> Hi,
> 
> I am in a project where a bunch of EO data researchers will use data 
> access services in an attempt to create new data products out of the wealth 
> of the data pool.  The data will be EO data (coverage data) in netCDF, a few 
> GBytes per data granule, and will amount to over 120 TB and 0.3 million data 
> files in total (1 year's worth of collection).
> 
> I feel TDS or Hyrax could be a good candidate for this platform, but I would 
> like to hear your advice before further estimation of work and hardware 
> purchase.  I very much appreciate your expertise on this.
> 
> 1) I see some historical threads about how aggregation of large volumes of 
> data can be problematic.  I will need to consider the aggregation as well, 
> but is the 100TB+ aggregation possible? Both technically and 
> performance-wise?
> 
> We have an operational service which aggregates collections of datasets. 
> One of the aggregations consists of 135k files in GRIB1 format and 13 TB of 
> data. Another collection is based on 300k+ files but is 8 TB in size. 
> Each of these collections is aggregated into just one NetCDF entity using 
> an NcML file. The 100TB+ aggregation will be possible, but the limit will 
> be performance, because of the number of files. 
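> For illustration, a dynamic joinExisting aggregation in NcML looks roughly like this (the path, suffix and dimension name are hypothetical, not our actual catalog):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Join all matching files along the existing "time" dimension -->
  <aggregation dimName="time" type="joinExisting">
    <scan location="/data/grib1/" suffix=".grb" />
  </aggregation>
</netcdf>
```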
> 
> 2) Is there any HW restriction for the TDS setup I should have in mind 
> before preparing the HW?  Do I need to have a single disk drive (partition) 
> for the 100+TB data management in TDS?
> 
> No, you don't need to have just one partition. In our case we have 400 TB 
> of disk based on ZFS (OpenIndiana), using a pool of 150 desktop HDDs in a 
> raidz2 vdev configuration (10+2 disks). For the TDS services we use a 
> load-balanced configuration with TDS instances running in a cluster. 
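> For reference, a raidz2 (10+2) pool of this kind could be created roughly like this (pool and device names are hypothetical; each raidz2 group is one 12-disk vdev, and further groups grow the pool):

```shell
# Two raidz2 vdevs of 12 disks each (10 data + 2 parity per vdev)
zpool create tank \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
         c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
         c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0
```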
> 
> 3) Could you share any success story you know of, about handling large 
> volumes of data in a TDS?
> 
> https://rd-alliance.org/sites/default/files/attachment/20150924_Day2_1330_End-userGatewayForClimateServicesAndDataInitiatives_Cofino.pdf
>  
> 
> 4) Any other recommendation or things I need to keep in mind?
> 
> At the beginning we considered dynamic aggregation based on the 
> scan-directory facilities provided by TDS, but in the end it didn't perform 
> well, so what we are doing now is generating static NcML aggregations. 
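> The generated static form lists each file explicitly instead of using a scan element, for example (file names and ncoords values here are hypothetical):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- Generated offline, one element per file; ncoords caches the number
         of time coordinates so TDS need not open every file at startup -->
    <netcdf location="/data/grib1/file_19790101.grb" ncoords="8" />
    <netcdf location="/data/grib1/file_19790102.grb" ncoords="8" />
  </aggregation>
</netcdf>
```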
> 
> Thank you so much for your support. Please feel free to ask.
> 
> Regards
> 
> Antonio
> 
> --
> Antonio S. Cofiño
> Grupo de Meteorología de Santander
> Dep. de Matemática Aplicada y
>         Ciencias de la Computación
> Universidad de Cantabria http://www.meteo.unican.es 
> 
> Yoshi
> 
> _______________________________________________
> thredds mailing list thredds@xxxxxxxxxxxxxxxx For list information or to 
> unsubscribe,  visit: http://www.unidata.ucar.edu/mailing_lists/ 


