

RHEEM is a scalable and easy-to-use system for cross-platform big data analytics. It provides an abstraction layer on top of existing data processing platforms: users specify their data analytics tasks through simple interfaces, developers get opportunities to optimize performance in different ways, and tasks can run on any data processing platform, such as PostgreSQL, Spark, or Hadoop MapReduce. The RHEEM abstraction is fully based on . . .



  • 09.03.2017:~rheem$  echo 'Performing ML on Rheem at SIGMOD'17'
  • 09.03.2017:~rheem$  echo 'Rheem is flying to the Spark Summit 2017'
  • 08.09.2016:~rheem$  echo 'Rheem 0.2.0 has been released!'
  • 10.07.2016:~rheem$  echo 'Get your hands on Rheem at BOSS@VLDB 2016'
  • 22.06.2016:~rheem$  echo 'Rheem is in the news!'
  • 13.06.2016:~rheem$  echo 'Rheem is free now!'
  • 01.06.2016:~rheem$  echo 'Rheem is going open source very soon'
  • 06.04.2016:~rheem$  echo 'We are looking for several dRHEEMers'
  • 30.03.2016:~rheem$  echo 'Rheem v0.1 is now available for download'
  • 27.02.2016:~rheem$  echo 'Rheem in action at SIGMOD'16'
  • 01.01.2016:~rheem$  echo 'EDBT'16 got a Rheem Vision'
  • 01.10.2015:~rheem$  echo 'Rheem will deal with inequality joins at VLDB'16'



Run a single analytic task on top of multiple data processing platforms to boost performance.


Novel optimization techniques to significantly boost the performance of your applications.


User defined functions (UDFs) as first-class citizens, enabling extensibility and adaptability.


A simple Java interface that lets developers focus solely on their application logic.