Big Data Spain is an annual conference on Big Data and related topics held in the suburbs of Madrid. This year’s edition, the third, was the biggest so far: it attracted more than 500 guests and a range of speakers, including Big Data celebrities like Paco Nathan of Databricks. During the two days of the conference, guests could attend many keynotes, talks and workshops and learn about various products, services and specific use cases, in both English and Spanish. Allegro was represented by two employees with a presentation on Hadoop pitfalls and gotchas.
“State of Play”
Sean Owen of Cloudera and Paco Nathan, both experienced engineers and data scientists, shared their views on the current state of the art as well as the past and future of computing. Paco Nathan devoted his speech to past and current “turning points”: the move away from “classic” SQL databases; the golden era of functional programming, algebra and discrete mathematics in now-omnipresent distributed environments; the rise of so-called “notebooks” (collaborative, cloud-based tools focused on developing processes); and cloud computing itself, including the new “containerization” trend.
Sean Owen focused on how to process and analyze data on a large scale, how the worlds of analysts and engineers differ, and why they should meet in order to best leverage their data. These two areas, so far very distant, are finally starting to converge thanks to recent developments that fill the gap between them. It is worth noting that both speakers consider MapReduce obsolete and promote Apache Spark as the future of Big Data.
Allegro Group employees Jarosław Grabowski and Jacek Juraszek shared their experience of working with the Hadoop ecosystem. Their talk, titled “Pitfalls of storing and processing data in Hadoop”, included a variety of practical “dos and don’ts” spanning multiple Big Data aspects: data ingestion, processing, monitoring and storage. The presentation was well received by the audience and started many discussions in the lobby.
An Amazon representative showed the advantages of using the company’s cloud solutions, while Toby Woolfe of IBM gave a clear, systematic talk about the collaboration between Big Blue and General Motors on data mining. He also showed how Big Data tools and technologies meet traditional, non-IT business and help to earn what every company appreciates: money. An interesting use case, Big Data analysis helping to track down car defects, was also presented.
Stratio, the third main sponsor, was omnipresent. Stratio’s CEO Óscar Méndez hosted the whole event, while Stratio engineers showed two Spark-based products, Stratio Streaming and Stratio Crossdata, in two separate workshops in Spanish.
Dr Jim Webber introduced Neo4j, a graph-oriented database management system. During an excellent and humorous presentation and workshop, the English researcher showed why a graph-oriented database is a strong choice for certain classes of problems and sketched a quick but impressive example of retail recommendations. Another fun talk was given by Jordan Tigani of Google. He presented Google’s Big Data toolset and used it for Machine Learning on a football (“soccer”, as he called it) example: predicting the results of football matches. Tigani conducted the presentation using a “notebook”, confirming Paco Nathan’s point about this emerging tool. Make sure you take a look at IPython, an implementation of this idea.
Cloudera’s Enrico Berti introduced Hue, a popular web interface for Hadoop. Enrico gave a quick presentation of Hue’s key features and then showed how it turns otherwise cumbersome Hadoop interaction into a series of quick and easy steps. His coworker Gwen Shapira rounded out the topic of pitfalls with her technical talk on benchmarking issues.
The conference, although well organized in a spacious multiplex cinema, is a rather medium-sized event. While it still can’t compete with giants like Strata, it is certainly worth looking at next year. All the presentations are available on the Big Data Spain YouTube channel.