A journey of the unexpected: Big data, humanities data
Teacher: Renato Gabriele — Oohmm
A journey through years of advancement in big data analysis related to computational propaganda, crises, disasters, social events and general elections: searching for new models, and for analyses that cover data and metadata even when they are “polluted” by multiple kinds of interests and sources.
Renato Gabriele will talk about a 5-year research project on social and web metadata, and about the innovation required to connect data analysis with digital humanities in order to better understand complex big data.
We’ll end with an open discussion on the usual mistakes made in approaching critical data analysis, and their unexpected effects.
#bigdata #propaganda #research #digitalhumanities
Data Visualization with D3.js
Teacher: Fabio Franchino — TODO
Immersive lecture on the key elements and concepts behind data visualization.
Duration: 1.5 days.
Crash course in Python and data science libraries
Teacher: Stefania Delprete — TOP-IX
Interactive lessons using Jupyter Notebooks on Python and its most used libraries for data science: NumPy, Pandas, Matplotlib, and a first exposure to Scikit-learn. You’ll also get a clear picture of what’s inside the Anaconda and SciPy ecosystems.
The session will include insights into the history and future of these open source libraries, how to contribute to them, and how to participate in community events.
Stickers for all participants, provided by the Python Software Foundation and NumFOCUS.
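As a taste of the workflow these libraries enable, here is a minimal, self-contained sketch (with made-up numbers, not course material) of NumPy array math feeding a Pandas aggregation:

```python
import numpy as np
import pandas as pd

# NumPy: fast vectorized math over an array of (invented) temperatures.
temps = np.array([21.5, 23.0, 19.8, 22.1])
print(temps.mean())  # -> 21.6

# Pandas: the same values in a labelled DataFrame, aggregated per group.
df = pd.DataFrame({
    "city": ["Turin", "Turin", "Milan", "Milan"],
    "temp": temps,
})
print(df.groupby("city")["temp"].mean())
```

The same pattern — raw arrays, labelled tables, grouped aggregation — underlies most day-to-day data science work in this stack.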
Duration: 1.5 days.
Prerequisites: Exposure to Python and Jupyter Notebooks.
#datascience #python #numpy #pandas #matplotlib #scipy #scikitlearn
Real Time Ingestion and Analysis of data with MongoDB and Python
Teacher: Alessandro Molina — AXANT
Nowadays more and more data is generated by companies and software products; in the IoT world especially, records are saved at a throughput of thousands per second. That calls for solutions able to scale writes and perform real-time cleanup and analysis of thousands of records per second, and MongoDB is widely used in such environments in the role of what is commonly called a “speed layer”: performing fast analytics over the most recent data and adapting or cleaning up incoming records.
This session shows how MongoDB can be used as the primary storage for your data, scaling to thousands of records and thousands of writes per second, while also acting as a real-time analysis and visualization channel thanks to change streams, and as a flexible analytics tool thanks to the aggregation pipeline and MapReduce.
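To give a feel for the aggregation pipeline, here is a sketch of one as it would be passed to PyMongo. The collection and field names (`readings`, `device_id`, `value`) are hypothetical; the pipeline itself is plain Python data:

```python
# A hypothetical IoT pipeline: filter bad readings, then compute
# per-device averages and counts, sorted by average value.
pipeline = [
    {"$match": {"value": {"$gte": 0}}},          # drop negative/bad readings
    {"$group": {"_id": "$device_id",             # one bucket per device
                "avg_value": {"$avg": "$value"},
                "count": {"$sum": 1}}},
    {"$sort": {"avg_value": -1}},
]

# With a running MongoDB instance this would execute as:
#   from pymongo import MongoClient
#   coll = MongoClient()["iot"]["readings"]
#   for doc in coll.aggregate(pipeline):
#       print(doc)
# and coll.watch() would yield change-stream events in real time.
```

Because pipeline stages are ordinary dictionaries, they are easy to build, inspect and test from Python before shipping them to the server.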
Duration: 1 day.
#mongodb #realtime #scaling #mapreduce
Data Analysis with Spark Streaming
Teacher: Nicolò Bidotti — AgileLab
Big Data analysis is a hot trend and one of its major roles is to give new value to enterprise data. However, data and information lose value as they age, so in many contexts it is important to analyze incoming data flows in near real time. Apache Spark is a major actor in the big data scenario, and its Streaming module aims to solve the main challenges of real-time data processing at scale in distributed environments.
This session shows the potential of streaming data analysis and how to leverage Apache Spark’s Structured Streaming to extract value from data streams, while Spark takes care of the common problems of streaming processing at scale.
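Structured Streaming itself needs a Spark cluster, so as a stdlib-only illustration of the core idea — incremental aggregation over an unbounded stream — here is a tiny pure-Python sketch of tumbling-window counts, the bookkeeping Spark automates for you at scale (this is not Spark code):

```python
from collections import Counter

def windowed_counts(events, window=10):
    """Count events per key in tumbling windows of `window` time units.
    Events are (timestamp, key) pairs, assumed roughly ordered;
    Spark also handles late data, which this sketch does not."""
    counts = {}
    for ts, key in events:
        bucket = ts // window              # tumbling-window index
        counts.setdefault(bucket, Counter())[key] += 1
    return counts

events = [(1, "a"), (3, "b"), (4, "a"), (12, "a"), (15, "b")]
print(windowed_counts(events))
# window 0 -> {'a': 2, 'b': 1}; window 1 -> {'a': 1, 'b': 1}
```

Structured Streaming expresses the same logic declaratively (a `groupBy` over a time window) and adds fault tolerance, distribution and late-data handling.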
Duration: 2 days.
#bigdata #dataengineering #dataframework #apachespark
Excursus: Agent-based modelling and synthetic populations
Teachers: Sarah Wolf, Andreas Geiges — Global Climate Forum
To understand possible transitions of complex systems (e.g. societies, markets, systems of socio-technical co-evolution), pure data analysis might not be sufficient, because such transitions often imply substantial shifts that can hardly be described by pure statistical extrapolation of data. Modelling activities can therefore be a useful complement to data analysis.
This workshop introduces an agent-based model, which is based on synthetic populations, for the global challenge of how to make mobility more sustainable. It illustrates the methodological approach of agent-based modelling, discusses how the process of model development can be accompanied with stakeholder dialogues, explores the interaction between such an agent-based model and the relevant data science tools, and provides some hands-on exercises.
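To make the method concrete, here is a toy agent-based sketch — an assumption for illustration, not the workshop’s actual model: a synthetic population of agents choosing between “car” and “green” mobility, where at each step every agent imitates the majority choice of a random sample of peers:

```python
import random

random.seed(42)

# Synthetic initial population: 80 car users, 20 green-mobility users.
population = ["car"] * 80 + ["green"] * 20

def step(pop, sample_size=5):
    """One model step: each agent adopts the majority choice
    among a random sample of peers (pure social imitation)."""
    new = []
    for _ in pop:
        peers = random.sample(pop, sample_size)
        new.append(max(set(peers), key=peers.count))
    return new

for _ in range(10):
    population = step(population)
print(population.count("green"))  # under pure imitation the minority shrinks
```

Even this toy version shows why modelling complements data analysis: the emergent outcome (the minority choice dying out) follows from agent-level rules, not from extrapolating aggregate statistics — and it suggests which interventions (incentives, network structure) a richer model could explore.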
Duration: 2 days.
Prerequisites: Basic knowledge of Python.
#datascience #complexsystems #agentbased #mobility #sustainability
Data Citizenship and NetScience: technology for data-culture
Teachers: Salvatore Iaconesi, Oriana Persico — Human Ecosystems Relazioni
We constantly generate data, whether we realize it or not, whether we want it or not, and a very limited number of subjects has access to all of this data. This is a very serious condition, with enormous implications for our fundamental rights and freedoms, and for our opportunities to prosper, create, express, relate and live a just, inclusive, constructive life.
In this session we explore technologies for cultural acceleration through data: Human Ecosystems to create large scale, participatory data collection processes; Ubiquitous Commons for distributed, blockchain supported data-rights and evolved data-ownership patterns; Generative Open Data as accessibility layer for shared data commons.
This is a hands-on session in which profound theoretical concepts emerge from the technological architectures themselves and from the ways in which we will use them. It will focus mainly on Network Science and the ways in which we can use it to gain a better understanding of the city’s Relational Ecosystem between people, organizations, network-connected objects, sensors and more.
We will see and understand how to use the platforms, and explore a practical case study: Bologna’s TDays, the limited-traffic weekends in the historical center of Bologna. Together we will work out possible ways to transform them into a data-driven, inclusive, engaging opportunity for participatory citizenship, using the platforms, social networks, art and design.
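As a minimal taste of the network-science toolkit, here is a stdlib-only sketch of degree centrality on a made-up “relational ecosystem” graph (the node names are invented for illustration):

```python
# An undirected graph of people, places and sensors, as an edge list.
edges = [
    ("anna", "piazza"), ("anna", "sensor_1"),
    ("marco", "piazza"), ("marco", "anna"),
]

nodes = {n for e in edges for n in e}
# Degree: number of edges touching each node.
degree = {n: sum(n in e for e in edges) for n in nodes}
# Degree centrality: degree normalized by the maximum possible degree.
centrality = {n: d / (len(nodes) - 1) for n, d in degree.items()}
print(max(centrality, key=centrality.get))  # the most connected node
```

Real analyses of a city’s relational ecosystem use the same ideas at scale, typically with a dedicated library and richer centrality measures (betweenness, PageRank, community detection).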
Duration: 1 day.
#networkscience #socialscience #territory #city #citizenship
Data matching and deduplication with Python
Teacher: Simone Marzola — Oval Money
In the era of multi-tiered big data infrastructures, data is commonly spread across multiple data sources and duplicates are everywhere. As a data scientist you’ll need to focus on consolidating data to improve data quality and build comprehensive data assets, through a process called data deduplication.
This session shows how data analysis tools for Python, like Pandas and NumPy, can be used to solve the deduplication problem in very large datasets. The proposed method includes data preprocessing and cleaning, comparison, indexing and classification.
We will use an anonymized subset of Oval Money user transactions to match duplicates and detect recurring transactions.
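A minimal sketch of the cleaning-and-comparison steps in Pandas, on hypothetical transactions (not the Oval Money dataset):

```python
import pandas as pd

# Invented transactions; the same merchant appears under two spellings.
df = pd.DataFrame({
    "merchant": ["Acme S.p.A.", "ACME SPA", "Cafe Roma", "Acme S.p.A."],
    "amount":   [9.99, 9.99, 3.50, 12.00],
})

# Preprocessing/cleaning: normalize the comparison key before matching.
df["key"] = (df["merchant"].str.lower()
                           .str.replace(r"[^a-z0-9]", "", regex=True))

# Comparison: rows sharing (key, amount) are duplicate candidates.
dupes = df[df.duplicated(subset=["key", "amount"], keep=False)]
print(dupes)
```

On large datasets the pairwise comparison step is the bottleneck, which is where the indexing (blocking) and classification stages of the method come in.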
Duration: Half day.
Prerequisites: Python, Pandas, NumPy.
#bigdata #deduplication #classification #finance
Data Visualization using the open source KNOWAGE suite
Teachers: Isabella Iennaco, Paolo Raineri — KNOWAGE (Engineering S.p.A)
Business analytics lecture on a real KNOWAGE use case of predictive maintenance with an Open Source full stack!
The lecture is an interesting journey through KNOWAGE’s data visualization and data discovery capabilities and how they work in practice. The teacher will guide you towards a comprehensive understanding of the KNOWAGE suite, allowing you to explore a large Industry 4.0 business project.
Duration: Half day.
#datavisualization #predictivemaintenance
Machine Learning and Deep Learning for Computer Vision
Teachers: André Panisson, Alan Perotti — ISI Foundation
This in-depth part of the course lets you build an appealing and diversified Machine Learning portfolio. It starts with an introduction to Machine Learning and its application with Scikit-learn, and continues with lectures on Neural Networks and backpropagation, where you’ll start exploring Computer Vision techniques on a dataset of images.
It then moves on to Deep Learning methods: you’ll be challenged to use TensorFlow and Keras on real image classification cases (such as distracted drivers, healthcare or plant diseases). The workshop ends with lessons on Transfer Learning and one last project: building your own dataset by scraping Google Images and practicing everything you’ve learned.
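The Scikit-learn part of the course builds on the standard fit/predict pattern, sketched here on the classic Iris dataset (a stand-in for illustration, not the workshop’s image data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit a simple classifier and evaluate on the held-out data.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out accuracy
```

The deep learning sessions follow the same train/evaluate discipline, swapping the linear model for neural networks in TensorFlow/Keras and tabular features for images.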
Duration: 3.5 days.
Prerequisites: Python, Pandas, Statistics, exposure to Machine Learning is welcome.
#machinelearning #deeplearning #neuralnetworks #scikitlearn #tensorflow
Voice Recognition models in DeepSpeech and Common Voice
Teacher: Alexandre Lissy — Mozilla
DeepSpeech is an open source Speech-To-Text engine that uses a model trained with machine learning techniques, based on Baidu’s Deep Speech research paper.
You will learn how the model works and how it was implemented using TensorFlow. The workshop will cover how we went from a PoC hack to a model that we try to make usable in production, and how we leverage the distributed training system. We’ll explore how the inference-specific model is built and the code around it that makes it run on several devices, as well as the TensorFlow tooling we explored to try to speed things up.
We’ll also present the Common Voice project, which aims at collecting an open dataset for machine learning, and more specifically for voice-targeted machine learning.
You’ll be able to contribute to both projects: training your own model for DeepSpeech, using DeepSpeech as a “black box”, hacking into DeepSpeech, and contributing to Common Voice.
Duration: Half day.
Prerequisites: Python, shell, exposure to C++ is welcome.
#machinelearning #deepspeech #voicerecognition #tensorflow
From local to glocal using community data
Teacher: Maurizio Napolitano — Fondazione Bruno Kessler
The workshop starts with an introduction to the GIS world, the geospatial protocols and the available geodata resources.
It continues by diving into the OpenStreetMap ecosystem, where we explore how it can be used as a great tool for data scientists. After examples of analysis on real cases, you’ll be challenged to build your own geospatial project, supervised by the expert Maurizio Napolitano.
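As a stdlib-only warm-up for the GIS part, here is the great-circle (haversine) distance between two lat/lon points — a basic building block of geospatial analysis. The coordinates for Turin and Bologna are approximate:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in km between two points given in
    decimal degrees, assuming a spherical Earth of radius r."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

print(haversine_km(45.07, 7.69, 44.49, 11.34))  # Turin to Bologna, in km
```

Real GIS libraries add projections, geometries and spatial indexes on top of primitives like this one.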
Duration: 1 day.
Prerequisites: Python; previous experience with OpenStreetMap is welcome.
#geospatial #map #opendata #osm