Dagstuhl Seminar 18481: High Throughput Connectomics

Dagstuhl Seminar 18481

High Throughput Connectomics

( Nov 25 – Nov 30, 2018 )

Permalink

Please use the following short url to reference this page: https://www.dagstuhl.de/18481

Organizers

Moritz Helmstaedter (MPI for Brain Research - Frankfurt am Main, DE)
Jeff Lichtman (Harvard University - Cambridge, US)
Nir Shavit (MIT - Cambridge, US)

Contact

Michael Gerke (for scientific matters)
Simone Schilke (for administrative matters)

Publications

High Throughput Connectomics (Dagstuhl Seminar 18481). Moritz Helmstaedter, Jeff Lichtman, and Nir Shavit. In Dagstuhl Reports, Volume 8, Issue 11, pp. 112-138, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)

Motivation

Modern connectomics, the mapping of the connectivity of neural tissue at synaptic resolution, produces ‘big data’ that must be analyzed at unprecedented rates, and will require, as with genomics at the time, breakthrough algorithmic and computational solutions. This Dagstuhl Seminar will bring together key researchers in the field in order to understand the problems at hand and provide new approaches towards the design of high throughput systems for mapping the micro-connectivity of the brain.

The massive amounts of storage and computation in connectomics pipelines require expertise not only in computational neurobiology, machine learning, and alignment techniques, but also in parallel computation, distributed systems, and storage systems. Our aim is to bring together researchers from all these areas. Our goal will be both to build an understanding of the state of the art in high-throughput connectomics pipelines and to brainstorm on how to move the field forward so that high-throughput connectomics systems become widely available to neurobiology labs around the world.

Concretely, we would like to come out of this seminar with a hierarchical plan for future connectomics systems that solve existing systems’ problems. We will begin the seminar by having workgroups discuss these problems in existing systems and then dedicate the latter part to collectively working out solutions. We will consider three levels:

The system layer: how data is stored, moved around and computed on in a distributed and parallel fashion.
The pipeline layer: how processing progresses from stitching through alignment and reconstruction.
The algorithm layer: the specific machine learning and error detection and correction algorithms used in various pipeline stages to bring the datasets to analyzable connectivity graphs.

Each day will consist of two lecture sessions in which several colleagues will present their work, surveying the current state of the art and the problems still in need of solutions. Each such session will be followed by a break into several workgroups on topics agreed to be contentious or in need of improved solutions. Our plan is to spend the majority of the time in workgroup discussions and a smaller fraction of it in the lecture hall. As is the tradition, there will also be ample time for discussions over beer in the late afternoons and evenings.

Our hope is to conclude the seminar with a coherent plan on how the field should proceed in coming years.

Creative Commons BY 3.0 DE

Moritz Helmstaedter, Jeff Lichtman, and Nir Shavit

Summary

Our workshop brought together experts in the computational aspects of connectomics. A week of lectures and workgroup meetings in a lively and collegial environment led to a collection of interesting conclusions. One big idea put forth in the meeting was the gargantuan effort of reconstructing a complete mouse brain. Another was to completely map the white matter connectivity of a mammalian brain. We also discussed which techniques and pipelines we should continue to pursue as a community. In that vein, one major conclusion was that a pipeline needs both software and engineers working on it; distributing software alone is not sufficient (dedicated engineers are needed to run the software, and it cannot rely on graduate students alone). Zeiss reported on a 331-beam multibeam microscope that was in the making. There were also discussions on quality measures and metrics for connectomics reconstruction, and on developing standardized datasets for segmentation training and comparison of algorithms (scaling up from today's small datasets). Finally, there were discussions on ethics and policies in the area going forward: should we rely more on industrial partners to provide compute power and storage, or is it better to keep most of the research in universities and not-for-profit research institutes?

Introduction

The sheer complexity of the brain means that sooner or later the data describing brains must transition from something that is rather easily managed to something far less tractable. This transition appears to now be underway. The accumulation of ever-bigger brain data is a byproduct of the development of a number of new technologies that provide digitized information about the structural organization (anatomy) and the function of neural tissue. These new collection approaches bring novel data into neuroscience that potentially bears on many poorly understood aspects of the nervous system. Fundamental scientific questions such as the way learned information is instantiated in the brain and how brains change over the course of development and aging are likely to be usefully addressed in the coming decades as large data sets mapping networks of neurons at high resolution become available.

Mapping networks of neurons at the level of synaptic connections, a field called connectomics, began in the 1970s with the study of the small nervous system of a worm and has recently garnered general interest thanks to technical and computational advances that automate the collection of electron-microscopy data and offer the possibility of mapping even large mammalian brains. However, modern connectomics produces ‘big data’, unprecedented quantities of digital information at unprecedented rates, and will require, as genomics did in its time, breakthrough algorithmic and computational solutions.

Unfortunately, the generation of large data sets is actually the easy part. Our experience in the nascent field of connectomics indicates that there are many challenges associated with the steps after data acquisition, that is, the process of turning the data into a mineable commodity. This workshop will focus on addressing these challenges by bringing together researchers developing algorithms and deploying software systems that enable high-throughput analysis of connectomic data.

While high-throughput connectomics must tackle many of the problems that occur in big data science and engineering, tremendous differences in data size, computational complexity, and the problem domain will require novel computational frameworks, algorithms, and systems. Input image data in connectomics is reaching, even in its initial stages, petabytes in size at a terabytes-per-hour rate, and currently requires millions of cycles of computation per pixel. Such data is not easily moved or stored, and so on-the-fly analysis of the data as it comes off the microscope is the most likely future solution. Achieving the kind of throughput that will allow us to process the data at the rate at which it is being generated necessitates a reduction of three orders of magnitude in cycles per pixel compared to the status quo. Furthermore, there is locality to the data. Unlike other big data problems, which can often be represented as independent key-value pairs spread across many machines, reconstruction of neural circuits requires frequent data exchanges across adjacent image regions. Buffering all the data in machine memory is infeasible, as is data replication on multiple servers. That means one cannot rely on Moore's law and parallelism across data centers to solve this problem; we need to be smarter.
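To give a feel for the compute budget described above, the following back-of-the-envelope sketch estimates how many processor cores would be needed to keep pace with the microscope. The constants are illustrative assumptions rather than figures from the text: a rate of exactly 1 terabyte per hour, one byte per pixel, exactly one million cycles per pixel, and a 3 GHz core that performs one useful cycle per clock tick.

```python
# Back-of-the-envelope compute budget for on-the-fly processing.
# All constants are illustrative assumptions, not measurements from the text.

BYTES_PER_HOUR = 1e12      # assumed acquisition rate: 1 terabyte per hour
BYTES_PER_PIXEL = 1        # assumed 8-bit grayscale pixels
CYCLES_PER_PIXEL = 1e6     # "millions of cycles per pixel", taken here as 10^6
CORE_HZ = 3e9              # assumed 3 GHz core, one useful cycle per tick


def cores_needed(cycles_per_pixel: float) -> float:
    """Cores required to process pixels as fast as they are acquired."""
    pixels_per_second = BYTES_PER_HOUR / BYTES_PER_PIXEL / 3600
    return pixels_per_second * cycles_per_pixel / CORE_HZ


status_quo = cores_needed(CYCLES_PER_PIXEL)
# Three orders of magnitude fewer cycles per pixel.
reduced = cores_needed(CYCLES_PER_PIXEL / 1000)

print(f"status quo: ~{status_quo:,.0f} cores")
print(f"after a 1000x reduction: ~{reduced:,.0f} cores")
```

Under these assumptions the status quo would need on the order of ninety thousand cores, while a thousandfold reduction brings the requirement down to roughly a hundred, which is within reach of a small cluster placed next to the microscope.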

In a nutshell, a connectomics data set is a collection of images taken of a volume of brain tissue that has been sectioned into many thousands of small slices, each only a few tens of nanometers thick. These slices are then imaged using custom electron microscopes to produce an image stack that will in the near future reach petabytes in size. Using one of the standard electron microscopy pipeline approaches, the key computational problems that must be addressed in order to turn the raw acquired digitized images into a useful "connectivity graph" are stitching, alignment, neuron reconstruction, and synapse detection. Each digitized image tile needs to be stitched together with neighboring tiles to form a composite image of a slice. Then, the stitched slice image is aligned with the previous and subsequent slice images. Although adjacent slice images are mostly similar, alignment is challenging because typically a conveyor belt collects the slices, and each may rotate by a few degrees or stretch depending on its thickness. Fortunately, because of the high image resolution, alignment is practical, as axons and dendrites are readily visible in cross-section and can be traced from one section to the next. A second challenge is that, once the image data is aligned, the sectioned objects must be individuated. In these data sets, the objects are neurons and other cellular entities that are interwoven in the three-dimensional space of the sample tissue. The reconstruction of neural processes as they pass from one section to the next is directly related to the computer vision problem of obtaining a segmentation of an image series, that is, the labeling of pixels in the images according to which cell they belong to.
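The idea of following a neural process from one section to the next can be illustrated with a toy sketch. It assumes that two adjacent sections have already been stitched, aligned, and segmented in 2D, and it links each segment to the segment it overlaps most in the next section. This overlap heuristic, the label arrays, and the function name `link_segments` are hypothetical and chosen only for illustration; they are not the method of any particular pipeline discussed here.

```python
from collections import Counter

# Toy label images for two adjacent, already-aligned sections.
# Each integer is a hypothetical 2D segment id; 0 marks membrane/background.
section_a = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
]
section_b = [
    [7, 7, 7, 0, 8],
    [7, 7, 0, 8, 8],
    [7, 7, 0, 8, 8],
]


def link_segments(upper, lower):
    """Map each segment in `upper` to the segment in `lower` it overlaps most."""
    overlap = Counter()
    for row_u, row_l in zip(upper, lower):
        for a, b in zip(row_u, row_l):
            if a != 0 and b != 0:
                overlap[(a, b)] += 1  # count pixels shared by the pair (a, b)
    best = {}
    for (a, b), count in overlap.items():
        if a not in best or count > best[a][1]:
            best[a] = (b, count)
    return {a: b for a, (b, _) in best.items()}


links = link_segments(section_a, section_b)
print(links)
```

Chaining such links through the whole stack yields 3D objects. Real data makes this much harder: branching processes, merge and split errors, and residual misalignment are the reasons the text treats reconstruction as a segmentation problem over the entire image series rather than a pairwise matching task.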

Although considerable progress has been achieved in computer-based image segmentation in the last few years, reliable automatic image segmentation is still an open problem. Automating the segmentation of connectomic data is challenging because the shapes of neural objects are irregular, branching, non-repeating and intertwined. Moreover, the actual number of different objects and their synaptic interconnections in a volume of brain tissue is unknown and, at the moment, even difficult to estimate or bound. Segmentation of a standard electron microscopy image is further complicated by the fact that the range of pixel intensity values of cell membranes overlaps with that of other organelles. Thus, simple thresholding to find cell boundaries does not work.

In the eyes of many, the term big data is synonymous with the storage and analysis of massive collections of digital information. The "big" refers to the size of the input sets, typically ranging in the tens or even hundreds of terabytes, and arriving at rates of several tens or hundreds of gigabytes per second. In connectomics, the size of the input set is at the high end of the big data range, and possibly among the largest data sets ever acquired. Images at a resolution of several nanometers are needed to accurately reconstruct the very fine axons, dendrites, and synaptic connections. At this resolution, a cubic mm is about 2 petabytes of data. A complete rat cortex including some white matter might require 500 cubic mm and thus would produce about an exabyte (1000 petabytes) of data. This amount is far beyond the scope of storage that can be handled by any system today (as a reference point, consider that Walmart's or Aldi's database systems manage a few petabytes of data). A complete human cortex, 1000 times that of a rodent, will require a zettabyte (1000 exabytes) of data, an amount of data approaching that of all the information recorded in the world today. Obviously this means that the goal of connectomics will not be to acquire complete human brains and that for the near future one must consider reconstructions of neuronal substructures as opposed to whole brains. Moreover, it is clear that as we go beyond a few millimeters, one cannot store the raw data: it must be analyzed on the fly as it comes off the microscope and then discarded, keeping the physical tissue sample for re-imaging if needed.
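The data volumes quoted above can be reproduced with a short calculation. The sketch below assumes a voxel size of 4 x 4 x 30 nanometers and one byte per voxel; both values are assumptions made for illustration, since the text says only "several nanometers". The factors of 500 and 1000 are taken from the text.

```python
# Rough data-volume arithmetic for the figures quoted in the text.
VOXEL_NM = (4, 4, 30)      # assumed voxel size in nanometers (not from the text)
BYTES_PER_VOXEL = 1        # assumed 8-bit grayscale
NM_PER_MM = 1_000_000

PB, EB, ZB = 1e15, 1e18, 1e21


def bytes_per_cubic_mm(voxel_nm=VOXEL_NM, bytes_per_voxel=BYTES_PER_VOXEL):
    """Raw image bytes needed to cover one cubic millimeter of tissue."""
    x, y, z = voxel_nm
    voxels = NM_PER_MM ** 3 / (x * y * z)
    return voxels * bytes_per_voxel


per_mm3 = bytes_per_cubic_mm()
rodent_cortex = 500 * per_mm3          # 500 cubic mm, as in the text
human_cortex = 1000 * rodent_cortex    # 1000 times the rodent volume

print(f"1 cubic mm:    {per_mm3 / PB:.2f} PB")
print(f"rodent cortex: {rodent_cortex / EB:.2f} EB")
print(f"human cortex:  {human_cortex / ZB:.2f} ZB")
```

With these assumed voxel dimensions the results land close to the quoted 2 petabytes, 1 exabyte, and 1 zettabyte. Because the volume scales inversely with the product of the three voxel dimensions, modest changes in resolution shift the totals considerably.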

What is this on-the-fly acquisition rate? The new multi-beam electron microscopes currently produced by Carl Zeiss LLC have a staggering throughput approaching 400 sections per day or a terabyte of data per hour, placing them at the far end of the big data rate spectrum. This rate, if it can be matched with appropriate reconstruction algorithms, will allow researchers to process a cubic mm of rodent brain, that is, 2 petabytes of data, in about 6 months operating 24 hours a day, 7 days a week. Whatever computational pipeline is used to extract the connectomics graph from the image data, it will eventually have to work on the fly, at the pace of the microscope that generates this data.
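The relation between acquisition rate and imaging time can be sketched as follows. The volume (2 petabytes) and the peak rate (1 terabyte per hour) are the figures quoted above. The `duty_cycle` parameter is a hypothetical addition, not something the text specifies: it models the fraction of wall-clock time during which the microscope actually delivers data at its peak rate.

```python
TB = 1e12
PB = 1e15


def acquisition_days(volume_bytes, rate_bytes_per_hour, duty_cycle=1.0):
    """Days of round-the-clock operation needed to image `volume_bytes`.

    `duty_cycle` is a hypothetical factor (0 < duty_cycle <= 1) giving the
    fraction of time spent acquiring at the peak rate.
    """
    effective_rate = rate_bytes_per_hour * duty_cycle
    return volume_bytes / effective_rate / 24


peak_days = acquisition_days(2 * PB, 1 * TB)
derated_days = acquisition_days(2 * PB, 1 * TB, duty_cycle=0.45)

print(f"at peak rate:       {peak_days:.0f} days")
print(f"at 45% of peak:     {derated_days:.0f} days")
```

At a sustained peak rate the raw arithmetic gives roughly 83 days, which is best read as a lower bound. In this simple model, an estimate of about six months corresponds to an effective rate of a little under half the peak; the value 0.45 is chosen purely to illustrate that relationship and is not a measured figure.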

The algorithms and computational techniques for developing such high throughput connectomics pipelines are the target of this workshop. The massive amounts of storage and computation require expertise not only in computational neurobiology, machine learning, and alignment techniques, but also in parallel computation, distributed systems, and storage systems. There are several groups of researchers around the world that specialize in collecting the electron microscopy datasets, and several that engage in developing matching computational pipelines. Our aim is to bring these researchers together for an extended 5-day brainstorming session. We will also invite some top researchers in related fields such as machine learning, computer vision, distributed systems, and parallel computing. Our goal for this meeting is to both build an understanding of the state of the art in high-throughput connectomics pipelines, and to brainstorm on how to move the field forward so that high throughput connectomics systems become widely available to neurobiology labs around the world.

Concretely, we would like to come out of this workshop with a hierarchical plan for future connectomics systems that solve existing systems' problems. We will begin the workshop by having workgroups discuss these problems in existing systems and then dedicate the latter part to collectively working out solutions. We will consider three levels:

The system layer: how data is stored, moved around and computed on in a distributed and parallel fashion.
The pipeline layer: how processing progresses from stitching through alignment and reconstruction.
The algorithm layer: the specific machine learning and error detection and correction algorithms used in various pipeline stages to bring the datasets to analyzable connectivity graphs.

Our plan is to discuss each of these in detail, with the hope of concluding the workshop with a coherent plan on how to proceed.

Relation to previous Dagstuhl seminars

To the best of our knowledge there have been no similar Dagstuhl seminars in the past. The field of connectomics is a young, cutting-edge big data research area that will have important implications both for computation in the sciences (and in particular for the use of large-scale machine learning in the sciences) and for artificial intelligence (through the development of new neural network models based on the neurobiological discoveries this research may lead to). We believe it is important for modern computer science to engage in such interdisciplinary applications of computing and algorithms, and we are therefore eager to initiate this new seminar direction.

Creative Commons BY 3.0 Unported license

Nir Shavit

Participants

Daniel R. Berger (Harvard University - Cambridge, US) [dblp]
Davi Bock (Howard Hughes Medical Institute - Ashburn, US) [dblp]
Kevin Briggman (Max-Planck-Gesellschaft - Bonn, DE) [dblp]
Julia Buhmann (Universität Zürich, CH) [dblp]
Albert Cardona (University of Cambridge, GB) [dblp]
Forrest Collman (Allen Institute for Brain Science - Seattle, US) [dblp]
Nuno Maçarico da Costa (Allen Institute for Brain Science - Seattle, US) [dblp]
Winfried Denk (MPI für Neurobiologie - Martinsried, DE) [dblp]
Eva Dyer (Georgia Institute of Technology - Atlanta, US) [dblp]
Rainer W. Friedrich (FMI - Basel, CH) [dblp]
Jan Funke (Howard Hughes Medical Institute - Ashburn, US) [dblp]
Christel Genoud (FMI - Basel, CH) [dblp]
Stephan Gerhard (FMI - Basel, CH) [dblp]
William Gray Roncal (Johns Hopkins Univ. - Baltimore, US) [dblp]
Moritz Helmstaedter (MPI for Brain Research - Frankfurt am Main, DE) [dblp]
Michal Januszewski (Google Switzerland - Zürich, CH) [dblp]
Joergen Kornfeld (MPI für Neurobiologie - Martinsried, DE) [dblp]
Anna Kreshuk (EMBL - Heidelberg, DE) [dblp]
Julia Kuhl (MPI für Neurobiologie - Martinsried, DE)
Wei-Chung Allen Lee (Harvard Medical School - Boston, US) [dblp]
Jeff Lichtman (Harvard University - Cambridge, US) [dblp]
Jeremy Maitin-Shepard (Google Research - Mountain View, US) [dblp]
Yaron Meirovitch (Harvard University - Cambridge, US) [dblp]
Josh Morgan (Washington University, US) [dblp]
R. Clay Reid (Allen Institute for Brain Science - Seattle, US) [dblp]
Kerrianne Ryan (Dalhousie University - Halifax, CA)
Stephan Saalfeld (Howard Hughes Medical Institute - Ashburn, US) [dblp]
Aravinthan D.T. Samuel (Harvard University - Cambridge, US)
Louis Scheffer (Howard Hughes Medical Institute - Ashburn, US) [dblp]
Nir Shavit (MIT - Cambridge, US) [dblp]
Jochen Triesch (Goethe-Universität Frankfurt am Main, DE) [dblp]
Xueying Wang (Harvard University - Cambridge, US) [dblp]
Adrian Wanner (Princeton University, US)
Casimir Wierzynski (Intel - Santa Clara, US) [dblp]
Dirk Zeidler (Carl Zeiss - Oberkochen, DE) [dblp]

Classification

data structures / algorithms / complexity

Keywords

Connectomics
Big Data
Parallel Computing
Distributed Computing
Machine Learning
