• #### Putting An End to End-to-End: Gradient-Isolated Learning of Representations (ver. 3)

We propose a novel deep learning method for local self-supervised representation learning that requires neither labels nor end-to-end backpropagation but instead exploits the natural order in data. Inspired by the observation that biological neural networks appear to learn without backpropagating a global error signal, we split a deep neural network into a stack of gradient-isolated modules. Each module is trained to maximally preserve the information of its inputs using the InfoNCE bound from Oord et al. [2018]. Despite this greedy training, we demonstrate that each module improves upon the output of its predecessor, and that the representations created by the top module yield highly competitive results on downstream classification tasks in the audio and visual domain. The proposal enables optimizing modules asynchronously, allowing large-scale distributed training of very deep neural networks on unlabelled datasets.
Mutual information, Backpropagation, Architecture, Optimization, Deep Neural Networks, Deep learning, Neural network, Training set, Feature extraction, Overfitting, ...
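The InfoNCE objective mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common formulation in which the loss is the negative log-softmax of the positive sample's score among a set of candidate scores; the function name and the example scores are hypothetical, and this is not the authors' implementation.

```python
import math

def info_nce_loss(scores, positive_index):
    """Negative log-softmax of the positive score among all candidate scores."""
    m = max(scores)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[positive_index]

# One positive (index 0) among three negatives; the scores are made up.
loss = info_nce_loss([2.0, 0.1, -0.3, 0.5], positive_index=0)

# With uninformative (equal) scores the loss equals log(N), here log(4).
uniform = info_nce_loss([0.0, 0.0, 0.0, 0.0], positive_index=0)
```

A lower loss means the positive sample is scored well above the negatives, which is what each gradient-isolated module would be trained towards under this assumption.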
• #### Quantum Hoare Type Theory

As quantum computers become real, it is high time we come up with effective techniques that help programmers write correct quantum programs. Inspired by Hoare Type Theory in classical computing, we propose Quantum Hoare Type Theory (QHTT) in which precise specifications about the modification to the quantum state can be provided within the type of a computation. These specifications within a Hoare type are given in the form of Hoare-logic style pre- and postconditions following the propositions-as-types principle. The type-checking process verifies that the implementation conforms to the provided specification. QHTT has the potential to be a unified system for programming, specifying, and reasoning about quantum programs.
Qubit, Programming, Quantum computation, Quantum programming, Bell state, Programming Language, Quantum teleportation, Vector space, Superposition, Inference, ...
• #### Predicting online user behaviour using deep learning algorithms (ver. 3)

We propose a robust classifier to predict buying intentions based on user behaviour within a large e-commerce website. In this work we compare traditional machine learning techniques with the most advanced deep learning approaches. We show that both Deep Belief Networks and Stacked Denoising Auto-Encoders achieve a substantial improvement by extracting features from high-dimensional data during the pre-training phase. They also prove more convenient for dealing with severe class imbalance.
Autoencoder, Hidden layer, Optimization, Deep learning, Neural network, Random forest, Machine learning, Deep Neural Networks, Logistic regression, Regularization, ...
• #### Strategies to Detect Dark-Matter Decays with Line-Intensity Mapping

The nature of dark matter is a longstanding mystery in cosmology, which can be studied with laboratory or collider experiments, as well as astrophysical and cosmological observations. In this work, we propose realistic and efficient strategies to detect radiative products from dark-matter decays with line-intensity mapping (LIM) experiments. This radiation will behave as a line interloper for the atomic and molecular spectral lines targeted by LIM surveys. The most distinctive signatures of the contribution from dark-matter radiative decays are an extra anisotropy on the LIM power spectrum due to projection effects, as well as a narrowing and a shift towards higher intensities of the voxel intensity distribution. We forecast the minimum rate of decays into two photons that LIM surveys will be sensitive to as a function of the dark-matter mass in the range $\sim 10^{-6}-10$ eV, and discuss how to reinterpret such results for dark matter that decays into a photon and another particle. We find that both the power spectrum and the voxel intensity distribution are expected to be very sensitive to the dark-matter contribution, with the voxel intensity distribution being more promising for most experiments considered. Interpreting our results in terms of the axion, we show that LIM surveys will be extremely competitive in detecting its decay products, improving the sensitivity of laboratory and astrophysical searches by several orders of magnitude (depending on the mass), especially in the mass range $\sim 1-10$ eV.
Dark matter decay, Power spectrum, Dark matter, Intensity, Brightness temperature, Dark matter particle mass, Spectral line, Line intensity mapping, Luminosity, Axion, ...
• #### YOLOv4: Optimal Speed and Accuracy of Object Detection

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100. Source code is at https://github.com/AlexeyAB/darknet
Object detection, Convolution Neural Network, COCO simulation, Main sequence star, Attention, Genetic algorithm, Activation function, Regularization, Neural network, Ground truth, ...
• #### YOLOv3: An Incremental Improvement

We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that's pretty swell. It's a little bigger than last time but more accurate. It's still fast though, don't worry. At 320x320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8x faster. As always, all the code is online at https://pjreddie.com/yolo/
Ground truth, COCO simulation, Small-scale dynamo, Saturnian satellites, Classification, Image Processing, Twitter, Graph, Flux power spectrum, Convolution Neural Network, ...
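The ".5 IOU" metric quoted in the YOLOv3 abstract above counts a detection as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of the IoU computation follows; the corner-coordinate box layout `(x_min, y_min, x_max, y_max)` and the function name are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

At a 0.5 threshold, the pair above (IoU of about 0.14) would not count as a match.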
• #### TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank (ver. 2)

Learning-to-Rank deals with maximizing the utility of a list of examples presented to the user, with items of higher relevance being prioritized. It has several practical applications such as large-scale search, recommender systems, document summarization and question answering. While there is widespread support for classification and regression based learning, support for learning-to-rank in deep learning has been limited. We propose TensorFlow Ranking, the first open source library for solving large-scale ranking problems in a deep learning framework. It is highly configurable and provides easy-to-use APIs to support different scoring mechanisms, loss functions and evaluation metrics in the learning-to-rank setting. Our library is developed on top of TensorFlow and can thus fully leverage the advantages of this platform. For example, it is highly scalable, both in training and in inference, and can be used to learn ranking models over massive amounts of user activity data, which can include heterogeneous dense and sparse features. We empirically demonstrate the effectiveness of our library in learning ranking functions for large-scale search and recommendation applications in Gmail and Google Drive. We also show that ranking models built using our model scale well for distributed training, without significant impact on metrics. The proposed library is available to the open source community, with the hope that it facilitates further academic research and industrial applications in the field of learning-to-rank.
Ranking, Learning to rank, Neural network, Rank, Graph, Deep learning, Application programming interface, Google.com, Regression, Classification, ...
• #### Matching Networks for One Shot Learning (ver. 2)

Learning from a few examples remains a key challenge in machine learning. Despite recent advances in important domains such as vision and language, the standard supervised deep learning paradigm does not offer a satisfactory solution for learning new concepts rapidly from little data. In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types. We then define one-shot learning problems on vision (using Omniglot, ImageNet) and language tasks. Our algorithm improves one-shot accuracy on ImageNet from 87.6% to 93.2% and from 88.0% to 93.8% on Omniglot compared to competing approaches. We also demonstrate the usefulness of the same model on language modeling by introducing a one-shot task on the Penn Treebank.
Neural network, Classification, Embedding, Architecture, Deep learning, Convolution Neural Network, Nearest-neighbor site, Machine learning, Training set, Overfitting, ...
• #### YOLO9000: Better, Faster, Stronger

We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes; it predicts detections for more than 9000 different object categories. And it still runs in real-time.
Classification, COCO simulation, Convolution Neural Network, Flux power spectrum, Small-scale dynamo, Ground truth, Architecture, Graph, Instability, Neural network, ...
• #### You Only Look Once: Unified, Real-Time Object Detection (ver. 5)

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.
Optimization, Neural network, Regression, Architecture, Classification, Convolution Neural Network, Image Processing, Scheduling, Saturnian satellites, Robotics, ...
• #### SSD: Single Shot MultiBox Detector (ver. 5)

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size. For $300\times 300$ input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X and for $500\times 500$ input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model. Code is available at https://github.com/weiliu89/caffe/tree/ssd .
Small-scale dynamo, Convolution Neural Network, COCO simulation, Classification, Architecture, Flux power spectrum, Ground truth, Deep Neural Networks, Saturnian satellites, Inference, ...

• #### Mask R-CNN

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron
Convolution Neural Network, COCO simulation, Classification, Object detection, Quantization, Architecture, Inference, Ablation, Semantic segmentation, Ground truth, ...
• #### Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
Regression, Optimization, Ranking, Classification, Ablation, Feature extraction, Backpropagation, Translational invariance, Convolution Neural Network, Binary number, ...
• #### Fast R-CNN (ver. 2)

This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.
Classification, Regression, Scale invariance, Compressibility, Backpropagation, Python, Hyperparameter, Singular value, Training set, Feature vector, ...
• #### Rich feature hierarchies for accurate object detection and semantic segmentation (ver. 5)

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
Convolution Neural Network, Support vector machine, Classification, Regression, Architecture, Hyperparameter, Feature vector, Training set, Neocognitron, Optimization, ...
• #### U-Net: Convolutional Networks for Biomedical Image Segmentation

There is broad consensus that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
Elasticity, Architecture, Membrane, Gaussian distribution, Entropy, Interference, Image Processing, Saturnian satellites, Transmission electron microscopy, Glass, ...
• #### MobileNetV2: Inverted Residuals and Linear Bottlenecks (ver. 4)

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, as opposed to traditional residual models, which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and the number of operations measured by multiply-adds (MAdd), as well as the number of parameters.
Architecture, Manifold, Object detection, Neural network, COCO simulation, Inference, Semantic segmentation, Classification, Graph, Image segmentation, ...
• #### MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
Architecture, Classification, Convolution Neural Network, Object detection, Distillation, COCO simulation, Deep Neural Networks, Small-scale dynamo, Regularization, Crossed product, ...
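To make the saving from the depth-wise separable convolutions in the MobileNets abstract above concrete, the sketch below compares weight counts for a standard convolution and for a depthwise convolution followed by a 1x1 pointwise convolution. Biases are ignored, and the layer sizes (3x3 kernel, 64 input channels, 128 output channels) are hypothetical values chosen for illustration.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise (one kernel x kernel filter per input channel)
    convolution followed by a 1x1 pointwise convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

standard = standard_conv_params(3, 64, 128)     # 9 * 64 * 128
separable = separable_conv_params(3, 64, 128)   # 9 * 64 + 64 * 128

# The ratio simplifies to 1/c_out + 1/kernel**2.
ratio = separable / standard
```

For these sizes the separable layer needs roughly 12% of the weights of the standard one.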
• #### Densely Connected Convolutional Networks (ver. 5)

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet .
Architecture, Convolution Neural Network, Gradient flow, Information flow, Machine learning, Networks, Object, Droplet, Topology, ...
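The L(L+1)/2 connection count in the DenseNet abstract above can be checked by enumeration: layer l (1-indexed) receives the network input plus the feature-maps of layers 1 to l-1, which gives l incoming connections. The sketch below, with hypothetical function names, compares the closed form against that sum.

```python
def dense_connections(num_layers):
    """Closed form from the abstract: L(L+1)/2 direct connections."""
    return num_layers * (num_layers + 1) // 2

def dense_connections_by_enumeration(num_layers):
    """Layer l takes the input plus the outputs of layers 1..l-1: l connections."""
    return sum(layer for layer in range(1, num_layers + 1))

# A 5-layer dense block has 15 direct connections, versus 5 in a plain chain.
count = dense_connections(5)
```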
• #### Aggregated Residual Transformations for Deep Neural Networks (ver. 2)

We present a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call "cardinality" (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. On the ImageNet-1K dataset, we empirically show that even under the restricted condition of maintaining complexity, increasing cardinality is able to improve classification accuracy. Moreover, increasing cardinality is more effective than going deeper or wider when we increase the capacity. Our models, named ResNeXt, are the foundations of our entry to the ILSVRC 2016 classification task in which we secured 2nd place. We further investigate ResNeXt on an ImageNet-5K set and the COCO detection set, also showing better results than its ResNet counterpart. The code and models are publicly available online.
Architecture, Classification, COCO simulation, Convolution Neural Network, Deep Neural Networks, Embedding, Neural network, Object detection, Differential form of degree three, Ablation, ...
• #### Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning (ver. 2)

Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest generation Inception-v3 network. This raises the question of whether there is any benefit in combining the Inception architecture with residual connections. Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual and one Inception-v4, we achieve 3.08 percent top-5 error on the test set of the ImageNet classification (CLS) challenge.
Architecture, Image recognition, Conjunction, Instability, Object detection, Large scale structure, Deep Residual Networks, Inception Modules, Optimization, Image Processing, ...
• #### Rethinking the Inception Architecture for Computer Vision (ver. 3)

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks that aim at utilizing the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error on the validation set (3.6% error on the test set) and 17.3% top-1 error on the validation set.
Architecture, Image Processing, Classification, Regularization, Optimization, Filter bank, Dimension reduction, Big data, Entropy, Uniform distribution, ...
• #### Identity Mappings in Deep Residual Networks (ver. 3)

Deep residual networks have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet. Code is available at: https://github.com/KaimingHe/resnet-1k-layers
Deep Residual Networks, Architecture, Optimization, Ablation, Regularization, Activation function, Training set, Backpropagation, Overfitting, Attention, ...
• #### Deep Residual Learning for Image Recognition

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
COCO simulation, Optimization, Classification, Architecture, Regression, Object detection, Image recognition, Training set, Ground truth, Training Image, ...
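The idea of "learning residual functions with reference to the layer inputs" in the abstract above amounts to computing y = x + F(x), where F is the learned residual. The sketch below is a simplified illustration that assumes a two-layer fully connected F with a ReLU in between and omits any activation after the addition; real residual nets use convolutions, and all names and weights here are hypothetical.

```python
def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, value) for value in vector]

def linear(weights, vector):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def residual_block(x, w1, w2):
    """y = x + F(x), with F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return [a + b for a, b in zip(x, fx)]

x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# With all-zero residual weights, F(x) = 0 and the block is the identity.
identity_out = residual_block(x, zeros, zeros)

# With non-zero weights, the block adds a learned correction to x.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]
corrected_out = residual_block(x, w1, w2)
```

Because a block starts close to the identity when its weights are small, stacking many such blocks does not by itself degrade the signal, which is one intuition for why deeper residual networks are easier to optimize.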
• #### Very Deep Convolutional Networks for Large-Scale Image Recognition (ver. 6)

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
Regularization, Training set, Statistics, Architecture, Image Processing, Security, Classification, Complementarity, High Performance Computing, Classification systems...
• #### Entropy-SGD: Biasing Gradient Descent Into Wide Valleys ver. 5

This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.
Entropy, Generalization error, Langevin dynamics, Optimization, Deep Neural Networks, Energy, Eigenvalue, Algorithms, Objective, Geometry...
• #### Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates ver. 3

In this paper, we describe a phenomenon, which we named "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with one learning rate cycle and a large maximum learning rate. A primary insight that allows super-convergence training is that large learning rates regularize the training, hence requiring a reduction of all other forms of regularization in order to preserve an optimal regularization balance. We also derive a simplification of the Hessian Free optimization method to compute an estimate of the optimal learning rate. Experiments demonstrate super-convergence for Cifar-10/100, MNIST and Imagenet datasets, and resnet, wide-resnet, densenet, and inception architectures. In addition, we show that super-convergence provides a greater boost in performance relative to standard training when the amount of labeled training data is limited. The architectures and code to replicate the figures in this paper are available at github.com/lnsmith54/super-convergence. See http://www.fast.ai/2018/04/30/dawnbench-fastai/ for an application of super-convergence to win the DAWNBench challenge (see https://dawn.cs.stanford.edu/benchmark/).
Architecture, Regularization, Optimization, Scheduling, Neural network, Stochastic gradient descent, Statistics, Deep learning, Overfitting, Deep Neural Networks...
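The "one learning rate cycle with a large maximum learning rate" mentioned in the abstract above could look roughly like the following sketch. The linear ramp shape, the function name `one_cycle_lr`, and the values of `lr_min` and `lr_max` are illustrative assumptions; the abstract does not specify the exact schedule.

```python
def one_cycle_lr(step, total_steps, lr_min=0.1, lr_max=3.0):
    """Linear ramp from lr_min up to lr_max and back down over one cycle.

    Assumed shape and values, for illustration only.
    """
    half = total_steps / 2
    if step <= half:
        frac = step / half                   # warm-up half of the cycle
    else:
        frac = (total_steps - step) / half   # cool-down half of the cycle
    return lr_min + frac * (lr_max - lr_min)

# Learning rate at every step of a 100-step run (steps 0..100 inclusive).
schedule = [one_cycle_lr(s, 100) for s in range(101)]
```

The rate peaks at the midpoint of training and returns to its starting value at the end, so the large learning rate acts for only part of the run.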
• #### Deeply-Supervised Nets ver. 2

Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent. We make an attempt to boost the classification performance by studying a new formulation in deep networks. Three aspects in convolutional neural networks (CNN) style architectures are being looked at: (1) transparency of the intermediate layers to the overall classification; (2) discriminativeness and robustness of learned features, especially in the early layers; (3) effectiveness in training due to the presence of the exploding and vanishing gradients. We introduce "companion objective" to the individual hidden layers, in addition to the overall objective at the output layer (a different strategy to layer-wise pre-training). We extend techniques from stochastic gradient methods to analyze our algorithm. The advantage of our method is evident and our experimental result on benchmark datasets shows significant performance gain over existing methods (e.g. all state-of-the-art results on MNIST, CIFAR-10, CIFAR-100, and SVHN).
DAMA/LIBRA, Classification, Regularization, Deep learning, Backpropagation, Deep Neural Networks, Hyperparameter, Architecture, Stochastic gradient descent, Machine learning...
• #### Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift ver. 3

Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
Backpropagation, Covariance, Statistics, Classification, Training set, Stochastic gradient descent, Overfitting, Deep Neural Networks, Architecture, Singular value...
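The per-mini-batch normalization described in the abstract above can be sketched in a few lines of NumPy. This covers only the training-time forward pass; the running statistics used at inference are omitted, and the names `batch_norm`, `gamma`, `beta`, and `eps` are illustrative choices.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch axis, then scale and shift.

    x has shape (batch, features). Training-time forward pass only.
    """
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # standardized activations
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
bn_out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
```

Whatever the mean and scale of the incoming activations, each feature of `bn_out` has approximately zero mean and unit variance across the batch.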
• #### Layer Normalization

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
Recurrent neural network, Statistics, Neural network, Deep Neural Networks, Hidden layer, Convolution Neural Network, Hidden state, Fisher information matrix, Riemannian metric, Embedding...
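As a rough illustration of the abstract above, layer normalization computes its statistics over the units of a layer for one training case, rather than over the mini-batch. The sketch below assumes a plain feed-forward setting; the names `layer_norm`, `gain`, `bias`, and `eps` are illustrative.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize each training case over its own features, then apply gain and bias.

    x has shape (batch, features); statistics are taken along the last axis.
    """
    mean = x.mean(axis=-1, keepdims=True)   # one mean per training case
    var = x.var(axis=-1, keepdims=True)     # one variance per training case
    return gain * (x - mean) / np.sqrt(var + eps) + bias

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 16))
gain, bias = np.ones(16), np.zeros(16)

ln_full = layer_norm(acts, gain, bias)        # whole mini-batch at once
ln_single = layer_norm(acts[:1], gain, bias)  # first training case alone
```

The first row of `ln_full` equals `ln_single`, showing that the result for a training case does not depend on the rest of the mini-batch. This is why the computation is the same at training and test time.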
• #### Lower Mass Bounds on FIMPs

Feebly Interacting Massive Particles (FIMPs) are dark matter candidates that never thermalize in the early universe and whose production takes place via decays and/or scatterings of thermal bath particles. If FIMPs interactions with the thermal bath are renormalizable, a scenario known as freeze-in, production is most efficient at temperatures around the mass of the bath particles and insensitive to unknown physics at high temperatures. Working in a model-independent fashion, we consider three different production mechanisms: two-body decays, three-body decays, and binary collisions. We compute the FIMP phase space distribution and matter power spectrum, and we investigate the suppression of cosmological structures at small scales. Our results are lower bounds on the FIMP mass. Finally, we study how to relax these constraints in scenarios where FIMPs provide a sub-dominant dark matter component.
Feebly Interacting Massive Particle, Dark matter, Phase space density, Final state, Freeze-in, Three-body decays, Warm dark matter, Degree of freedom, Free streaming of particles, Transfer function...
• #### Astrochemistry associated with planet formation

This paper provides a brief summary and overview of the astrochemistry associated with the formation of stars and planets. It is aimed at new researchers in the field to enable them to obtain a quick overview of the landscape and key literature in this rapidly evolving area. The journey of molecules from clouds to protostellar envelopes, disks and ultimately exoplanet atmospheres is described. The importance of the close relation between the chemistry of gas and ice and the physical structure and evolution of planet-forming disks, including the growth and drift of grains and the locking up of elements at dust traps, is stressed. Using elemental abundance ratios like C/O, C/N, O/H in exoplanetary atmospheres to link them to their formation sites is therefore not straightforward. Interesting clues come from meteorites and comets in our own solar system, as well as from the composition of Earth. A new frontier is the analysis of the kinematics of molecular lines to detect young planets in disks. A number of major questions to be addressed in the coming years are formulated, and challenges and opportunities are highlighted.
Planet, Atacama Large Millimeter Array, Planet formation, Infrared limit, Astrochemistry, Solar system, Comet, Volatiles, Earth, Planetesimal...
• #### Homology, lower central series, and hyperplane arrangements ver. 2

We explore finitely generated groups by studying the nilpotent towers and the various Lie algebras attached to such groups. Our main goal is to relate an isomorphism extension problem in the Postnikov tower to the existence of certain commuting diagrams. This recasts a result of G. Rybnikov in a more general framework and leads to an application to hyperplane arrangements, whereby we show that all the nilpotent quotients of a decomposable arrangement group are combinatorially determined.
Isomorphism, Nilpotent, Exact sequence, Torsion tensor, Holonomy, Homomorphism, Hyperplane arrangement, Rank, Cohomology, Nilpotent group...
• #### Tensor decomposition for bosonic and fermionic scattering amplitudes

In this paper, we elaborate on a method to decompose multiloop multileg scattering amplitudes into Lorentz-invariant form factors, which exploits the simplifications that arise from considering four-dimensional external states. We propose a simple and general approach that applies to both fermionic and bosonic amplitudes and allows us to identify a minimal number of physically relevant form factors, which can be related one-to-one to the independent helicity amplitudes. We discuss explicitly its applicability to various four- and five-point scattering amplitudes relevant for LHC physics.
Helicity, Scattering amplitude, Form factor, Rank, Dimensional regularization, Helicity states, Massive vector boson, Gauge invariance, Path integral, Lorentz invariant...
• #### Real-time gravitational replicas: Formalism and a variational principle

This work is the first step in a two-part investigation of real-time replica wormholes. Here we study the associated real-time gravitational path integral and construct the variational principle that will define its saddle-points. We also describe the general structure of the resulting real-time replica wormhole saddles, setting the stage for construction of explicit examples. These saddles necessarily involve complex metrics, and thus are accessed by deforming the original real contour of integration. However, the construction of these saddles need not rely on analytic continuation, and our formulation can be used even in the presence of non-analytic boundary-sources. Furthermore, at least for replica- and CPT-symmetric saddles we show that the metrics may be taken to be real in regions spacelike separated from a so-called `splitting surface'. This feature is an important hallmark of unitarity in a field theory dual.
Path integral, Wormhole, Variational principle, Extrinsic curvature, Saddle point, Einstein-Hilbert action, Anti de Sitter space, Field theory, Analytic continuation, Entropy...
• #### Joint galaxy-galaxy lensing and clustering constraints on galaxy formation ver. 2

We compare predictions for galaxy-galaxy lensing profiles and clustering from the Henriques et al. (2015) public version of the Munich semi-analytical model of galaxy formation (SAM) and the IllustrisTNG suite, primarily TNG300, with observations from KiDS+GAMA and SDSS-DR7 using four different selection functions for the lenses (stellar mass, stellar mass and group membership, stellar mass and isolation criteria, stellar mass and colour). We find that this version of the SAM does not agree well with the current data for stellar mass-only lenses with $M_\ast > 10^{11}\,M_\odot$. By decreasing the merger time for satellite galaxies as well as reducing the radio-mode AGN accretion efficiency in the SAM, we obtain better agreement, both for the lensing and the clustering, at the high mass end. We show that the new model is consistent with the signals for central galaxies presented in Velliscig et al. (2017). Turning to the hydrodynamical simulation, TNG300 produces good lensing predictions, both for stellar mass-only ($\chi^2 = 1.81$ compared to $\chi^2 = 7.79$ for the SAM), and locally brightest galaxies samples ($\chi^2 = 3.80$ compared to $\chi^2 = 5.01$). With added dust corrections to the colours it matches the SDSS clustering signal well for red low mass galaxies. We find that both the SAMs and TNG300 predict $\sim 50\,\%$ excessive lensing signals for intermediate mass red galaxies with $10.2 < \log_{10} M_\ast [ M_\odot ] < 11.2$ at $r \approx 0.6\,h^{-1}\,\mathrm{Mpc}$, which require further theoretical development.
Stellar mass, Semi-analytical model of galaxy formation, Galaxy, Lensing signal, Stellar mass function, EAGLE simulation project, Illustris simulation, Sloan Digital Sky Survey, Virial mass, High mass...
• #### The global stability of M33 in MOND ver. 2

The dynamical stability of disk galaxies is sensitive to whether their anomalous rotation curves are caused by dark matter halos or Milgromian Dynamics (MOND). We investigate this by setting up a MOND model of M33. We first simulate it in isolation for 6 Gyr, starting from an initial good match to the rotation curve (RC). Too large a bar and bulge form when the gas is too hot, but this is avoided by reducing the gas temperature. A strong bar still forms in 1 Gyr, but rapidly weakens and becomes consistent with the observed weak bar. Previous work showed this to be challenging in Newtonian models with a live dark matter halo, which developed strong bars. The bar pattern speed implies a realistic corotation radius of 3 kpc. However, the RC still rises too steeply, and the central line of sight velocity dispersion (LOSVD) is too high. We then add a constant external acceleration field of $8.4 \times 10^{-12}$ m/s$^2$ at $30^\circ$ to the disk as a first order estimate for the gravity exerted by M31. This suppresses buildup of material at the centre, causing the RC to rise more slowly and reducing the central LOSVD. Overall, this simulation bears good resemblance to several global properties of M33, and highlights the importance of including even a weak external field on the stability and evolution of disk galaxies. Further simulations with a time-varying external field, modeling the full orbit of M33, will be needed to confirm its resemblance to observations.
Triangulum Galaxy, Rotation Curve, Modified Newtonian Dynamics, Galaxy, Andromeda galaxy, Star, Disk galaxy, Dark matter halo, deep-MOND limit, Velocity dispersion...
• #### The mass-size relation of galaxy clusters

The outskirts of accreting dark matter haloes exhibit a sudden drop in density delimiting their multi-stream region. Due to the dynamics of accretion, the location of this physically motivated edge strongly correlates with the halo growth rate. Using hydrodynamical zoom-in simulations of high-mass clusters, we explore this definition in realistic simulations and find an explicit connection between this feature in the dark matter and galaxy profiles. We also show that the depth of the splashback feature correlates well with the direction of filaments and, surprisingly, the orientation of the brightest cluster galaxy. Our findings suggest that galaxy profiles and weak-lensing masses can define an observationally viable mass-size scaling relation for galaxy clusters, which can be used to extract cosmological information.
Galaxy, Mass accretion rate, Cluster of galaxies, Apocenter, Dark matter subhalo, Splashback radius, Accretion, Phase space caustic, Dark matter, Weak lensing mass estimate...
• #### Shock and Splash: Gas and Dark Matter Halo Boundaries around LambdaCDM Galaxy Clusters

Recent advances in simulations and observations of galaxy clusters suggest that there exists a physical outer boundary of massive cluster-size dark matter haloes. In this work, we investigate the locations of the outer boundaries of dark matter and gas around cluster-size dark matter haloes, by analyzing a sample of 65 massive dark matter halos extracted from the Omega500 zoom-in hydrodynamical cosmological simulations. We show that the location of accretion shock is offset from that of the dark matter splashback radius, contrary to the prediction of the self-similar models. The accretion shock radius is larger than all definitions of the splashback radius in the literature by 20-100%. The accretion shock radius defined using the steepest drop in the entropy pressure profiles is approximately 2 times larger than the splashback radius defined by the steepest slope in the dark matter density profile, and it is ~1.2 times larger than the edge of the dark matter phase-space structure. We discuss implications of our results for multi-wavelength studies of galaxy clusters.
Accretion, Dark matter, Splashback radius, Dark matter halo, Entropy, Cluster of galaxies, Phase space, Line of sight, Dark matter particle, Simulations of structure formation...
• #### A radio ridge connecting two galaxy clusters in a filament of the cosmic web

Galaxy clusters are the most massive gravitationally bound structures in the Universe. They grow by accreting smaller structures in a merging process that produces shocks and turbulence in the intra-cluster gas. We observed a ridge of radio emission connecting the merging galaxy clusters Abell 0399 and Abell 0401 with the Low Frequency Array (LOFAR) at 140 MHz. This emission requires a population of relativistic electrons and a magnetic field located in a filament between the two galaxy clusters. We performed simulations to show that a volume-filling distribution of weak shocks may re-accelerate a pre-existing population of relativistic particles, producing emission at radio wavelengths that illuminates the magnetic ridge.
Cluster of galaxies, Cosmic web, Low Frequency Array, Relativistic electron, Merging galaxy cluster, Turbulence, Simulations, Universe, Particles, Wavelength...
• #### Towards constraining warm dark matter with stellar streams through neural simulation-based inference

A statistical analysis of the observed perturbations in the density of stellar streams can in principle set stringent constraints on the mass function of dark matter subhaloes, which in turn can be used to constrain the mass of the dark matter particle. However, the likelihood of a stellar density with respect to the stream and subhalo parameters involves solving an intractable inverse problem which rests on the integration of all possible forward realisations implicitly defined by the simulation model. In order to infer the subhalo abundance, previous analyses have relied on Approximate Bayesian Computation (ABC) together with domain-motivated but handcrafted summary statistics. Here, we introduce a likelihood-free Bayesian inference pipeline based on Amortised Approximate Likelihood Ratios (AALR), which automatically learns a mapping between the data and the simulator parameters and obviates the need to handcraft a possibly insufficient summary statistic. We apply the method to the simplified case where stellar streams are only perturbed by dark matter subhaloes, thus neglecting baryonic substructures, and describe several diagnostics that demonstrate the effectiveness of the new method and the statistical quality of the learned estimator.
Dark matter subhalo, Statistical estimator, Approximate Bayesian computation, Statistics, Stellar stream, Architecture, GD-1 stellar stream, Coverage probability, Warm dark matter, Inference...
• #### Fast and Accurate Non-Linear Predictions of Universes with Deep Learning

Cosmologists aim to model the evolution of initially low amplitude Gaussian density fluctuations into the highly non-linear "cosmic web" of galaxies and clusters. They aim to compare simulations of this structure formation process with observations of large-scale structure traced by galaxies and infer the properties of the dark energy and dark matter that make up 95% of the universe. These ensembles of simulations of billions of galaxies are computationally demanding, so that more efficient approaches to tracing the non-linear growth of structure are needed. We build a V-Net based model that transforms fast linear predictions into fully nonlinear predictions from numerical simulations. Our NN model learns to emulate the simulations down to small scales and is both faster and more accurate than the current state-of-the-art approximate methods. It also achieves comparable accuracy when tested on universes of significantly different cosmological parameters from the one used in training. This suggests that our model generalizes well beyond our training set.
Galaxy, Cosmological parameters, Linear prediction, Deep learning, Numerical simulation, Dark matter, Dark energy, Structure formation, Cosmology, Training set...
• #### dm2gal: Mapping Dark Matter to Galaxies with Neural Networks

Maps of cosmic structure produced by galaxy surveys are one of the key tools for answering fundamental questions about the Universe. Accurate theoretical predictions for these quantities are needed to maximize the scientific return of these programs. Simulating the Universe by including gravity and hydrodynamics is one of the most powerful techniques to accomplish this; unfortunately, these simulations are very expensive computationally. Alternatively, gravity-only simulations are cheaper, but do not predict the locations and properties of galaxies in the cosmic web. In this work, we use convolutional neural networks to paint galaxy stellar masses on top of the dark matter field generated by gravity-only simulations. The stellar masses of galaxies are important for galaxy selection in surveys and are thus an important quantity to predict. Our model outperforms the state-of-the-art benchmark model and allows the generation of fast and accurate models of the observed galaxy distribution.
Stellar mass, Galaxy, Halo Occupation Distribution, Dark matter, Dark matter subhalo, Neural network, Convolution Neural Network, Cosmic web, Virial mass, Galaxy mass...
• #### Clustering of CODEX clusters

Aims. We analyze the autocorrelation function of a large contiguous sample of galaxy clusters, the Constrain Dark Energy with X-ray (CODEX) sample, in which we take particular care of cluster definition. These clusters were X-ray selected using the RASS survey and then identified as galaxy clusters using the code redMaPPer run on the photometry of the SDSS. We develop methods for precisely accounting for the sample selection effects on the clustering and demonstrate their robustness using numerical simulations. Methods. Using the clean CODEX sample, which was obtained by applying a redshift-dependent richness selection, we computed the two-point autocorrelation function of galaxy clusters in the $0.1<z<0.3$ and $0.3<z<0.5$ redshift bins. We compared the bias in the measured correlation function with values obtained in numerical simulations using a similar cluster mass range. Results. By fitting a power law, we measured a correlation length $r_0=18.7 \pm 1.1$ and slope $\gamma=1.98 \pm 0.14$ for the correlation function in the full redshift range. By fixing the other cosmological parameters to their WMAP9 values, we reproduced the observed shape of the correlation function under the following cosmological conditions: $\Omega_{m_0}=0.22^{+0.04}_{-0.03}$ and $S_8=\sigma_8 (\Omega_{m_0} /0.3)^{0.5}=0.85^{+0.10}_{-0.08}$ with estimated additional systematic errors of $\sigma_{\Omega_{m_0}} = 0.02$ and $\sigma_{S_8} = 0.20$. We illustrate the complementarity of clustering constraints by combining them with CODEX cosmological constraints based on the X-ray luminosity function, deriving $\Omega_{m_0} = 0.25 \pm 0.01$ and $\sigma_8 = 0.81^{+0.01}_{-0.02}$ with an estimated additional systematic error of $\sigma_{\Omega_{m_0}} = 0.07$ and $\sigma_{\sigma_8} = 0.04$. The mass calibration and statistical quality of the mass tracers are the dominant source of uncertainty.
Two-point correlation function, Cosmology, Cluster of galaxies, Redshift bins, Systematic error, Virial cluster mass, Dark matter, Sloan Digital Sky Survey, Monte Carlo Markov chain, Cosmological parameters...
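To make the quoted fit values concrete, the sketch below evaluates a power-law correlation function using the best-fit $r_0$ and $\gamma$ from the abstract above. The functional form $\xi(r) = (r/r_0)^{-\gamma}$ is the conventional one and is assumed here; the abstract does not spell it out or give the units of $r_0$, so `r` is simply taken in the same units as `r0`. The function name `xi_power_law` is hypothetical.

```python
import numpy as np

def xi_power_law(r, r0=18.7, gamma=1.98):
    """Assumed conventional power-law form: xi(r) = (r / r0) ** (-gamma).

    Defaults are the best-fit values quoted in the abstract;
    r must be in the same (unstated) units as r0.
    """
    return (np.asarray(r, dtype=float) / r0) ** (-gamma)

xi_at_r0 = xi_power_law(18.7)        # equals 1 by definition of r0
xi_at_2r0 = xi_power_law(2 * 18.7)   # doubling r scales xi by 2 ** (-gamma)
```

In this form the correlation length $r_0$ is the separation at which $\xi = 1$, and the slope $\gamma$ sets how quickly the clustering signal falls off beyond it.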
• #### HistFitter software framework for statistical data analysis

We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fitted to data and interpreted with statistical tests. A key innovation of HistFitter is its design, which is rooted in core analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its very fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple data models at once, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication-quality style through a simple command-line interface.
• #### The splashback boundary of haloes in hydrodynamic simulations

The splashback radius, $R_{\rm sp}$, is a physically motivated halo boundary that separates infalling and collapsed matter of haloes. We study $R_{\rm sp}$ in the hydrodynamic and dark matter only IllustrisTNG simulations. The most commonly adopted signature of $R_{\rm sp}$ is the radius at which the radial density profiles are steepest. Therefore, we explicitly optimise our density profile fit to the profile slope and find that this leads to a $\sim5\%$ larger radius compared to other optimisations. We calculate $R_{\rm sp}$ for haloes with masses between $10^{13-15}{\rm M}_{\odot}$ as a function of halo mass, accretion rate and redshift. $R_{\rm sp}$ decreases with mass and with redshift for haloes of similar $M_{\rm200m}$ in agreement with previous work. We also find that $R_{\rm sp}/R_{\rm200m}$ decreases with halo accretion rate. We apply our analysis to dark matter, gas and satellite galaxies associated with haloes to investigate the observational potential of $R_{\rm sp}$. The radius of steepest slope in gas profiles is consistently smaller than the value calculated from dark matter profiles. The steepest slope in galaxy profiles, which are often used in observations, tends to agree with dark matter profiles but is lower for less massive haloes. We compare $R_{\rm sp}$ in hydrodynamic and N-body dark matter only simulations and do not find a significant difference caused by the addition of baryonic physics. Thus, results from dark matter only simulations should be applicable to realistic haloes.
Mass accretion rate, Dark matter, Galaxy, Splashback radius, Dark matter subhalo, Dark Matter Density Profile, Virial mass, N-body simulation, Apocenter, Phase space caustic...
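The "radius of steepest slope" signature described in the abstract above can be sketched as follows. The example uses a finite-difference logarithmic slope on a synthetic profile rather than the profile fitting used in the paper. The function name `steepest_slope_radius` and the toy profile, whose slope dips most steeply at r = 1.5 in arbitrary units, are assumptions made for illustration.

```python
import numpy as np

def steepest_slope_radius(r, rho):
    """Return the radius where d ln(rho) / d ln(r) is most negative."""
    slope = np.gradient(np.log(rho), np.log(r))  # logarithmic slope profile
    return r[np.argmin(slope)]

# Synthetic profile (assumption): a background slope of -2 with a localized
# steepening centred on r = 1.5, built by integrating the slope in ln(r).
ln_r = np.linspace(np.log(0.1), np.log(10.0), 400)
true_slope = -2.0 - 3.0 * np.exp(-(ln_r - np.log(1.5)) ** 2 / (2 * 0.2 ** 2))
ln_rho = np.cumsum(true_slope) * (ln_r[1] - ln_r[0])

r = np.exp(ln_r)
rho = np.exp(ln_rho)
r_sp = steepest_slope_radius(r, rho)
```

On real, noisy profiles a direct finite difference is sensitive to binning and noise, which is one reason to fit a smooth profile first.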
• #### An Iterative Reconstruction Algorithm for Faraday Tomography

Faraday tomography offers crucial information on magnetized astronomical objects, such as quasars, galaxies, or galaxy clusters, by observing their magnetoionic media. The observed linear polarization spectrum is inverse Fourier transformed to obtain the Faraday dispersion function (FDF), providing a tomographic distribution of the magnetoionic media along the line of sight. However, this transform gives a poor reconstruction of the FDF because of the instrument's limited wavelength coverage. The current Faraday tomography techniques' inability to reliably solve the above inverse problem has noticeably plagued cosmic magnetism studies. We propose a new algorithm inspired by the well-studied area of signal restoration, called the Constraining and Restoring iterative Algorithm for Faraday Tomography (CRAFT). This iterative model-independent algorithm is computationally inexpensive and only requires weak physically-motivated assumptions to produce high fidelity FDF reconstructions. We demonstrate an application for a realistic synthetic model FDF of the Milky Way, where CRAFT shows greater potential than other popular model-independent techniques. The dependence of observational frequency coverage on the various techniques' reconstruction performance is also demonstrated for a simpler FDF. CRAFT exhibits improvements even over model-dependent techniques (i.e., QU-fitting) by capturing complex multi-scale features of the FDF amplitude and polarization angle variations within a source. The proposed approach will be of utmost importance for future cosmic magnetism studies, especially with broadband polarization data from the Square Kilometre Array and its precursors. We make the CRAFT code publicly available.
Sparsity, Magnetism, Australian SKA Pathfinder, Full width at half maximum, Square Kilometre Array, Inverse problems, Regularization, Galaxy, Milky Way, Line of sight...
• #### Unraveling the origin of magnetic fields in galaxies

Despite their ubiquity, there are many open questions regarding galactic and cosmic magnetic fields. Specifically, current observational constraints cannot determine whether magnetic fields observed in galaxies were generated in the Early Universe or are of astrophysical nature. Motivated by this, we use our magnetic tracers algorithm to investigate whether the signatures of primordial magnetic fields persist in galaxies throughout cosmic time. We simulate a Milky Way-like galaxy in four scenarios: magnetised solely by primordial magnetic fields, magnetised exclusively by SN-injected magnetic fields, and two combined primordial + SN magnetisation cases. We find that once primordial magnetic fields with a comoving strength $B_0 >10^{-12}$ G are considered, they remain the primary source of galaxy magnetisation. Our magnetic tracers show that, even combined with galactic sources of magnetisation, when primordial magnetic fields are strong, they source the large-scale fields in the warm metal-poor phase of the simulated galaxy. In this case, the circumgalactic and intergalactic medium can be used to probe $B_0$ without risk of pollution by magnetic fields originating in the galaxy. Furthermore, whether magnetic fields are primordial or astrophysically-sourced can be inferred by studying local gas metallicity. As a result, we predict that future state-of-the-art observational facilities of magnetic fields in galaxies will have the potential to unravel astrophysical and primordial magnetic components of our Universe.
Galaxy, Supernova, Cosmological magnetic field, Milky Way, Magnetic energy, Metallicity, Active Galactic Nuclei, Star formation, Turbulent dynamo, Interstellar medium...
• #### Chern-Weil Global Symmetries and How Quantum Gravity Avoids Them

We draw attention to a class of generalized global symmetries, which we call "Chern-Weil global symmetries," that arise ubiquitously in gauge theories. The Noether currents of these Chern-Weil global symmetries are given by wedge products of gauge field strengths, such as $F_2 \wedge H_3$ and $\text{tr}(F_2^2)$, and their conservation follows from Bianchi identities. As a result, they are not easy to break. However, it is widely believed that exact global symmetries are not allowed in a consistent theory of quantum gravity. As a result, any Chern-Weil global symmetry in a low-energy effective field theory must be either broken or gauged when the theory is coupled to gravity. In this paper, we explore the processes by which Chern-Weil symmetries may be broken or gauged in effective field theory and string theory. We will see that many familiar phenomena in string theory, such as axions, Chern-Simons terms, worldvolume degrees of freedom, and branes ending on or dissolving in other branes, can be interpreted as consequences of the absence of Chern-Weil symmetries in quantum gravity, suggesting that they might be general features of quantum gravity. We further discuss implications of breaking and gauging Chern-Weil symmetries for particle phenomenology and for boundary CFTs of AdS bulk theories. Chern-Weil global symmetries thus offer a unified framework for understanding many familiar aspects of quantum field theory and quantum gravity.
Global symmetry, Gauge field, Axion, Magnetic monopole, Quantum gravity, Instanton, Chern-Simons term, String theory, Gauge theory, Manifold...