Recently bookmarked papers

with concepts:
  • Upcoming galaxy surveys such as LSST and Euclid are expected to significantly improve the power of weak lensing as a cosmological probe. In order to maximise the information that can be extracted from these surveys, it is important to explore novel statistics that complement standard weak lensing statistics such as the shear-shear correlation function and peak counts. In this work, we use a recently proposed weak lensing observable -- weak lensing voids -- to make parameter constraint forecasts for an LSST-like survey. We make use of the cosmo-SLICS suite of $w$CDM simulations to measure void statistics (abundance and tangential shear) as a function of cosmological parameters. The simulation data are used to train a Gaussian process regression emulator that we use to generate likelihood contours and provide parameter constraints from mock observations. We find that the void abundance is more constraining than the tangential shear profiles, though the combination of the two gives additional constraining power. We forecast that without tomographic decomposition, these void statistics can constrain the matter fluctuation amplitude $S_8$ to within 0.7\% (68\% confidence interval), while offering 4.3, 4.7 and 6.9\% precision on the matter density parameter $\Omega_{\rm m}$, the reduced Hubble constant $h$, and the dark energy equation of state parameter $w_0$, respectively. These results are tighter than the constraints given by the shear-shear correlation function with the same observational specifications, indicating that weak lensing void statistics can be a promising cosmological probe, potentially complementary to other lensing tests.
    Cosmic void, Cosmology, Tangential shear, Sheared, Weak lensing, Void Statistics, Two-point correlation function, Cosmological parameters, Statistics, Galaxy, ...
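
    A minimal sketch of the emulator step described above (our own illustration, not the authors' code; the parameter ranges, node count, and statistic are stand-ins), assuming scikit-learn:

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, ConstantKernel

      # Hypothetical training nodes in (Omega_m, S_8, h, w_0) and one measured
      # void statistic (e.g. the abundance in a single void-size bin) per node.
      params = np.random.uniform([0.2, 0.6, 0.6, -2.0], [0.4, 0.9, 0.8, -0.5], (26, 4))
      void_abundance = np.random.rand(26)   # stand-in for the simulation measurements

      # One GP per statistic/bin, with an anisotropic RBF kernel over parameter space.
      gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=[0.1] * 4),
                                    normalize_y=True)
      gp.fit(params, void_abundance)

      # Predict the statistic (with uncertainty) at an arbitrary cosmology, as
      # needed at each step when mapping out the likelihood contours.
      mean, std = gp.predict(np.array([[0.3, 0.8, 0.7, -1.0]]), return_std=True)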
  • We present the ROGER (Reconstructing Orbits of Galaxies in Extreme Regions) code, which uses three different machine learning techniques to classify galaxies in, and around, clusters, according to their projected phase-space position. We use a sample of 34 massive, $M_{200}>10^{15} h^{-1} M_{\odot}$, galaxy clusters in the MultiDark Planck 2 (MDLP2) simulation at redshift zero. We select all galaxies with stellar mass $M_{\star} \ge 10^{8.5} h^{-1}M_{\odot}$, as computed by the semi-analytic model of galaxy formation SAG, that are located in, and in the vicinity of, the clusters and classify them according to their orbits. We train ROGER to retrieve the original classification of the galaxies out of their projected phase-space positions. For each galaxy, ROGER gives as output the probability of being a cluster galaxy, a galaxy that has recently fallen into a cluster, a backsplash galaxy, an infalling galaxy, or an interloper. We discuss the performance of the machine learning methods and potential uses of our code. Among the different methods explored, we find the K-Nearest Neighbours algorithm achieves the best performance.
    Galaxy, Pseudo phase-space density, Machine learning, Milky Way, Phase space, Support vector machine, Star formation, Cluster of galaxies, Galaxy Formation, Line of sight velocity, ...
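
    A toy version of the projected phase-space classification idea (features, sample, and neighbour count are illustrative; ROGER itself is trained on MDLP2 orbits), assuming scikit-learn:

      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier

      # Per-galaxy features: projected clustercentric radius R/R_200 and
      # normalized line-of-sight velocity |v_los|/sigma. Labels: the five orbit classes.
      X_train = np.random.rand(1000, 2)         # stand-in phase-space positions
      y_train = np.random.randint(0, 5, 1000)   # 0 cluster, 1 recent infaller,
                                                # 2 backsplash, 3 infalling, 4 interloper

      knn = KNeighborsClassifier(n_neighbors=50)
      knn.fit(X_train, y_train)

      # As in ROGER, the useful output is a per-class probability, not a hard label.
      probs = knn.predict_proba(np.array([[0.8, 1.2]]))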
  • The NANOGrav Collaboration found strong Bayesian evidence for a common-spectrum stochastic process in its 12.5-yr pulsar timing array (PTA) dataset, with median characteristic strain amplitude at periods of a year of $A_{\rm yr} = 1.92^{+0.75}_{-0.55} \times 10^{-15}$. However, evidence for the quadrupolar Hellings & Downs interpulsar correlations, which are characteristic of gravitational wave (GW) signals, was not yet significant. We emulate and extend the NANOGrav dataset, injecting a wide range of stochastic gravitational wave background (GWB) signals that encompass a variety of amplitudes and spectral shapes. We then apply our standard detection pipeline and explore three key astrophysical milestones: (I) robust detection of the GWB; (II) determination of the source of the GWB; and (III) measurement of the properties of the GWB spectrum. Given the amplitude measured in the 12.5 yr analysis and assuming this signal is a GWB, we expect to accumulate robust evidence of an interpulsar-correlated GWB signal with 15--17 yrs of data. At the initial detection, we expect a fractional uncertainty of 40% on the power-law strain spectrum slope, which is sufficient to distinguish a GWB of supermassive black-hole binary origin from some models predicting primordial or cosmic-string origins. Similarly, the measured GWB amplitude will have an uncertainty of 44% upon initial detection, allowing us to arbitrate between some population models of supermassive black-hole binaries. In general, however, power-law models are distinguishable from those having low-frequency spectral turnovers once 20 yrs of data are reached. Even though our study is based on the NANOGrav data, we also derive relations that allow for a generalization to other PTA datasets. Most notably, by combining individual PTAs' data into the International Pulsar Timing Array, all of these milestones can be reached significantly earlier.
    Gravitational wave background, Pulsar timing array, Pulsar, Signal to noise ratio, Gravitational wave, Statistics, Supermassive black hole, Timing of arrival, Cross-correlation, Bayesian, ...
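
    For reference, the standard PTA relations behind the milestones above (textbook forms, not numbers from this paper; normalization conventions for the correlation curve vary):

      h_c(f) = A_{\rm yr} \left( \frac{f}{1\,{\rm yr}^{-1}} \right)^{\alpha},
      \qquad \alpha = -2/3 \ \text{for a SMBHB background},

      \Gamma_{ab} = \frac{1}{2} + \frac{3}{2}\, x_{ab} \ln x_{ab} - \frac{x_{ab}}{4},
      \qquad x_{ab} = \frac{1 - \cos\theta_{ab}}{2},

    where $\Gamma_{ab}$ is the Hellings & Downs correlation between distinct pulsars separated by angle $\theta_{ab}$.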
  • The Cluster HEritage project with XMM-Newton - Mass Assembly and Thermodynamics at the Endpoint of structure formation (CHEX-MATE) is a three-megasecond Multi-Year Heritage Programme to obtain X-ray observations of a minimally-biased, signal-to-noise limited sample of 118 galaxy clusters detected by Planck through the Sunyaev-Zeldovich effect. The programme, described in detail in this paper, aims to study the ultimate products of structure formation in time and mass. It is composed of a census of the most recent objects to have formed (Tier-1: 0.05 < z < 0.2; 2 x 10^14 M_sun < M_500 < 9 x 10^14 M_sun), together with a sample of the highest-mass objects in the Universe (Tier-2: z < 0.6; M_500 > 7.25 x 10^14 M_sun). The programme will yield an accurate vision of the statistical properties of the underlying population, measure how the gas properties are shaped by collapse into the dark matter halo, uncover the provenance of non-gravitational heating, and resolve the major uncertainties in mass determination that limit the use of clusters for cosmological parameter estimation. We will acquire X-ray exposures of uniform depth, designed to obtain individual mass measurements accurate to 15-20% under the hydrostatic assumption. We present the project motivations, describe the programme definition, and detail the ongoing multi-wavelength observational (lensing, SZ, radio) and theoretical effort that is being deployed in support of the project.
    Sunyaev-Zel'dovich effect, Weak lensing, Intra-cluster medium, South Pole Telescope, Cosmic microwave background, Structure formation, Field of view, XMM-Newton, Numerical simulation, Hydrostatic equilibrium, ...
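
    The "hydrostatic assumption" mentioned above is the standard hydrostatic-equilibrium mass estimate built from the X-ray gas density $n(r)$ and temperature $T(r)$ profiles:

      M_{\rm HSE}(<r) = -\frac{k_{\rm B} T(r)\, r}{G \mu m_{\rm p}}
      \left[ \frac{{\rm d}\ln n}{{\rm d}\ln r} + \frac{{\rm d}\ln T}{{\rm d}\ln r} \right],

    with $\mu$ the mean molecular weight and $m_{\rm p}$ the proton mass.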
  • Future precision measurements of CMB polarization can shed new light on the so-called Hubble tension problem. The Hubble tension arises from the difference between the evolutions of the Hubble parameter determined with two different distance ladders. The standard distance ladder, based on observations of Cepheid variables and type Ia supernovae, gives larger values of the Hubble constant, and the inverse distance ladder, based on observations of the baryon acoustic oscillations both in the CMB and in the clustering of galaxies, gives smaller values of the Hubble constant. These different evolutions of the Hubble parameter indicate different evolutions of the free electron density during the reionization of the universe, and hence different magnitudes of the low-l polarization of the CMB, since this polarization is mainly produced through the Thomson scattering of CMB photons off these free electrons. We investigate the effect on the CMB E-mode and B-mode polarizations at l < 12, assuming a non-trivially time-dependent equation of state of dark energy. We find that the case of the standard distance ladder gives higher polarization power than the prediction of the LambdaCDM model.
    Cosmic distance ladder, Reionization, Lambda-CDM model, E-modes, Hubble constant tension, Hubble parameter, B-modes, Tensor mode fluctuations, Thomson scattering, Hubble constant, ...
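
    The physical link invoked above runs through the Thomson optical depth, which (with $n_e$ the free electron density) is

      \tau(z) = \sigma_{\rm T}\, c \int_0^{z} \frac{n_e(z')}{(1+z')\, H(z')}\, {\rm d}z',

    so a different $H(z)$ during reionization changes $\tau$ and, since the large-angle E-mode power scales roughly as $\tau^2$, the low-l polarization.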
  • Upcoming surveys will map the growth of large-scale structure with unprecedented precision, improving our understanding of the dark sector of the Universe. Unfortunately, much of the cosmological information is encoded by the small scales, where the clustering of dark matter and the effects of astrophysical feedback processes are not fully understood. This can bias the estimates of cosmological parameters, which we study here for a joint analysis of mock Euclid cosmic shear and Planck cosmic microwave background data. We use different implementations for the modelling of the signal on small scales and find that they result in significantly different predictions. Moreover, the different nonlinear corrections lead to biased parameter estimates, especially when the analysis is extended into the highly nonlinear regime, with both the Hubble constant, $H_0$, and the clustering amplitude, $\sigma_8$, affected the most. Improvements in the modelling of nonlinear scales will therefore be needed if we are to resolve the current tension with more and better data. For a given prescription for the nonlinear power spectrum, using different corrections for baryon physics does not significantly impact the precision of Euclid, but neglecting these corrections does lead to large biases in the cosmological parameters. In order to extract precise and unbiased constraints on cosmological parameters from Euclid cosmic shear data, it is therefore essential to improve the accuracy of the recipes that account for nonlinear structure formation, as well as the modelling of the impact of astrophysical processes that redistribute the baryons.
    Cosmological parameters, Weak lensing, Cosmic shear, Monte Carlo Markov chain, Dark energy, Matter power spectrum, Fisher information matrix, Euclid mission, Cosmology, Galaxy, ...
  • We constrain deviations from general relativity (GR) including both redshift and scale dependencies in the modified gravity (MG) parameters. In particular, we employ the under-used binning approach and compare the results to functional forms. We use available datasets such as Cosmic Microwave Background (CMB) from Planck 2018, Baryonic Acoustic Oscillations (BAO) and Redshift Space Distortions (BAO/RSD) from the BOSS DR12, the 6DF Galaxy Survey, the SDSS DR7 Main Galaxy Sample, the correlation of Lyman-$\alpha$ forest absorption and quasars from SDSS-DR14, Supernova Type Ia (SNe) from the Pantheon compilation, and DES Y1 data. Moreover, in order to maximize the constraining power from available datasets, we analyze MG models where we alternatively set some of the MG parameters to their GR values and vary the others. Using functional forms, we find an up to 3.5-$\sigma$ tension with GR in $\Sigma$ (while $\mu$ is fixed) when using Planck+SNe+BAO+RSD; this goes away when lensing data is included, i.e. CMB lensing and DES (CMBL+DES). Using different binning methods, we find that a tension with GR above 2-$\sigma$ in the (high-z, high-k) bin is persistent even when adding CMBL+DES to Planck+SNe+BAO+RSD. Also, we find another tension above 2-$\sigma$ in the (low-z, high-k) bin, but that can be reduced with the addition of lensing data. Furthermore, we perform a model comparison using the Deviance Information Criterion statistical tool and find that the MG model ($\mu=1$, $\Sigma$) is weakly favored by the data compared to $\Lambda$CDM, except when DES data is included. Another noteworthy result is that we find that the binning methods do not agree with the widely-used functional parameterization where the MG parameters are proportional to $\Omega_{\text{DE}}(a)$, and this is clearly apparent in the high-z and high-k regime where this parameterization underestimates the deviations from GR.
    General relativity, Planck mission, Baryon acoustic oscillations, Redshift-space distortion, Redshift bins, Modified gravity, Sloan Digital Sky Survey, Cosmic microwave background, Monte Carlo Markov chain, Degree of freedom, ...
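
    For context, $\mu$ and $\Sigma$ above are the standard phenomenological functions entering the perturbed field equations (sign and normalization conventions differ between codes), with GR recovered for $\mu = \Sigma = 1$:

      k^2 \Psi = -4\pi G\, a^2\, \mu(a,k)\, \rho\Delta, \qquad
      k^2 (\Psi + \Phi) = -8\pi G\, a^2\, \Sigma(a,k)\, \rho\Delta,

    so $\mu$ modifies the clustering of matter while $\Sigma$ modifies the lensing of light, which is why adding lensing data tightens $\Sigma$.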
  • The propagation path of gravitational waves is expected to be bent near massive astrophysical objects. The massive object acts as a lens. Similarly to the lensing of electromagnetic waves, the lens amplifies gravitational waves' amplitude and can produce multiple gravitational-wave images. If the lens and the source of a gravitational wave are offset from the line of sight, the gravitational-wave images arrive at different times because they have traveled different trajectories around the lens at the same speed. Depending on the difference in their arrival times, multiple gravitational waves can be detected as repeated, near-identical events, or superposed gravitational waves with characteristic "beating patterns". In particular, when the lens is small, $\lesssim 10^5 M_\odot$, the lens produces images with short time delays that result in the beating patterns. We utilize deep learning to study the lensing signature. Many state-of-the-art deep learning models are excellent at recognizing foreground images, such as spectrograms, against background noise. In this work, we study the feasibility of applying deep learning to identify lensing signatures from the spectrogram of gravitational-wave signals detectable by the Advanced LIGO and Virgo detectors. We assume the lens mass is around $10^3 M_\odot$ -- $10^5 M_\odot$, which can produce time delays of order milliseconds between two images of lensed gravitational waves. We discuss the feasibility of two aspects: distinguishing lensed gravitational waves from unlensed ones, and estimating the parameters related not only to the lensing but also to the source binary system and the lens. We suggest that the approach of this work would be of particular interest for more complicated lensings for which we do not have accurate waveform templates.
    Gravitational wave, Singular isothermal sphere profile, Regression, Deep learning, Signal to noise ratio, Laser Interferometer Gravitational-Wave Observatory, Time delay, Binary black hole system, Galaxy, Binary star, ...
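
    For orientation, the standard point-mass-lens result (not taken from this paper) for the delay between the two images, with redshifted lens mass $M_{Lz} = (1+z_L) M_L$ and source offset $y$ in Einstein-radius units, is

      \Delta t = \frac{4 G M_{Lz}}{c^3} \left[ \frac{y \sqrt{y^2+4}}{2}
      + \ln\!\left( \frac{\sqrt{y^2+4}+y}{\sqrt{y^2+4}-y} \right) \right],

    and since $4GM_\odot/c^3 \approx 2\times 10^{-5}$ s, lenses of $10^3$ -- $10^5 M_\odot$ naturally give millisecond-scale delays at small $y$.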
  • Star formation in the universe's most massive galaxies proceeds furiously early in time but then nearly ceases. Plenty of hot gas remains available but does not cool and condense into star-forming clouds. Active galactic nuclei (AGN) release enough energy to inhibit cooling of the hot gas, but energetic arguments alone do not explain why quenching of star formation is most effective in high-mass galaxies. In fact, optical observations show that quenching is more closely related to a galaxy's central stellar velocity dispersion ($\sigma_v$) than to any other characteristic. Here, we show that high $\sigma_v$ is critical to quenching because a deep central potential well maximizes the efficacy of AGN feedback. In order to remain quenched, a galaxy must continually sweep out the gas ejected from its aging stars. Supernova heating can accomplish this task as long as the AGN sufficiently reduces the gas pressure of the surrounding circumgalactic medium (CGM). We find that CGM pressure acts as the control knob on a valve that regulates AGN feedback and suggest that feedback power self-adjusts so that it suffices to lift the CGM out of the galaxy's potential well. Supernova heating then drives a galactic outflow that remains homogeneous if $\sigma_v \gtrsim 240 \, {\rm km \, s^{-1}}$. AGN feedback can effectively quench galaxies with a comparable velocity dispersion, but feedback in galaxies with a much lower velocity dispersion tends to result in convective circulation and accumulation of multiphase gas within the galaxy.
    Galaxy, Supernova, Circumgalactic medium, Quenching, Star formation, Cooling, AGN feedback, Milky Way, Active Galactic Nuclei, Entropy, ...
  • In this note we study soliton, breather and shockwave solutions in certain two dimensional field theories. These include: (i) Heisenberg's model suggested originally to describe the scattering of high energy nucleons (ii) $T\bar T$ deformations of certain canonical scalar field theories with a potential. We find explicit soliton solutions of these models with sine-Gordon and Higgs-type potentials. We prove that the $T\bar T$ deformation of a theory of a given potential does not correct the mass of the soliton of the undeformed one. We further conjecture the form of breather solutions of these models. We show that certain $T\bar T$ deformed actions admit shockwave solutions that generalize those of Heisenberg's Lagrangian.
    Soliton, Breather, Heisenberg model, Scalar field theory, Born-Infeld action, Field theory, Hamiltonian, Sine-Gordon model, Virial theorem, Bound state, ...
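
    As a concrete reference for the sine-Gordon case above: the undeformed theory with potential $V(\phi) = (m^2/\beta^2)(1 - \cos\beta\phi)$ has the static kink

      \phi(x) = \frac{4}{\beta} \arctan e^{m(x - x_0)}, \qquad M_{\rm kink} = \frac{8m}{\beta^2},

    and the paper's statement is that the $T\bar T$ deformation leaves this soliton mass uncorrected.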
  • The central question in this article is how information leaks out of black holes. Relying on algebraic arguments and the concept of superselection sectors, we propose the existence of certain operators whose correlations extend across the black hole atmosphere and range into the interior. Contained in the full algebra, these black hole intertwiners will not belong to the subalgebra describing semiclassical bulk physics. We study this proposal in the context of operator reconstructions for code spaces containing a large number of microstates. As long as the atmosphere is excluded from a particular subsystem, the global state seen under the action of the associated algebra is maximally mixed and therefore described by a single classical background. Once the relevant correlations are encoded, i.e. if the algebra is sufficiently enlarged, perfect state distinguishability becomes possible. We arrive at this by computing the von Neumann entropy which may explain the result obtained by applying the quantum extremal surface prescription to the mixed state. We then examine these insights in the context of black hole evaporation and argue that information is transferred to the radiation via black hole intertwiners. We derive the Page curve. The mechanism above suggests that black hole information is topologically protected. An infalling observer would experience no drama. This may resolve the unitarity problem without running into any firewall or state puzzle, the latter being evident in generalized entropy computations. We also examine the question of how certain wormhole topologies may be understood given these findings. We argue that their occurrence in gravity replica computations may be related to the maximal correlation between radiation and atmosphere surrounding the old black hole. This may suggest a connection between topology change and near horizon quantum gravitational effects.
    Black hole, Entropy, Horizon, Superselection, Entanglement, Ryu-Takayanagi, Wormhole, Conformal field theory, Degree of freedom, Mixed states, ...
  • We computed a set of structures which appear in the four-point function of protected operators of dimension two in $\mathcal{N}=4$ Super Yang Mills with $SU(N)$ gauge group, at any order in a large $N$ expansion. They are determined only by leading order CFT data. By focusing on a specific limit, we made connection with the dual supergravity amplitude in flat space, where such structures correspond to iterated $s$-cuts. We made several checks and we conjecture that the same interpretation holds for supergravity amplitudes on $AdS_5 \times S^5$.
    Conformal field theory, Supergravity, Operator product expansion, Two-point correlation function, Graviton, Super Yang-Mills theory, Propagator, Scattering amplitude, Unitarity, Polylogarithm, ...
  • We present a technique for translating a black-box machine-learned classifier operating on a high-dimensional input space into a small set of human-interpretable observables that can be combined to make the same classification decisions. We iteratively select these observables from a large space of high-level discriminants by finding those with the highest decision similarity relative to the black box, quantified via a metric we introduce that evaluates the relative ordering of pairs of inputs. Successive iterations focus only on the subset of input pairs that are misordered by the current set of observables. This method enables simplification of the machine-learning strategy, interpretation of the results in terms of well-understood physical concepts, validation of the physical model, and the potential for new insights into the nature of the problem itself. As a demonstration, we apply our approach to the benchmark task of jet classification in collider physics, where a convolutional neural network acting on calorimeter jet images outperforms a set of six well-known jet substructure observables. Our method maps the convolutional neural network into a set of observables called energy flow polynomials, and it closes the performance gap by identifying a class of observables with an interesting physical interpretation that has been previously overlooked in the jet substructure literature.
    Machine learning, Convolution Neural Network, Graph, Engineering, Chromatic number, Ground truth, Collider, Attention, Transverse momentum, Boosted decision trees, ...
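
    A minimal sketch of the pairwise decision-similarity idea (our own illustrative implementation; the observable names are hypothetical):

      import numpy as np

      def decision_similarity(black_box_scores, observable_values):
          """Fraction of input pairs ordered the same way by the black-box
          output and by a candidate observable."""
          i, j = np.triu_indices(len(black_box_scores), k=1)   # all unordered pairs
          same = (np.sign(black_box_scores[i] - black_box_scores[j])
                  == np.sign(observable_values[i] - observable_values[j]))
          return same.mean()

      # Greedy step: keep the observable that best matches the black box; later
      # iterations would restrict to the pairs still misordered by the current set.
      scores = np.random.rand(500)                                        # stand-in CNN outputs on jets
      observables = {f"EFP_{k}": np.random.rand(500) for k in range(10)}  # hypothetical candidates
      best = max(observables, key=lambda k: decision_similarity(scores, observables[k]))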
  • We develop an optimization approach to model the magnetic field configuration of magnetic clouds, based on a linear-force free formulation in three dimensions. Such a solution, dubbed the Freidberg solution, is akin to the axi-symmetric Lundquist solution, but with more general "helical symmetry". The merit of our approach is demonstrated via its application to two case studies of in-situ measured magnetic clouds. Both yield results of reduced $\chi^2\approx 1$. Case 1 shows a winding flux rope configuration with one major polarity. Case 2 exhibits a double-helix configuration with two flux bundles winding around each other and rooted on regions of mixed polarities. This study demonstrates the three-dimensional complexity of the magnetic cloud structures.
    Magnetic cloud, Bundle, Coronal mass ejection, Sun, Advanced Composition Explorer, Optimization, Chirality, Earth, Orientation, Alternating Gradient Synchrotron, ...
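
    For reference, the axisymmetric Lundquist field that the Freidberg solution generalizes is the cylindrical linear force-free solution of $\nabla \times \mathbf{B} = \alpha \mathbf{B}$ (constant $\alpha$):

      B_r = 0, \qquad B_\phi = H B_0 J_1(\alpha r), \qquad B_z = B_0 J_0(\alpha r),

    with $J_0$, $J_1$ Bessel functions and $H = \pm 1$ the handedness (chirality).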
  • The boson and fermion particle masses are calculated in a finite quantum field theory. The field theory satisfies Poincar\'e invariance, unitarity and microscopic causality, and all loop graphs are finite to all orders of perturbation theory. The infinite derivative nonlocal field interactions are regularized with a mass (length) scale parameter $\Lambda_i$. The $W$, $Z$ and Higgs boson masses are calculated from finite one-loop self-energy graphs. The $W^{\pm}$ mass is predicted to be $M_W=80.05$ GeV, and the higher order radiative corrections to the Higgs boson mass $m_H=125$ GeV are damped out above the regulating mass scale parameter $\Lambda_H=1.57$ TeV. The three generations of quark and lepton masses are calculated from finite one-loop self-interactions, and there is an exponential spacing in mass between the quarks and leptons.
    Quantum field theory, Standard Model, Graph, Propagator, Fermion mass, Self-energy, Higgs boson mass, Higgs field, Electroweak, Particle mass, ...
  • Through a series of simulated observations, we investigate the capability of the instruments aboard the forthcoming THESEUS mission for the detection of a characteristic signal from decaying dark matter (DM) in the keV-MeV energy range. We focus our studies on three well studied Standard Model extensions hosting axion-like particles, dark photon, and sterile neutrino DM candidates. We show that, due to the sensitivity of THESEUS' X and Gamma Imaging Spectrometer (XGIS) instrument, existing constraints on dark matter parameters can be improved by a factor of up to around 300, depending on the considered DM model and assuming a zero level of systematic uncertainty. We also show that even a minimal level of systematic uncertainty of 1% can impair potential constraints by one to two orders of magnitude. We argue that nonetheless, the constraints imposed by THESEUS will be substantially better than existing ones and will well complement the constraints of upcoming missions such as eXTP and Athena. Ultimately, the limits imposed by THESEUS and future missions will ensure a robust and thorough coverage of the parameter space for decaying DM models, enabling either a detection of dark matter or a significant improvement of relevant limits.
    Dark matter, X and Gamma Imaging Spectrometer, Axion-like particle, Hidden photon, Systematic error, Decaying dark matter, THESEUS mission, Soft X-Ray Imager, Sterile neutrino, Dark matter model, ...
  • We analyze the problem of neutrino oscillations via a fermionic particle detector model inspired by the physics of the Fermi theory of weak interactions. The model naturally leads to a description of emission and absorption of neutrinos in terms of localized two-level systems. By explicitly including source and detector as part of the dynamics, the formalism is shown to recover the standard results for neutrino oscillations without mention of "flavor states", which are ill-defined in Quantum Field Theory (QFT). This illustrates how particle detector models provide a powerful theoretical tool to approach the measurement issue in QFT and emphasizes that the notion of flavor states, although sometimes useful, must not play any crucial role in neutrino phenomenology.
    Neutrino, Neutrino oscillations, Particle detector, Quantum field theory, Two-level system, Scalar field, Fermionic field, Fermi theory of weak interactions, Hamiltonian, Degree of freedom, ...
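
    The "standard results" recovered by the formalism are, in the two-flavor case, the familiar oscillation probability (natural units)

      P_{\nu_\alpha \to \nu_\beta}(L, E) = \sin^2(2\theta)\, \sin^2\!\left( \frac{\Delta m^2 L}{4E} \right),

    for mixing angle $\theta$, mass-squared splitting $\Delta m^2$, baseline $L$, and neutrino energy $E$.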
  • Despite recent advancements in deep learning methods for protein structure prediction and representation, little focus has been directed at the simultaneous inclusion and prediction of protein backbone and sidechain structure information. We present SidechainNet, a new dataset that directly extends the ProteinNet dataset. SidechainNet includes angle and atomic coordinate information capable of describing all heavy atoms of each protein structure. In this paper, we first provide background information on the availability of protein structure data and the significance of ProteinNet. Thereafter, we argue for the potentially beneficial inclusion of sidechain information through SidechainNet, describe the process by which we organize SidechainNet, and provide a software package (https://github.com/jonathanking/sidechainnet) for data manipulation and training with machine learning models.
    Protein, Machine learning, Deep learning, Amino-acid, Python, Training set, Software, Orientation, NumPy, Programming, ...
  • With the increasing amounts of high-dimensional heterogeneous data to be processed, multi-modality feature selection has become an important research direction in medical image analysis. Traditional methods usually depict the data structure using a fixed and predefined similarity matrix for each modality separately, without considering the potential relationship structure across different modalities. In this paper, we propose a novel multi-modality feature selection method, which performs feature selection and local similarity learning simultaneously. Specifically, a similarity matrix is learned by jointly considering different imaging modalities. At the same time, feature selection is conducted by imposing a sparse l_{2,1}-norm constraint. The effectiveness of our proposed joint learning method is well demonstrated by the experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, where it outperforms existing state-of-the-art multi-modality approaches.
    Feature selection, Alzheimer's disease, Sparsity, Data structures, Potential, ...
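
    A common form of the joint objective sketched above (illustrative; the paper's exact formulation may differ in its similarity-learning term $\mathcal{R}(S)$), with per-modality data $X^{(m)}$, targets $Y$, and weights $W$:

      \min_{W, S}\ \sum_m \big\| X^{(m)\top} W^{(m)} - Y \big\|_F^2
      + \lambda \| W \|_{2,1} + \gamma\, \mathcal{R}(S),
      \qquad \| W \|_{2,1} = \sum_i \Big( \sum_j W_{ij}^2 \Big)^{1/2},

    where the l_{2,1} norm zeroes out entire feature rows, which is what performs the selection.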
  • The Higgs boson is a fundamental particle, and the classification of Higgs signals is a well-known problem in high energy physics. The identification of the Higgs signal is a challenging task because its signal has a resemblance to the background signals. This study proposes a Higgs signal classification using a novel combination of random forest, autoencoder and deep autoencoder to build a robust and generalized Higgs boson prediction system to discriminate the Higgs signal from the background noise. The proposed ensemble technique is based on achieving diversity in the decision space, and the results show good discrimination power on the private leaderboard, achieving an area under the Receiver Operating Characteristic curve of 0.9 and an Approximate Median Significance score of 3.429.
    Higgs boson, Deep Neural Networks, Autoencoder, Random forest, Receiver operating characteristic, Fundamental particle, Energy, ...
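
    The Approximate Median Significance quoted above is the HiggsML challenge metric; a sketch with the challenge's usual regularization term b_reg = 10:

      import math

      def ams(s, b, b_reg=10.0):
          """Approximate Median Significance for expected signal s and
          background b passing the selection (HiggsML definition)."""
          return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))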
  • Initially developed for natural language processing (NLP), Transformers are now widely used for source code processing, due to the format similarity between source code and text. In contrast to natural language, source code is strictly structured, i.e., it follows the syntax of the programming language. Several recent works develop Transformer modifications for capturing syntactic information in source code. The drawback of these works is that they do not compare to each other and all consider different tasks. In this work, we conduct a thorough empirical study of the capabilities of Transformers to utilize syntactic information in different tasks. We consider three tasks (code completion, function naming and bug fixing) and re-implement different syntax-capturing modifications in a unified framework. We show that Transformers are able to make meaningful predictions based purely on syntactic information and underline the best practices of taking the syntactic information into account for improving the performance of the model.
    Attention, Architecture, Computational linguistics, Programming, Natural language, Fully connected layer, Programming Language, Training set, Entropy, Graph, ...
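
    A tiny illustration of exposing syntax to a sequence model (standard-library ast; how each modification consumes such information differs from paper to paper):

      import ast

      code = "def add(a, b):\n    return a + b\n"

      # One simple syntactic representation: the AST node-type sequence from a
      # tree traversal, which can be fed to a Transformer alongside the tokens.
      node_types = [type(node).__name__ for node in ast.walk(ast.parse(code))]
      print(node_types)   # ['Module', 'FunctionDef', 'arguments', ...]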
  • We present a greedy algorithm for solving binary classification problems in situations where the dataset is either too small or not fully representative of the problem being solved, and obtaining more data is not possible. This algorithm is of particular interest when training small models that have trouble generalizing. It relies on a trained model with loose accuracy constraints, an iterative hyperparameter pruning procedure, and a function used to generate new data. An analysis of correctness and runtime complexity under ideal conditions, and an extension to deep neural networks, are provided. In the former case we obtain an asymptotic bound of $O\left(|\Theta^2|\left(\log{|\Theta|} + |\theta^2| + T_f\left(| D|\right)\right) + \bar{S}|\Theta||{E}|\right)$, where $|{\Theta}|$ is the cardinality of the set of hyperparameters $\theta$ to be searched; $|{E}|$ and $|{D}|$ are the sizes of the evaluation and training datasets, respectively; $\bar{S}$ and $\bar{f}$ are the inference times for the trained model and the candidate model; and $T_f({|{D}|})$ is a polynomial on $|{D}|$ and $\bar{f}$. Under these conditions, this algorithm returns a solution that is $1 \leq r \leq 2(1 - {2^{-|{\Theta}|}})$ times better than simply enumerating and training with any $\theta \in \Theta$. As part of our analysis of the generating function we also prove that, under certain assumptions, if an open cover of $D$ has the same homology as the manifold where the support of the underlying probability distribution lies, then $D$ is learnable, and vice versa.
    Hyperparameter, Manifold, Training set, Polynomial time, Inference, VC dimension, Stochastic gradient descent, Binary classification, Deep Neural Networks, Neural network, ...
  • The reliability of machine learning models can be compromised when trained on low quality data. Many large-scale medical imaging datasets contain low quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically identify low quality data are highly desired. In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset. We characterized the effectiveness of data Shapley in identifying low quality versus valuable data for pneumonia detection. We found that removing training data with high Shapley values decreased the pneumonia detection performance, whereas removing data with low Shapley values improved the model performance. Furthermore, there were more mislabeled examples in low Shapley value data and more true pneumonia cases in high Shapley value data. Our results suggest that low Shapley value indicates mislabeled or poor quality images, whereas high Shapley value indicates data that are valuable for pneumonia detection. Our method can serve as a framework for using data Shapley to denoise large-scale medical imaging datasets.
    Shapley value, Training set, Observational error, Machine learning, X-ray, Algorithms, ...
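
    A compact sketch of truncated Monte Carlo data Shapley in the spirit of the method above (the model, metric, and truncation rule are illustrative choices):

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def tmc_data_shapley(X, y, X_val, y_val, n_perms=100, tol=1e-3):
          """Estimate each training point's Shapley value as its average marginal
          contribution to validation accuracy over random orderings."""
          n = len(X)
          values = np.zeros(n)
          full = LogisticRegression(max_iter=500).fit(X, y).score(X_val, y_val)
          for _ in range(n_perms):
              perm = np.random.permutation(n)
              prev = 0.0                         # convention: the empty model scores 0
              for k in range(1, n + 1):
                  Xk, yk = X[perm[:k]], y[perm[:k]]
                  if len(np.unique(yk)) < 2:     # cannot fit a classifier yet
                      score = prev
                  else:
                      score = LogisticRegression(max_iter=500).fit(Xk, yk).score(X_val, y_val)
                  values[perm[k - 1]] += (score - prev) / n_perms
                  prev = score
                  if abs(full - score) < tol:    # truncation: the rest adds ~nothing
                      break
          return values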
  • We propose a new framework for reasoning about generalization in deep learning. The core idea is to couple the Real World, where optimizers take stochastic gradient steps on the empirical loss, to an Ideal World, where optimizers take steps on the population loss. This leads to an alternate decomposition of test error into: (1) the Ideal World test error plus (2) the gap between the two worlds. If the gap (2) is universally small, this reduces the problem of generalization in offline learning to the problem of optimization in online learning. We then give empirical evidence that this gap between worlds can be small in realistic deep learning settings, in particular supervised image classification. For example, CNNs generalize better than MLPs on image distributions in the Real World, but this is "because" they optimize faster on the population loss in the Ideal World. This suggests our framework is a useful tool for understanding generalization in deep learning, and lays a foundation for future research in the area.
    Architecture, Stochastic gradient descent, Optimization, Deep learning, Training set, Scheduling, Convolution Neural Network, Neural network, Regression, Statistics, ...
  • Many businesses and industries require accurate forecasts for weekly time series nowadays. The forecasting literature however does not currently provide easy-to-use, automatic, reproducible and accurate approaches dedicated to this task. We propose a forecasting method that can be used as a strong baseline in this domain, leveraging state-of-the-art forecasting techniques, forecast combination, and global modelling. Our approach uses four base forecasting models specifically suitable for forecasting weekly data: a global Recurrent Neural Network model, Theta, Trigonometric Box-Cox ARMA Trend Seasonal (TBATS), and Dynamic Harmonic Regression ARIMA (DHR-ARIMA). Those are then optimally combined using a lasso regression stacking approach. We evaluate the performance of our method against a set of state-of-the-art weekly forecasting models on six datasets. Across four evaluation metrics, we show that our method consistently outperforms the benchmark methods by a considerable margin with statistical significance. In particular, our model can produce the most accurate forecasts, in terms of mean sMAPE, for the M4 weekly dataset.
    Regression, Time Series, Messier 4, Recurrent neural network, Horizon, Meta learning, Statistical significance, Rank, Messier 3, Seasonal lag, ...
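
    A minimal sketch of the stacking step (the base forecasts here are stand-ins; in the paper they come from the RNN, Theta, TBATS, and DHR-ARIMA models):

      import numpy as np
      from sklearn.linear_model import Lasso

      # Rows: past weeks; columns: out-of-sample forecasts from the four base models.
      base_forecasts = np.random.rand(200, 4)   # stand-in for RNN, Theta, TBATS, DHR-ARIMA
      actuals = np.random.rand(200)

      # Lasso meta-learner: learns a sparse combination of the base models.
      stacker = Lasso(alpha=0.01, positive=True)
      stacker.fit(base_forecasts, actuals)
      combined = stacker.predict(np.random.rand(1, 4))   # forecast for a new week

      # A common sMAPE definition, for reference (one of the evaluation metrics above):
      def smape(y_true, y_pred):
          return 200 * np.mean(np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))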
  • With the success of Neural Architecture Search (NAS), weight sharing, as an approach to speed up architecture performance estimation has received wide attention. Instead of training each architecture separately, weight sharing builds a supernet that assembles all the architectures as its submodels. However, there has been debate over whether the NAS process actually benefits from weight sharing, due to the gap between supernet optimization and the objective of NAS. To further understand the effect of weight sharing on NAS, we conduct a comprehensive analysis on five search spaces, including NAS-Bench-101, NAS-Bench-201, DARTS-CIFAR10, DARTS-PTB, and ProxylessNAS. Moreover, we take a step forward to explore the pruning based NAS algorithms. Some of our key findings are summarized as: (i) A well-trained supernet is not necessarily a good architecture-ranking model. (ii) Supernet is good at finding relatively good (top-10%) architectures but struggles to find the best ones (top-1% or less). (iii) The effectiveness of supernet largely depends on the design of search space itself. (iv) Compared with selecting the best architectures, supernet is more confident in pruning the worst ones. (v) It is easier to find better architectures from an effectively pruned search space with supernet training. We expect the observations and insights obtained in this work would inspire and help better NAS algorithm design.
    Architecture, Attention, Algorithm design, Optimization, Ranking, Algorithms, Objective, ...
  • Despite numerous research efforts, the precise mechanisms of concussion have yet to be fully uncovered. Clinical studies on high-risk populations, such as contact sports athletes, have become more common and give insight on the link between impact severity and brain injury risk through the use of wearable sensors and neurological testing. However, as the number of institutions operating these studies grows, there is a growing need for a platform to share these data to facilitate our understanding of concussion mechanisms and aid in the development of suitable diagnostic tools. To that end, this paper puts forth two contributions: 1) a centralized, open-source platform for storing and sharing head impact data, in collaboration with the Federal Interagency Traumatic Brain Injury Research informatics system (FITBIR), and 2) a deep learning impact detection algorithm (MiGNet) to differentiate between true head impacts and false positives for the previously biomechanically validated instrumented mouthguard sensor (MiG2.0), all of which easily interfaces with FITBIR. We report 96% accuracy using MiGNet, based on a neural network model, improving on previous work based on Support Vector Machines achieving 91% accuracy, on an out of sample dataset of high school and collegiate football head impacts. The integrated MiG2.0 and FITBIR system serve as a collaborative research tool to be disseminated across multiple institutions towards creating a standardized dataset for furthering the knowledge of concussion biomechanics.
    Deep learning, Network model, Support vector machine, Neural network, Algorithms, ...
  • Convolutional neural networks often dominate fully-connected counterparts in generalization performance, especially on image classification tasks. This is often explained in terms of 'better inductive bias'. However, this has not been made mathematically rigorous, and the hurdle is that the fully connected net can always simulate the convolutional net (for a fixed task). Thus the training algorithm plays a role. The current work describes a natural task on which a provable sample complexity gap can be shown, for standard training algorithms. We construct a single natural distribution on $\mathbb{R}^d\times\{\pm 1\}$ on which any orthogonal-invariant algorithm (i.e. fully-connected networks trained with most gradient-based methods from gaussian initialization) requires $\Omega(d^2)$ samples to generalize while $O(1)$ samples suffice for convolutional architectures. Furthermore, we demonstrate a single target function, learning which on all possible distributions leads to an $O(1)$ vs $\Omega(d^2/\varepsilon)$ gap. The proof relies on the fact that SGD on fully-connected network is orthogonal equivariant. Similar results are achieved for $\ell_2$ regression and adaptive training algorithms, e.g. Adam and AdaGrad, which are only permutation equivariant.
    Convolution Neural Network, Permutation, VC dimension, Architecture, Ground truth, Regression, Regularization, Training set, Inductive bias, Stochastic gradient descent, ...
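
    The key property invoked above can be stated compactly (our paraphrase): an algorithm $\mathcal{A}$ is orthogonal-equivariant if, for every orthogonal matrix $U$, training on rotated data yields the correspondingly rotated predictor,

      \mathcal{A}\big(\{(U x_i, y_i)\}\big)(x) = \mathcal{A}\big(\{(x_i, y_i)\}\big)(U^{\top} x),

    which holds in distribution over the random initialization for SGD on fully-connected nets, and is what prevents such methods from exploiting the coordinate structure that convolutions are built around.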
  • The purpose of this study was to develop a fully-automated segmentation algorithm, robust to various density enhancing lung abnormalities, to facilitate rapid quantitative analysis of computed tomography images. A polymorphic training approach is proposed, in which both specifically labeled left and right lungs of humans with COPD, and nonspecifically labeled lungs of animals with acute lung injury, were incorporated into training a single neural network. The resulting network is intended for predicting left and right lung regions in humans with or without diffuse opacification and consolidation. Performance of the proposed lung segmentation algorithm was extensively evaluated on CT scans of subjects with COPD, confirmed COVID-19, lung cancer, and IPF, despite no labeled training data of the latter three diseases. Lobar segmentations were obtained using the left and right lung segmentation as input to the LobeNet algorithm. Regional lobar analysis was performed using hierarchical clustering to identify radiographic subtypes of COVID-19. The proposed lung segmentation algorithm was quantitatively evaluated using semi-automated and manually-corrected segmentations in 87 COVID-19 CT images, achieving an average symmetric surface distance of $0.495 \pm 0.309$ mm and Dice coefficient of $0.985 \pm 0.011$. Hierarchical clustering identified four radiographical phenotypes of COVID-19 based on lobar fractions of consolidated and poorly aerated tissue. Lower left and lower right lobes were consistently more afflicted with poor aeration and consolidation. However, the most severe cases demonstrated involvement of all lobes. The polymorphic training approach was able to accurately segment COVID-19 cases with diffuse consolidation without requiring COVID-19 cases for training.
    COVID 19, Subtyping, Training set, Convolution Neural Network, Hierarchical clustering, Opacity, Dice's coefficient, Ground truth, Image segmentation, Glass, ...
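
    The Dice coefficient used for evaluation above is simple to compute from binary masks; a minimal helper (ours, not the paper's):

      import numpy as np

      def dice(pred, truth):
          """Dice coefficient between two binary masks: 2|A n B| / (|A| + |B|)."""
          pred, truth = pred.astype(bool), truth.astype(bool)
          denom = pred.sum() + truth.sum()
          return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0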
  • Machine learning methods in drug discovery have primarily focused on virtual screening of molecular libraries using discriminative models. Generative models are an entirely different approach to drug discovery that learn to represent and optimize molecules in a continuous latent space. These methods have already been applied with increasing success to the generation of two dimensional molecules as SMILES strings and molecular graphs. In this work, we describe deep generative models for three dimensional molecular structures using atomic density grids and a novel fitting algorithm that converts continuous grids to discrete molecular structures. Our models jointly represent drug-like molecules and their conformations in a latent space that can be explored through interpolation. We are able to sample diverse sets of molecules based on a given input compound and increase the probability of creating a valid, drug-like molecule.
    Molecular structure, Generative model, Latent space, Graph, Discriminative model, Machine learning, Algorithms, Probability, ...
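
    A minimal sketch of the atomic density grid representation mentioned above (grid size, resolution, and Gaussian width are illustrative choices):

      import numpy as np

      def density_grid(coords, size=24, resolution=0.5, sigma=1.0):
          """Deposit a Gaussian blob at each atom position on a cubic grid.
          coords: (N, 3) atom coordinates in angstroms, centered on the origin."""
          lin = (np.arange(size) - size / 2) * resolution
          gx, gy, gz = np.meshgrid(lin, lin, lin, indexing="ij")
          grid = np.zeros((size, size, size))
          for x, y, z in coords:
              grid += np.exp(-((gx - x)**2 + (gy - y)**2 + (gz - z)**2) / (2 * sigma**2))
          return grid

      g = density_grid(np.random.randn(10, 3) * 2)   # stand-in "molecule"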
  • In 2012, the SEC mandated that all corporate filings for any company doing business in the US be entered into the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. In this work we investigate ways to analyze the data available through the EDGAR database. This may help portfolio managers (pension funds, mutual funds, insurance, hedge funds) get automated insights into the companies they invest in, to better manage their portfolios. The analysis is based on Artificial Neural Networks applied to the data. In particular, one of the most popular machine learning methods, the Convolutional Neural Network (CNN) architecture, originally developed to interpret and classify images, is now being used to interpret financial data. This work investigates the best way to input data collected from the SEC filings into a CNN architecture. We incorporate accounting principles and mathematical methods into the design of three image encoding methods. Specifically, two methods are derived from accounting principles (Sequential Arrangement, Category Chunk Arrangement) and one uses a purely mathematical technique (Hilbert Vector Arrangement). In this work we analyze fundamental financial data as well as financial ratio data and study companies from the financial, healthcare and IT sectors in the United States. We find that using imaging techniques to input data for CNN works better for financial ratio data but is not significantly better than simply using the 1D input directly for fundamental data. We do not find the Hilbert Vector Arrangement technique to be significantly better than other imaging techniques.
    Convolution Neural Network, Architecture, Autoencoder, Deep learning, Feature selection, Machine learning, Multilayer perceptron, Time Series, Soft proton, Portfolio, ...
  • The thesis explores the role machine learning methods play in creating intuitive computational models of neural processing. Combined with interpretability techniques, machine learning could replace the human modeler and shift the focus of human effort to extracting the knowledge from the ready-made models and articulating that knowledge into intuitive descriptions of reality. This perspective makes the case in favor of the larger role that an exploratory and data-driven approach to computational neuroscience could play while coexisting alongside the traditional hypothesis-driven approach. We exemplify the proposed approach in the context of the knowledge representation taxonomy with three research projects that employ interpretability techniques on top of machine learning methods at three different levels of neural organization. The first study (Chapter 3) explores feature importance analysis of a random forest decoder trained on intracerebral recordings from 100 human subjects to identify spectrotemporal signatures that characterize local neural activity during the task of visual categorization. The second study (Chapter 4) employs representation similarity analysis to compare the neural responses of the areas along the ventral stream with the activations of the layers of a deep convolutional neural network. The third study (Chapter 5) proposes a method that allows test subjects to visually explore the state representation of their neural signal in real time. This is achieved by using a topology-preserving dimensionality reduction technique that transforms the neural data from the multidimensional representation used by the computer into a two-dimensional representation a human can grasp. The approach, the taxonomy, and the examples present a strong case for the applicability of machine learning methods to automatic knowledge discovery in neuroscience.
    Deep convolutional neural networks, Machine learning, Random forest, Feature space, Architecture, Taxonomy, Knowledge representation, Data sampling, Activity patterns, Region of interest, ...
  • Given the widespread circulation of inaccurate medical advice related to the 2019 coronavirus pandemic (COVID-19), such as fake remedies, treatments and prevention suggestions, misinformation detection has emerged as an open problem of high importance and interest for the NLP community. To combat the potential harm of COVID-19-related misinformation, we release Covid-HeRA, a dataset for health risk assessment of COVID-19-related social media posts. More specifically, we study the severity of each misinformation story, i.e., how harmful a message believed by the audience can be and what type of signals can be used to discover highly malicious fake news and detect refuted claims. We present a detailed analysis, evaluate several simple and advanced classification models, and conclude with our experimental analysis that presents open challenges and future directions.
    COVID 19, Decision making, Vaccine, Attention, Convolution Neural Network, Binary classification, Computational linguistics, Long short term memory, Twitter, Bag of words model, ...
  • Four years ago, an experimental system known as PilotNet became the first NVIDIA system to steer an autonomous car along a roadway. This system represents a departure from the classical approach for self-driving in which the process is manually decomposed into a series of modules, each performing a different task. In PilotNet, on the other hand, a single deep neural network (DNN) takes pixels as input and produces a desired vehicle trajectory as output; there are no distinct internal modules connected by human-designed interfaces. We believe that handcrafted interfaces ultimately limit performance by restricting information flow through the system and that a learned approach, in combination with other artificial intelligence systems that add redundancy, will lead to better overall performing systems. We continue to conduct research toward that goal. This document describes the PilotNet lane-keeping effort, carried out over the past five years by our NVIDIA PilotNet group in Holmdel, New Jersey. Here we present a snapshot of system status in mid-2020 and highlight some of the work done by the PilotNet group.
    Training set, Neural network, Autonomous vehicles, Software, Deep Neural Networks, Inference, Convolution Neural Network, Region of interest, Ground truth, Architecture, ...
  • We present our HABERTOR model for detecting hatespeech in large scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance the performance on the downstream hatespeech classification task. HABERTOR inherits BERT's architecture, but is different in four aspects: (i) it generates its own vocabularies and is pre-trained from scratch using the largest scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in a much smaller number of parameters, faster training and inference, as well as less memory usage; (iii) it uses our proposed multi-source ensemble heads with a pooling layer for separate input sources, to further enhance its effectiveness; and (iv) it uses a regularized adversarial training with our proposed fine-grained and adaptive noise magnitude to enhance its robustness. Through experiments on the large-scale real-world hatespeech dataset with 1.4M annotated comments, we show that HABERTOR works better than 15 state-of-the-art hatespeech detection methods, including fine-tuning Language Models. In particular, compared with BERT, our HABERTOR is 4-5 times faster in the training/inference phase, uses less than 1/3 of the memory, and has better performance, even though we pre-train it using less than 1% of the number of words. Our generalizability analysis shows that HABERTOR transfers well to other unseen hatespeech datasets and is a more efficient and effective alternative to BERT for hatespeech classification.
    Quaternions, Attention, Embedding, Architecture, Twitter, Long short term memory, Training set, F1 score, Area-under-curve, Convolution Neural Network, ...
  • Smart healthcare, built as a healthcare Cyber-Physical System (H-CPS) on top of the Internet of Medical Things (IoMT), is becoming more important than ever. Medical devices and their Internet connectivity, along with electronic health records (EHR) and AI analytics, make H-CPS possible. IoMT end devices such as wearables and implantables are key for H-CPS based smart healthcare. A smart garment is a specific wearable which can be used for smart healthcare. There are various smart garments that help users monitor their body vitals in real time. Many commercially available garments collect the vital data and transmit it to a mobile application for visualization. However, these do not perform real-time analysis that would let the user comprehend their health condition. Also, such garments do not include an alert system to alert users and contacts in case of emergency. In MyWear, we propose a wearable body vital monitoring garment that captures physiological data and automatically analyzes signals such as heart rate, stress level and muscle activity to detect abnormalities. A copy of the physiological data is transmitted to the cloud for detecting any abnormalities in heart beats and predicting any potential heart failure in the future. We also propose a deep neural network (DNN) model that automatically classifies abnormal heart beats and potential heart failure. For immediate assistance in such a situation, we propose an alert system that sends an alert message to nearby medical officials. The proposed MyWear has an average accuracy of 96.9% and precision of 97.3% for detection of the abnormalities.
    Cyber-physical system, Orientation, Deep Neural Networks, Deep learning, Radiative Recombination, Intensity, Security, Graph, Privacy, Machine learning, ...
  • Gait recognition, referring to the identification of individuals based on the manner in which they walk, can be very challenging due to the variations in the viewpoint of the camera and the appearance of individuals. Current methods for gait recognition have been dominated by deep learning models, notably those based on partial feature representations. In this context, we propose a novel deep network, learning to transfer multi-scale partial gait representations using capsules to obtain more discriminative gait features. Our network first obtains multi-scale partial representations using a state-of-the-art deep partial feature extractor. It then recurrently learns the correlations and co-occurrences of the patterns among the partial features in forward and backward directions using Bi-directional Gated Recurrent Units (BGRU). Finally, a capsule network is adopted to learn deeper part-whole relationships and assigns more weights to the more relevant features while ignoring the spurious dimensions. That way, we obtain final features that are more robust to both viewing and appearance changes. The performance of our method has been extensively tested on two gait recognition datasets, CASIA-B and OU-MVLP, using four challenging test protocols. The results of our method have been compared to the state-of-the-art gait recognition solutions, showing the superiority of our model, notably when facing challenging viewing and carrying conditions.
    Attention, Deep learning, Silhouette, Fully connected layer, Architecture, Feature space, Recurrent neural network, Feature extraction, Ablation, Rank, ...
  • Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal input. Further, we empirically study the effect of placing the self-attention layer at the (de)convolutional layers with varying layer indices as well as at all of them when memory allows. Our experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance. Furthermore, applying at different (de)convolutional layers does not significantly alter performance, suggesting that it can be conveniently applied at the highest-level (de)convolutional layer with the smallest memory overhead.
    Attention, Generative Adversarial Net, Recurrent neural network, Architecture, Least squares, Signal to noise ratio, Convolution Neural Network, Deep Neural Networks, Generative model, Signal processing, ...
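
    A bare-bones 1-D non-local self-attention block of the kind described above (framework-agnostic NumPy; shapes and scaling are illustrative):

      import numpy as np

      def self_attention_1d(x, Wq, Wk, Wv):
          """x: (T, C) feature map from a (de)convolutional layer. Returns x plus
          attention-weighted global context (a residual non-local block)."""
          q, k, v = x @ Wq, x @ Wk, x @ Wv              # queries, keys, values
          logits = q @ k.T / np.sqrt(k.shape[1])        # (T, T) pairwise scores
          logits -= logits.max(axis=1, keepdims=True)   # numerical stability
          att = np.exp(logits)
          att /= att.sum(axis=1, keepdims=True)         # softmax over time steps
          return x + att @ v

      T, C = 128, 16
      x = np.random.randn(T, C)
      Wq, Wk, Wv = (np.random.randn(C, C) / np.sqrt(C) for _ in range(3))
      y = self_attention_1d(x, Wq, Wk, Wv)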
  • In recent years, machine learning has become prevalent in numerous tasks, including algorithmic trading. Stock market traders utilize learning models to predict the market's behavior and execute an investment strategy accordingly. However, learning models have been shown to be susceptible to input manipulations called adversarial examples. Yet, the trading domain remains largely unexplored in the context of adversarial learning. This is mainly because of the rapid changes in the market which impair the attacker's ability to create a real-time attack. In this study, we present a realistic scenario in which an attacker gains control of an algorithmic trading bot by manipulating the input data stream in real time. The attacker creates a universal perturbation that is agnostic to the target model and time of use, while also remaining imperceptible. We evaluate our attack on a real-world market data stream and target three different trading architectures. We show that our perturbation can fool the model at future unseen data points, in both white-box and black-box settings. We believe these findings should serve as an alert to the finance community about the threats in this area and prompt further research on the risks associated with using automated learning models in the finance domain.
    Market, Stock Market, Adversarial examples, Tully-Fisher relation, Architecture, Machine learning, Data sampling, Deep Neural Networks, Training set, Open source, ...
  • Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this paper, we probe BERT specifically to understand and measure the relational knowledge it captures. We utilize knowledge base completion tasks to probe every layer of pre-trained as well as fine-tuned BERT (ranking, question answering, NER). Our findings show that knowledge is not just contained in BERT's final layers. Intermediate layers contribute a significant amount (17-60%) to the total knowledge found. Probing intermediate layers also reveals how different types of knowledge emerge at varying rates. When BERT is fine-tuned, relational knowledge is forgotten, but the extent of forgetting is impacted by the fine-tuning objective and not by the size of the dataset. We found that ranking models forget the least and retain more knowledge in their final layer. We release our code on GitHub so that the experiments can be repeated.
    RankingKnowledge baseRankEmbeddingArchitecturePart-of-speechRegressionF1 scorePattern recognition...
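A minimal sketch of per-layer probing with the Hugging Face transformers library: hidden states from every BERT layer are extracted so a simple probe can be trained on each. The mean pooling and the linear probe suggested in the comment are illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

def layer_representations(text):
    """Return one mean-pooled vector per layer (embeddings + 12 encoder layers)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states: tuple of (1, seq_len, hidden) tensors, one per layer
    return [h.mean(dim=1).squeeze(0) for h in outputs.hidden_states]

# A small probe (e.g., a linear classifier on a KB-completion task) can then be
# trained on each layer's vectors to score how much relational knowledge it holds.
```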
  • Due to the advancements in cellular technologies and the dense deployment of cellular infrastructure, integrating unmanned aerial vehicles (UAVs) into the fifth-generation (5G) and beyond cellular networks is a promising solution for achieving safe UAV operation as well as enabling diversified applications with mission-specific payload data delivery. In particular, 5G networks need to support three typical usage scenarios, namely, enhanced mobile broadband (eMBB), ultra-reliable low-latency communications (URLLC), and massive machine-type communications (mMTC). On the one hand, UAVs can be leveraged as cost-effective aerial platforms to provide ground users with enhanced communication services by exploiting their high cruising altitude and controllable maneuverability in three-dimensional (3D) space. On the other hand, providing such communication services simultaneously for both UAV and ground users poses new challenges due to the need for ubiquitous 3D signal coverage as well as the strong air-ground network interference. Besides the requirement of high-performance wireless communications, the ability to support effective and efficient sensing as well as network intelligence is also essential for 5G-and-beyond 3D heterogeneous wireless networks with coexisting aerial and ground users. In this paper, we provide a comprehensive overview of the latest research efforts on integrating UAVs into cellular networks, with an emphasis on how to exploit advanced techniques (e.g., intelligent reflecting surface, short packet transmission, energy harvesting, joint communication and radar sensing, and edge intelligence) to meet the diversified service requirements of next-generation wireless systems. Moreover, we highlight important directions for further investigation in future work.
    InterferenceLine of sightOptimizationMobilityComputer network programmingMachine learningEnergy harvestingSignal to noise ratioInternet of ThingsAttention...
  • Segmentation of infected areas in chest CT volumes is of great significance for further diagnosis and treatment of COVID-19 patients. Due to the complex shapes and varied appearances of lesions, a large number of voxel-level labeled samples are generally required to train a lesion segmentation network, which is a main bottleneck for developing deep learning based medical image segmentation algorithms. In this paper, we propose a weakly-supervised lesion segmentation framework that embeds the generative adversarial training process into the segmentation network, which we call GASNet. GASNet's segmenter is optimized to segment the lesion areas of a COVID-19 CT, while its generator replaces the abnormal appearance with a synthesized normal appearance, so that the restored CT volumes are indistinguishable from healthy CT volumes to the discriminator (see the sketch below). GASNet is supervised by chest CT volumes of many healthy and COVID-19 subjects without voxel-level annotations. Experiments on three public databases show that when using as few as one voxel-level labeled sample, the performance of GASNet is comparable to fully-supervised segmentation algorithms trained on dozens of voxel-level labeled samples.
    COVID 19Generative Adversarial NetImage segmentationGround truthEmbeddingDeep learningHyperparameterSynthetic DataRegion of interestAttention...
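A schematic sketch of the adversarial objective described above, assuming least-squares GAN losses (one common choice): the segmenter masks suspected lesions, the generator inpaints them, and the discriminator compares restored volumes against real healthy ones. The network interfaces and loss form are illustrative assumptions, not GASNet's exact implementation.

```python
import torch

def gasnet_step(segmenter, generator, discriminator, covid_ct, healthy_ct):
    """One adversarial step on unlabeled COVID-19 and healthy CT volumes."""
    mask = torch.sigmoid(segmenter(covid_ct))          # soft lesion mask in [0, 1]
    filled = generator(covid_ct * (1 - mask))          # synthesize normal tissue
    restored = covid_ct * (1 - mask) + filled * mask   # composite "healthy" volume

    # least-squares GAN losses; `restored` is detached when updating the
    # discriminator so gradients do not flow back into the generator
    d_loss = ((discriminator(healthy_ct) - 1) ** 2).mean() \
             + (discriminator(restored.detach()) ** 2).mean()
    g_loss = ((discriminator(restored) - 1) ** 2).mean()
    return d_loss, g_loss
```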
  • With the growing processing power of computing systems and the increasing availability of massive datasets, machine learning algorithms have led to major breakthroughs in many different areas. This development has influenced computer security, spawning a series of work on learning-based security systems, such as for malware detection, vulnerability discovery, and binary code analysis. Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance and render learning-based systems potentially unsuitable for security tasks and practical deployment. In this paper, we look at this problem with critical eyes. First, we identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We conduct a longitudinal study of 30 papers from top-tier security conferences within the past 10 years, confirming that these pitfalls are widespread in the current security literature. In an empirical analysis, we further demonstrate how individual pitfalls can lead to unrealistic performance and interpretations, obstructing the understanding of the security problem at hand. As a remedy, we derive a list of actionable recommendations to support researchers and our community in avoiding pitfalls, promoting a sound design, development, evaluation, and deployment of learning-based systems for computer security (one such pitfall is sketched below).
    SecurityMachine learningTraining setAutoencoderSupport vector machineReceiver operating characteristicMarketHyperparameterProgrammingLongitudinal study...
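One concrete illustration of a pitfall in this family, data snooping: fitting preprocessing on the full dataset before splitting leaks test-set statistics into training. The example is ours, not taken from the paper; the data below are placeholders.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 20)          # placeholder features
y = (X[:, 0] > 0).astype(int)          # placeholder labels

# Pitfall (data snooping): the scaler sees test-set statistics before evaluation.
X_leaky = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)

# Sound alternative: fit preprocessing on the training split only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
```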
  • In recent years, deep learning has become the most popular direction in machine learning and artificial intelligence. However, the preparation of training data is often a bottleneck in the lifecycle of deploying a deep learning model for production or research. Reusing existing models for inference on a new dataset can greatly reduce the human cost of training data creation. Although there exist a number of model-sharing platforms, such as TensorFlow Hub, PyTorch Hub, and DLHub, most of these systems require model uploaders to manually specify the details of each model and model downloaders to screen keyword search results when selecting a model; they lack an automatic model-search tool. This paper proposes an end-to-end process for finding related models to serve, based on the similarity between the target dataset and the training datasets of the available models. While there exist many similarity measures, we study how to apply these metrics efficiently without pair-wise comparison and compare their effectiveness. We find that our proposed adaptivity measure, which is based on the Jensen-Shannon (JS) divergence, is effective (see the sketch below), and that its computation can be significantly accelerated using the technique of locality-sensitive hashing.
    Training setMinHashDeep learningFeature spacePearson's correlationArchitectureModel selectionKeyphraseActivity recognition...
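A minimal sketch of the base metric, assuming scalar features summarized by shared-range histograms; the paper's key acceleration, avoiding pair-wise comparison via locality-sensitive hashing, is not shown here.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(features_a, features_b, bins=64):
    """JS divergence between two datasets' distributions of a scalar feature."""
    lo = min(features_a.min(), features_b.min())
    hi = max(features_a.max(), features_b.max())
    p, _ = np.histogram(features_a, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(features_b, bins=bins, range=(lo, hi), density=True)
    # scipy returns the JS *distance* (the square root of the divergence)
    return jensenshannon(p, q) ** 2
```

Ranking candidate models by this score against the target dataset gives the basic search primitive; hashing then replaces the all-pairs histogram comparison.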
  • Billions of USD are invested in new artists and songs by the music industry every year. This research provides a new strategy for assessing the hit potential of songs, which can help record companies support their investment decisions. A number of models were developed that use both audio data and a novel feature based on social media listening behaviour. The results show that models based on early-adopter behaviour perform well when predicting top-20 dance hits (a minimal classifier sketch follows below).
    Logistic regressionApplication programming interfaceSupport vector machineMeta FeatureRankingWekaPotentialField...
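A minimal sketch of the kind of classifier such a study might use (the concept tags mention logistic regression); the features, labels, and the early-adopter score below are invented placeholders, not the paper's actual data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical features: audio descriptors plus an early-adopter listening score.
X = np.random.randn(500, 4)          # e.g., tempo, energy, danceability, adopter_score
y = np.random.randint(0, 2, 500)     # 1 = song reached the top-20 dance chart

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())  # cross-validated hit-prediction accuracy
```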
  • A crucial aspect for the successful deployment of audio-based models "in-the-wild" is robustness to the transformations introduced by heterogeneous acquisition conditions. In this work, we propose a method to perform one-shot microphone style transfer. Given only a few seconds of audio recorded by a target device, MicAugment identifies the transformations associated with the input acquisition pipeline and uses the learned transformations to synthesize audio as if it were recorded under the same conditions as the target audio (see the sketch below). We show that our method can successfully apply the style transfer to real audio and that it significantly increases model robustness when used for data augmentation in downstream tasks.
    Power spectral densityReverberationConvolution Neural NetworkSignal processingKeyphraseTraining setFully connected layerSpeech recognitionOptimizationCalibration...
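A minimal sketch of the final augmentation step under the simplifying assumption that the learned microphone transformation reduces to a linear impulse response; MicAugment's actual transformation model is richer than this.

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_mic_style(audio, impulse_response, gain=1.0):
    """Re-record `audio` as if captured through the target microphone.

    audio            : 1-D float array, clean waveform
    impulse_response : 1-D float array estimated from a few seconds of target audio
    """
    styled = fftconvolve(audio, impulse_response, mode="full")[: len(audio)]
    peak = np.abs(styled).max()
    return gain * styled / peak if peak > 0 else styled  # normalize to avoid clipping
```

Applied on the fly to training batches, such a transform serves as the data augmentation the abstract describes.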
  • Safe and proactive planning in robotic systems generally requires accurate predictions of the environment. Prior work on environment prediction has applied video frame prediction techniques to bird's-eye-view environment representations, such as occupancy grids. The ConvLSTM-based frameworks used previously often suffer from significant blurring and vanishing of moving objects, hindering their applicability in safety-critical applications. In this work, we propose two extensions to the ConvLSTM to address these issues (the baseline cell is sketched below). We present the Temporal Attention Augmented ConvLSTM (TAAConvLSTM) and Self-Attention Augmented ConvLSTM (SAAConvLSTM) frameworks for spatiotemporal occupancy prediction, and demonstrate improved performance over baseline architectures on the real-world KITTI and Waymo datasets.
    AttentionArchitectureHorizonMean squared errorRecurrent neural networkLong short term memoryRoboticsLiDARDecision makingHyperparameter...
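For context, a minimal PyTorch sketch of the baseline ConvLSTM cell that the paper augments with temporal and self-attention; the attention modules themselves are omitted here.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A single ConvLSTM cell: LSTM gates computed with 2-D convolutions."""
    def __init__(self, in_ch, hidden_ch, kernel_size=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):          # x: (B, C, H, W)
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)     # cell state keeps its spatial layout
        h = o * torch.tanh(c)
        return h, (h, c)
```

The state is initialized as a pair of zero tensors, e.g. `h = c = torch.zeros(B, hidden_ch, H, W)`, and the cell is applied once per time step of the occupancy-grid sequence.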
  • Image forensics plays a crucial role in both criminal investigations (e.g., the dissemination of fake images to spread racial hate or false narratives about specific ethnic groups) and civil litigation (e.g., defamation). Increasingly, machine learning approaches are also utilized in image forensics. However, machine learning-based approaches come with a number of limitations and vulnerabilities, such as the difficulty of detecting adversarial (image) examples, with real-world consequences (e.g., inadmissible evidence or wrongful convictions). Therefore, with a focus on image forensics, this paper surveys techniques that can be used to enhance the robustness of machine learning-based binary manipulation detectors in various adversarial scenarios.
    Machine learningSecurityAdversarial examplesConvolution Neural NetworkGenerative Adversarial NetSupport vector machineArchitectureTraining setOrder statisticFeature selection...
  • Interpretability methods for neural networks are difficult to evaluate because we do not understand the black-box models typically used to test them. This paper proposes a framework in which interpretability methods are evaluated using manually constructed networks, which we call white-box networks, whose behavior is understood a priori. We evaluate five methods for producing attribution heatmaps by applying them to white-box LSTM classifiers for tasks based on formal languages (one such method is sketched below). Although our white-box classifiers solve their tasks perfectly and transparently, we find that all five attribution methods fail to produce the expected model explanations.
    Long short term memoryCountingFormal languagesAblationNeural networkHidden stateComputational linguisticsActivation functionGround truth...
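A minimal sketch of one widely used attribution method of the kind evaluated, input times gradient; it assumes the classifier exposes an embedding-level forward pass, which is an illustrative assumption rather than the paper's exact interface.

```python
import torch

def input_x_gradient(model, embeddings, target_class):
    """Per-token attribution heatmap via the input-times-gradient method.

    model      : maps (1, seq_len, dim) embeddings to (1, num_classes) logits
    embeddings : token embeddings for one input sequence
    """
    emb = embeddings.clone().requires_grad_(True)
    logits = model(emb)
    logits[0, target_class].backward()           # gradient of the target logit
    # one attribution score per token: sum over the embedding dimension
    return (emb * emb.grad).sum(dim=-1).squeeze(0)
```

Because a white-box LSTM's expected explanation is known a priori, such heatmaps can be checked token by token against the ground truth.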
  • The surge in the spread of misleading information, lies, propaganda, and false facts, frequently known as fake news, has raised questions about social media's influence in today's fast-moving democratic society. The widespread and rapid dissemination of fake news imposes costs in many ways: individual and societal costs from undermining the integrity of elections, significant economic losses from impacts on stock markets, and an increased risk to national security. Curbing the spread of fake news is challenging in traditional centralized systems. However, blockchain, a distributed, decentralized technology that ensures data provenance, authenticity, and traceability by providing transparent, immutable, and verifiable transaction records, can help in detecting and countering fake news. This paper proposes a novel hybrid model, DeHiDe: Deep Learning-based Hybrid Model to Detect Fake News using Blockchain. DeHiDe is a blockchain-based framework for legitimate news sharing that filters out fake news, combining the benefits of blockchain with an intelligent deep learning model to reinforce robustness and accuracy in combating fake news (a minimal hash-chaining sketch follows below). The paper also compares the proposed method to existing state-of-the-art methods; DeHiDe is expected to outperform them in terms of services, features, and performance.
    Deep learningConvolution Neural NetworkSecurityArchitectureComputational linguisticsWord embeddingStock MarketRecurrent neural networkSocial networkAttention...
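A minimal sketch of the hash-chaining idea underlying such a framework: each record links a news item and its model-assigned credibility score to the previous record, making the history tamper-evident. The field names and score are invented placeholders, not DeHiDe's actual record format.

```python
import hashlib
import json
import time

def make_block(news_item, score, prev_hash):
    """Append-only record linking a news item and its model credibility score."""
    block = {
        "timestamp": time.time(),
        "news_sha256": hashlib.sha256(news_item.encode()).hexdigest(),
        "credibility_score": score,      # output of the deep learning filter
        "prev_hash": prev_hash,          # ties this record to the chain so far
    }
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block("genesis", 1.0, "0" * 64)
b1 = make_block("some article text", 0.12, genesis["hash"])
```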