%0 Conference Proceedings
%A Nivarthi, Chandana Priya & Sick, Bernhard
%D 2023
%T Towards Few-Shot Time Series Anomaly Detection with Temporal Attention and Dynamic Thresholding
%B International Conference on Machine Learning and Applications (ICMLA)
%I IEEE
%P 1444--1450
%3 inproceedings
%F nivarthi2023towards
%K imported, itegpub, isac-www, few-shot, learning, anomaly, detection, temporal, attention, dynamic, thresholding
%X Anomaly detection plays a pivotal role in diverse real-world applications such as cybersecurity, fault detection, network monitoring, predictive maintenance, and highly automated driving. However, obtaining labeled anomalous data can be a formidable challenge, especially when anomalies exhibit temporal evolution. This paper introduces LATAM (Long short-term memory Autoencoder with Temporal Attention Mechanism) for few-shot anomaly detection, with the aim of enhancing detection performance in scenarios with limited labeled anomaly data. LATAM effectively captures temporal dependencies and emphasizes significant patterns in multivariate time series data. In our investigation, we comprehensively evaluate LATAM against other anomaly detection models, particularly assessing its capability in few-shot learning scenarios where we have minimal examples from the normal class and none from the anomalous class in the training data. Our experimental results, derived from real-world photovoltaic inverter data, highlight LATAM's superiority, showcasing a substantial 27% mean F1 score improvement, even when trained on a mere two-week dataset. Furthermore, LATAM demonstrates remarkable results on the open-source SWaT dataset, achieving a 12% boost in accuracy with only two days of training data. Moreover, we introduce a simple yet effective dynamic thresholding mechanism, further enhancing the anomaly detection capabilities of LATAM. This underscores LATAM's efficacy in addressing the challenges posed by limited labeled anomalies in practical scenarios, and it proves valuable for downstream tasks involving temporal representation and time series prediction, extending its utility beyond anomaly detection applications.

%0 Conference Proceedings
%A Mitchell, T.; Cohen, W.; Hruschka, E.; Talukdar, P.; Betteridge, J.; Carlson, A.; Dalvi, B.; Gardner, M.; Kisiel, B.; Krishnamurthy, J.; Lao, N.; Mazaitis, K.; Mohammad, T.; Nakashole, N.; Platanios, E.; Ritter, A.; Samadi, M.; Settles, B.; Wang, R.; Wijaya, D.; Gupta, A.; Chen, X.; Saparov, A.; Greaves, M. & Welling, J.
%D 2015
%T Never-Ending Learning
%B AAAI
%3 inproceedings
%F mitchell2015
%K learning, nell, ontology, semantic, toread
%Z Never-Ending Learning in AAAI-2015
%U http://www.cs.cmu.edu/~wcohen/pubs.html

%0 Journal Article
%A Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane & Hassabis, Demis
%D 2015
%T Human-level control through deep reinforcement learning
%B Nature
%I Nature Publishing Group
%V 518
%N 7540
%P 529--533
%8 February
%@ 0028-0836
%3 article
%F mnih2015humanlevel
%K deep, learning, toread
%U http://dx.doi.org/10.1038/nature14236

%0 Conference Proceedings
%A Ring, Markus; Otto, Florian; Becker, Martin; Niebler, Thomas; Landes, Dieter & Hotho, Andreas
%D 2015
%T ConDist: A Context-Driven Categorical Distance Measure
%B ECML PKDD 2015
%3 inproceedings
%F ring2015condist
%K 2015, categorical, data, learning, measure, myown, similarity, unsupervised

%0 Conference Proceedings
%A Krompass, Denis; Nickel, Maximilian & Tresp, Volker
%D 2014
%T Large-scale factorization of type-constrained multi-relational data
%B International Conference on Data Science and Advanced Analytics, DSAA 2014, Shanghai, China, October 30 - November 1, 2014
%I IEEE
%P 18--24
%@ 978-1-4799-6991-3
%3 inproceedings
%4 DBLP:conf/dsaa/2014
%F DBLP:conf/dsaa/KrompassNT14
%K graph, knowledge, learning, toread
%U http://dx.doi.org/10.1109/DSAA.2014.7058046

%0 Book Section
%A Lehmann, Jens & Voelker, Johanna
%D 2014
%T An Introduction to Ontology Learning
%E Lehmann, Jens & Voelker, Johanna
%B Perspectives on Ontology Learning
%I AKA / IOS Press
%P ix--xvi
%3 incollection
%# jl
%F pol_introduction
%K introduction, learning, ontology
%U http://jens-lehmann.org/files/2014/pol_introduction.pdf

%0 Conference Proceedings
%A Balasubramanyan, Ramnath; Dalvi, Bhavana Bharat & Cohen, William W.
%D 2013
%T From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering
%E Blockeel, Hendrik; Kersting, Kristian; Nijssen, Siegfried & Zelezný, Filip
%B ECML/PKDD (2)
%I Springer
%V 8189
%P 628--642
%S Lecture Notes in Computer Science
%@ 978-3-642-40990-5
%3 inproceedings
%4 conf/pkdd/2013-2
%F conf/pkdd/BalasubramanyanDC13
%K learning, models, sota, supervised, topic, toread
%U http://dblp.uni-trier.de/db/conf/pkdd/pkdd2013-2.html#BalasubramanyanDC13

%0 Conference Proceedings
%A Bitzer, Philipp & Söllner, Matthias
%D 2013
%T Towards a Productivity Measurement Model for Technology Mediated Learning Services
%B European Conference on Information Systems (ECIS)
%C Utrecht, Netherlands (accepted for publication)
%3 inproceedings
%F ls_leimeister
%K Technology-mediated, itegpub, learning, productivity, pub_msö, pub_pbi, service, services, success, training

%0 Conference Proceedings
%A Bitzer, Philipp; Weiß, Frank & Leimeister, Jan Marco
%D 2013
%T Towards a Reference Model for a Productivity-optimized Delivery of Technology Mediated Learning Services
%B Eighth International Conference on Design Science Research in Information Systems and Technology (DESRIST)
%C Helsinki, Finland (accepted for publication)
%3 inproceedings
%F ls_leimeister
%K delivery, itegpub, learning, mediated, model, productivity, pub_jml, pub_pbi, reference, service, technology

%0 Journal Article
%A Kluegl, Peter; Toepfer, Martin; Lemmerich, Florian; Hotho, Andreas & Puppe, Frank
%D 2013
%T Exploiting Structural Consistencies with Stacked Conditional Random Fields
%B Mathematical Methodologies in Pattern Recognition and Machine Learning
%S Springer Proceedings in Mathematics & Statistics
%V 30
%P 111--125
%3 article
%F kluegl2013exploiting
%K 2013, ie, learning, myown, references
%X Conditional Random Fields (CRF) are popular methods for labeling unstructured or textual data. Like many machine learning approaches, these undirected graphical models assume the instances to be independently distributed. However, in real-world applications data is grouped in a natural way, e.g., by its creation context. The instances in each group often share additional structural consistencies. This paper proposes a domain-independent method for exploiting these consistencies by combining two CRFs in a stacked learning framework. We apply rule learning collectively on the predictions of an initial CRF for one context to acquire descriptions of its specific properties. Then, we utilize these descriptions as dynamic and high-quality features in an additional (stacked) CRF. The presented approach is evaluated on a real-world dataset for the segmentation of references and achieves a significant reduction of the labeling error.

%0 Generic
%A Yu, Hsiang-Fu; Jain, Prateek; Kar, Purushottam & Dhillon, Inderjit S.
%D 2013
%T Large-scale Multi-label Learning with Missing Labels
%3 misc
%F yu2013largescale
%K classification, kallimachos, label, large, learning, multi
%X The multi-label classification problem has generated significant interest in recent years. However, existing approaches do not adequately address two key challenges: (a) the ability to tackle problems with a large number (say millions) of labels, and (b) the ability to handle data with missing labels. In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. Our framework, despite being simple, is surprisingly able to encompass several recent label-compression based methods which can be derived as special cases of our method. To optimize the ERM problem, we develop techniques that exploit the structure of specific loss functions - such as the squared loss function - to offer efficient algorithms. We further show that our learning framework admits formal excess risk bounds even in the presence of missing labels. Our risk bounds are tight and demonstrate better generalization performance for low-rank promoting trace-norm regularization when compared to (rank insensitive) Frobenius norm regularization. Finally, we present extensive empirical results on a variety of benchmark datasets and show that our methods perform significantly better than existing label-compression based methods and can scale up to very large datasets such as the Wikipedia dataset.
%Z cite arxiv:1307.5101
%U http://arxiv.org/abs/1307.5101

%0 Journal Article
%A Wegener, R. & Leimeister, J. M.
%D 2012
%T Virtual Learning Communities: Success Factors and Challenges
%B International Journal of Technology Enhanced Learning (IJTEL)
%V 4
%N 5/6
%P 383--397
%3 article
%F ls_leimeister
%K Challenges, Communities, Factors, Learning, Success, VirtualCommunity, itegpub, pub_jml, pub_rwe
%Z JML_390

%0 Conference Proceedings
%A Coates, A.; Lee, H. & Ng, A.Y.
%D 2011
%T An analysis of single-layer networks in unsupervised feature learning
%E Gordon, Geoffrey; Dunson, David & Dudík, Miroslav
%B Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics
%I JMLR W&CP
%V 15
%P 215--223
%S JMLR Workshop and Conference Proceedings
%3 inproceedings
%F coates2011analysis
%K feature, learning, machine, ml, unsupervised
%X A great deal of research has focused on algorithms for learning features from unlabeled data. Indeed, much progress has been made on benchmark datasets like NORB and CIFAR-10 by employing increasingly complex unsupervised learning algorithms and deep models. In this paper, however, we show that several simple factors, such as the number of hidden nodes in the model, may be more important to achieving high performance than the learning algorithm or the depth of the model. Specifically, we will apply several off-the-shelf feature learning algorithms (sparse auto-encoders, sparse RBMs, K-means clustering, and Gaussian mixtures) to CIFAR-10, NORB, and STL datasets using only single-layer networks. We then present a detailed analysis of the effect of changes in the model setup: the receptive field size, number of hidden nodes (features), the step-size ("stride") between extracted features, and the effect of whitening. Our results show that large numbers of hidden nodes and dense feature extraction are critical to achieving high performance - so critical, in fact, that when these parameters are pushed to their limits, we achieve state-of-the-art performance on both CIFAR-10 and NORB using only a single layer of features. More surprisingly, our best performance is based on K-means clustering, which is extremely fast, has no hyper-parameters to tune beyond the model structure itself, and is very easy to implement. Despite the simplicity of our system, we achieve accuracy beyond all previously published results on the CIFAR-10 and NORB datasets (79.6% and 97.2% respectively).
%U http://jmlr.csail.mit.edu/proceedings/papers/v15/coates11a.html

%0 Conference Proceedings
%A Coates, A.; Carpenter, B.; Case, C.; Satheesh, S.; Suresh, B.; Wang, Tao; Wu, D.J. & Ng, A.Y.
%D 2011
%T Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning
%B International Conference on Document Analysis and Recognition (ICDAR)
%P 440--445
%8 September
%3 inproceedings
%F coates2011detection
%K feature, learning, machine, ml, ocr
%X Reading text from photographs is a challenging problem that has received a significant amount of attention. Two key components of most systems are (i) text detection from images and (ii) character recognition, and many recent methods have been proposed to design better feature representations and models for both.
In this paper, we apply methods recently developed in machine learning -- specifically, large-scale algorithms for learning the features automatically from unlabeled data -- and show that they allow us to construct highly effective classifiers for both detection and recognition to be used in a high-accuracy end-to-end system.
%U http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6065350&tag=1

%0 Conference Proceedings
%A Carlson, A.; Betteridge, J.; Kisiel, B.; Settles, B.; Hruschka Jr., E.R. & Mitchell, T.M.
%D 2010
%T Toward an Architecture for Never-Ending Language Learning
%B Proceedings of the Conference on Artificial Intelligence (AAAI)
%I AAAI Press
%P 1306--1313
%3 inproceedings
%F Carlson10
%K learning, nell, sota, web

%0 Conference Proceedings
%A Mirowski, Piotr; Ranzato, Marc'Aurelio & LeCun, Yann
%D 2010
%T Dynamic Auto-Encoders for Semantic Indexing
%B Proceedings of the NIPS 2010 Workshop on Deep Learning
%3 inproceedings
%F mirowski2010dynamic
%K deep, kallimachos, lda, learning, model, toread
%U http://yann.lecun.com/exdb/publis/pdf/mirowski-nipsdl-10.pdf

%0 Book
%A Mitchell, Tom M.
%D 2010
%T Machine learning
%C New York, NY [et al.]
%I McGraw-Hill
%@ 0071154671, 9780071154673
%3 book
%F mitchell2010machine
%K Mitchell, book, info2.0, learning, machine
%U http://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/0070428077

%0 Journal Article
%A Cimiano, Philipp; Hotho, Andreas & Staab, Steffen
%D 2005
%T Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis
%B Journal of Artificial Intelligence Research
%V 24
%P 305--339
%3 article
%F cimiano05learning
%K 2005, fca, folksonomy, hierarchies, hierarchy, learning, myown, ontologies, text
%U http://dblp.uni-trier.de/db/journals/jair/jair24.html#CimianoHS05

%0 Journal Article
%A Breiman, Leo
%D 2001
%T Random Forests
%B Machine Learning
%I Kluwer Academic Publishers
%V 45
%N 1
%P 5--32
%@ 0885-6125
%3 article
%F breiman2001random
%K classification, ensemble, forest, learning, random
%X Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost, but are more robust with respect to noise.
%U http://dx.doi.org/10.1023/A%3A1010933404324

%0 Conference Proceedings
%A Joachims, Thorsten
%D 1999
%T Making Large-Scale SVM Learning Practical
%E Schölkopf, Bernhard; Burges, Christopher J.C. & Smola, A.
%B Advances in Kernel Methods - Support Vector Learning
%C Cambridge, MA, USA
%I MIT Press
%3 inproceedings
%F joachims99
%K kernel, learning, svm