Han Xiao

Berlin, Berlin, Germany
20,261 followers, 500+ connections

About

Dr. Han Xiao is the Founder & CEO of Jina AI, a commercial open-source company based in…

Experience & Education

  • 极纳科技

Publications

  • Learning Better while Sending Less: Communication-Efficient Online Semi-Supervised Learning in Client-Server Settings

    IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA'2015)

    We consider a novel distributed learning problem: A server receives potentially unlimited data from clients in a sequential manner, but only a small initial fraction of these data are labeled. Because communication bandwidth is expensive, each client is limited to sending the server only a small (high-priority) fraction of the unlabeled data it generates, and the server is limited in the amount of prioritization hints it sends back to the client. The goal is for the server to learn a good model of all the client data from the labeled and unlabeled data it receives. This setting is frequently encountered in real-world applications and has the characteristics of online, semi-supervised, and active learning. However, previous approaches are not designed for the client-server setting and do not hold the promise of reducing communication costs. We present a novel framework for solving this learning problem in an effective and communication-efficient manner. On the server side, our solution combines two diverse learners working collaboratively, yet in distinct roles, on the partially labeled data stream. A compact, online graph-based semi-supervised learner is used to predict labels for the unlabeled data arriving from the clients. Samples from this model are used as ongoing training for a linear classifier. On the client side, our solution prioritizes data based on an active-learning metric that favors instances that are close to the classifier’s decision hyperplane and yet far from each other. To reduce communication, the server sends the classifier’s weight-vector to the client only periodically. Experimental results on real-world data sets show that this particular combination of techniques outperforms other approaches, and in particular, often outperforms (communication expensive) approaches that send all the data to the server.

  • Support Vector Machines under Adversarial Label Contamination

    Journal of Neurocomputing, Special Issue on Advances in Learning with Label Noise

    Machine learning algorithms are increasingly being applied in security-related tasks such as spam and malware detection, although their security properties against deliberate attacks have not yet been widely understood. Intelligent and adaptive attackers may indeed exploit specific vulnerabilities exposed by machine learning techniques to violate system security. Being robust to adversarial data manipulation is thus an important, additional requirement for machine learning algorithms to successfully operate in adversarial settings. In this work, we evaluate the security of Support Vector Machines (SVMs) to well-crafted, adversarial label noise attacks. In particular, we consider an attacker that aims to maximize the SVM’s classification error by flipping a number of labels in the training data. We formalize a corresponding optimal attack strategy, and solve it by means of heuristic approaches to keep the computational complexity tractable. We report an extensive experimental analysis on the effectiveness of the considered attacks against linear and non-linear SVMs, both on synthetic and real-world datasets. We finally argue that our approach can also provide useful insights for developing more secure SVM learning algorithms, and also novel techniques in a number of related research areas, such as semi-supervised and active learning.

  • From Adversarial Learning to Reliable and Scalable Learning

    Dissertation (magna cum laude)

    Nowadays machine learning is considered a vital tool for data analysis and automatic decision making in many modern enterprise systems. However, there is an emerging threat that adversaries can mislead the decision of the learning algorithm by introducing security faults into the system. Previous security research did not closely examine the vulnerabilities of the learning algorithms to adversarial manipulations. Understanding these threats is the only way to build robust learning algorithms for security-sensitive applications. This dissertation is organized in three parts. Each part contributes new results in adversarial, reliable and scalable machine learning, respectively.

  • Efficient Online Sequence Prediction with Side Information

    ICDM 2013

    Sequence prediction is a key task in machine learning and data mining. It
    involves predicting the next symbol in a sequence given its previous symbols.
    Our motivating application is predicting the execution path of a process on an
    operating system in real-time. In this case, each symbol in the sequence
    represents a system call accompanied by arguments and a return value. We
    propose a novel online algorithm for predicting the next system call by
    leveraging both context and side information. The online update of our
    algorithm is efficient in terms of time cost and memory consumption.
    Experiments on real-world data sets showed that our method outperforms
    state-of-the-art online sequence prediction methods in both accuracy and
    efficiency, and incorporation of side information does significantly improve
    the predictive accuracy.

    Other authors
    • Claudia Eckert
  • Lazy Gaussian Process Committee for Real-Time Online Regression

    AAAI 2013

    A significant problem of Gaussian processes (GPs) is their
    unfavorable scaling with a large amount of data. To overcome this issue, we
    present a novel GP approximation scheme for online regression. Our model is
    based on a combination of multiple GPs with random hyperparameters. The model is
    trained by incrementally allocating new examples to a selected subset of GPs.
    The selection is carried out efficiently by optimizing a submodular function.
    Experiments on real-world data sets showed that our method outperforms existing
    online GP regression methods in both accuracy and efficiency. The applicability
    of the proposed method is demonstrated by mouse-trajectory prediction in an
    Internet banking scenario.

    Other authors
    • Claudia Eckert

Honors & Awards

  • Finalist of Chunhui Venture Competition

    Chinese Ministry of Education

    We build an AI-backed marketplace connecting private investors with their best matching quants and strategies. We are a party venue for quants' intellectual quest and exposition, as well as a rewarding feast to fulfill private investors' financial expectations. Techniques such as backtesting, recommendation, search and chatbot are heavily used.

  • Runner-up in Quantopian Open Contest #3

    Quantopian (see https://www.quantopian.com/leaderboard/3)

    Developed an online portfolio balancing algorithm that achieved a 50% annual return in live paper trading. In April 2015, my algorithm ranked 2nd out of more than 300 algorithms in paper trading.

    Details can be found here:
    https://www.quantopian.com/leaderboard/3

  • 31337 & Audience Award

    Zalando

    Developed Zketch during Zalando's hack week, a search engine that retrieves products matching a hand-drawn sketch query.

    31337: Awarded for the most geeky project in terms of nasty technical challenges solved, extreme difficulty, use of low-level programming language or extreme networking hacking etc.

    Audience Award: After every presentation we do a "clapping session" and measure the noise level in dB. The team with the loudest applause wins.

  • Chinese Government Award for Outstanding Ph.D. Students Abroad

    Chinese Ministry of Education

    The award recognizes the academic excellence of Chinese Ph.D. students studying overseas. It is granted across all fields of study. In 2013, 40 Chinese Ph.D. students in Germany received the award.

    http://www.in.tum.de/fuer-studierende/aktuelles/detail/newsarticle/chinesische-regierung-zeichnet-informatik-doktorand-aus.html

  • Student travel award

    Association for the Advancement of Artificial Intelligence (AAAI), 2013

    This award recognizes the paper "Lazy Gaussian Process Committee for Real-Time Online Regression".

  • Best paper award

    ACM SIGKDD Workshop on Knowledge Discovery, Modeling, and Simulation 2011

    This award recognizes the paper "Supervised Topic Transition Model for Detecting Malicious System Call Sequences".

    The sponsor of this award is Science Applications International Corporation (SAIC).

  • ImagineCup Runner-up

    Microsoft China

    Participated in the ImagineCup software design competition (China region) with Xu Pu. We used C# and .NET to develop a sound-based interface that helps visually impaired users gain better control of their PC.

    The award was presented by Bill Gates himself at Peking University in 2007.

Organizations

  • Association for the Advancement of Artificial Intelligence (AAAI)

    Student member
