Berlin, Berlin, Germany
20,261 followers
500+ connections
Activity
-
We can all learn from Siemens! Jina AI was invited to the tangere Events by Siemens. We looked at the main challenges they pointed out in…
Liked by Han Xiao
-
Do you sometimes feel like fighting against your LLM? 🥊 (yes of course you do) Let me introduce you to 𝐀𝐢𝐤𝐢𝐝𝐨–𝐩𝐫𝐨𝐦𝐩𝐭𝐢𝐧𝐠 (paradigm we…
Liked by Han Xiao
Publications
-
Learning Better while Sending Less: Communication-Efficient Online Semi-Supervised Learning in Client-Server Settings
IEEE International Conference on Data Science and Advanced Analytics (IEEE DSAA'2015)
We consider a novel distributed learning problem: A server receives potentially unlimited data from clients in a sequential manner, but only a small initial fraction of these data are labeled. Because communication bandwidth is expensive, each client is limited to sending the server only a small (high-priority) fraction of the unlabeled data it generates, and the server is limited in the amount of prioritization hints it sends back to the client. The goal is for the server to learn a good model of all the client data from the labeled and unlabeled data it receives. This setting is frequently encountered in real-world applications and has the characteristics of online, semi-supervised, and active learning. However, previous approaches are not designed for the client-server setting and do not hold the promise of reducing communication costs. We present a novel framework for solving this learning problem in an effective and communication-efficient manner. On the server side, our solution combines two diverse learners working collaboratively, yet in distinct roles, on the partially labeled data stream. A compact, online graph-based semi-supervised learner is used to predict labels for the unlabeled data arriving from the clients. Samples from this model are used as ongoing training for a linear classifier. On the client side, our solution prioritizes data based on an active-learning metric that favors instances that are close to the classifier’s decision hyperplane and yet far from each other. To reduce communication, the server sends the classifier’s weight-vector to the client only periodically. Experimental results on real-world data sets show that this particular combination of techniques outperforms other approaches, and in particular, often outperforms (communication expensive) approaches that send all the data to the server.
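The client-side prioritization described above (favor instances close to the classifier's decision hyperplane yet far from each other) can be illustrated with a greedy selection. This is a minimal sketch under stated assumptions, not the paper's method: the additive scoring rule, the trade-off weight `diversity_weight`, and the function names `hyperplane_distance` and `select_for_upload` are hypothetical choices made for illustration. The weight vector `w` and bias `b` stand in for the classifier the server periodically sends to the client.

```python
import math
from typing import List, Sequence


def hyperplane_distance(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Euclidean distance from x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w)) or 1.0
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_for_upload(
    pool: List[Sequence[float]],
    w: Sequence[float],
    b: float,
    budget: int,
    diversity_weight: float = 1.0,
) -> List[int]:
    """Greedily choose up to `budget` pool indices to send to the server.

    Hypothetical score: -(distance to hyperplane)
    + diversity_weight * (distance to the nearest instance already chosen).
    """
    chosen: List[int] = []
    remaining = list(range(len(pool)))
    while remaining and len(chosen) < budget:

        def score(i: int) -> float:
            closeness = -hyperplane_distance(w, b, pool[i])
            if not chosen:
                return closeness
            spread = min(math.dist(pool[i], pool[j]) for j in chosen)
            return closeness + diversity_weight * spread

        best = max(remaining, key=score)  # ties go to the lowest index
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Usage: hyperplane x0 = 0; index 1 is a near-duplicate of index 0, so it is skipped.
pool = [(0.1, 0.0), (0.1, 0.05), (5.0, 0.0), (-0.2, 3.0)]
print(select_for_upload(pool, w=(1.0, 0.0), b=0.0, budget=2))  # [0, 3]
```

The diversity term is what keeps the client from spending its upload budget on several nearly identical uncertain points; setting `diversity_weight=0` reduces the sketch to plain margin-based uncertainty sampling.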
-
Support Vector Machines under Adversarial Label Contamination
Journal of Neurocomputing, Special Issue on Advances in Learning with Label Noise
Machine learning algorithms are increasingly being applied in security-related tasks such as spam and malware detection, although their security properties against deliberate attacks have not yet been widely understood. Intelligent and adaptive attackers may indeed exploit specific vulnerabilities exposed by machine learning techniques to violate system security. Being robust to adversarial data manipulation is thus an important, additional requirement for machine learning algorithms to successfully operate in adversarial settings. In this work, we evaluate the security of Support Vector Machines (SVMs) against well-crafted adversarial label noise attacks. In particular, we consider an attacker that aims to maximize the SVM’s classification error by flipping a number of labels in the training data. We formalize a corresponding optimal attack strategy, and solve it by means of heuristic approaches to keep the computational complexity tractable. We report an extensive experimental analysis on the effectiveness of the considered attacks against linear and non-linear SVMs, both on synthetic and real-world datasets. We finally argue that our approach can also provide useful insights for developing more secure SVM learning algorithms, and also novel techniques in a number of related research areas, such as semi-supervised and active learning.
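The attack setting above (flip a limited number of training labels to maximize classification error) can be illustrated with a simple greedy heuristic. This is a hedged sketch, not the paper's attack strategy: to stay dependency-free it swaps in a nearest-centroid classifier as a stand-in for the SVM, and the names `train_centroids`, `predict`, `error_rate`, and `greedy_label_flip` are hypothetical. Labels are assumed to be in {-1, +1}, and the attacker is assumed to measure error on a clean validation set.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Centroids = Dict[int, Optional[Tuple[float, ...]]]


def train_centroids(X: Sequence[Sequence[float]], y: Sequence[int]) -> Centroids:
    """Nearest-centroid surrogate for the SVM: one mean vector per label in {-1, +1}."""
    cents: Centroids = {}
    for label in (-1, 1):
        members = [x for x, t in zip(X, y) if t == label]
        # None means every training point of this class has been flipped away.
        cents[label] = tuple(sum(col) / len(members) for col in zip(*members)) if members else None
    return cents


def predict(cents: Centroids, x: Sequence[float]) -> int:
    if cents[1] is None:
        return -1
    if cents[-1] is None:
        return 1
    d_pos = sum((a - c) ** 2 for a, c in zip(x, cents[1]))
    d_neg = sum((a - c) ** 2 for a, c in zip(x, cents[-1]))
    return 1 if d_pos <= d_neg else -1  # ties go to +1


def error_rate(cents: Centroids, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    return sum(predict(cents, x) != t for x, t in zip(X, y)) / len(y)


def greedy_label_flip(X_tr, y_tr, X_val, y_val, budget: int) -> Tuple[List[int], List[int], float]:
    """Flip up to `budget` training labels one at a time, each time picking the flip
    that most increases validation error; stop early if no flip increases it."""
    y = list(y_tr)
    flipped: List[int] = []
    current = error_rate(train_centroids(X_tr, y), X_val, y_val)
    for _ in range(budget):
        best_i, best_err = None, current
        for i in range(len(y)):
            if i in flipped:
                continue
            y[i] = -y[i]  # tentatively flip and retrain
            err = error_rate(train_centroids(X_tr, y), X_val, y_val)
            y[i] = -y[i]  # undo
            if err > best_err:
                best_i, best_err = i, err
        if best_i is None:
            break
        y[best_i] = -y[best_i]
        flipped.append(best_i)
        current = best_err
    return flipped, y, current


X = [(-1.0,), (1.0,), (3.0,)]
y_clean = [-1, 1, 1]
print(greedy_label_flip(X, y_clean, X, y_clean, budget=2))
```

Retraining the model for every candidate flip is what makes the exact attack expensive; with a real SVM the same loop structure applies, but the cost of each retraining step is why heuristics are needed to keep the search tractable.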
-
From Adversarial Learning to Reliable and Scalable Learning
Dissertation (magna cum laude)
Nowadays, machine learning is considered a vital tool for data analysis and automatic decision making in many modern enterprise systems. However, there is an emerging threat that adversaries can mislead the decisions of a learning algorithm by introducing security faults into the system. Previous security research did not closely examine the vulnerabilities of learning algorithms to adversarial manipulation. Understanding these threats is the only way to build robust learning algorithms for security-sensitive applications. This dissertation is organized in three parts, which contribute new results in adversarial, reliable, and scalable machine learning, respectively.
-
Efficient Online Sequence Prediction with Side Information
ICDM 2013
Sequence prediction is a key task in machine learning and data mining. It involves predicting the next symbol in a sequence given its previous symbols. Our motivating application is predicting the execution path of a process on an operating system in real time. In this case, each symbol in the sequence represents a system call accompanied by arguments and a return value. We propose a novel online algorithm for predicting the next system call by leveraging both context and side information. The online update of our algorithm is efficient in terms of time cost and memory consumption. Experiments on real-world data sets showed that our method outperforms state-of-the-art online sequence prediction methods in both accuracy and efficiency, and that incorporating side information significantly improves predictive accuracy.
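The general idea of predicting the next system call from both context and side information can be illustrated with a count-based online predictor. This is a hypothetical sketch, not the algorithm from the paper: it keys counts on the last `order` symbols plus a side-information value (here assumed to be the return value of the most recent call) and backs off to context-only counts when that combination has not been seen. The class name `OnlineContextPredictor` is introduced here for illustration.

```python
from collections import Counter, defaultdict
from typing import Hashable, Optional, Sequence, Tuple


class OnlineContextPredictor:
    """Online next-symbol predictor keyed on the last `order` symbols plus
    optional side information, backing off to context-only counts."""

    def __init__(self, order: int = 2) -> None:
        if order < 1:
            raise ValueError("order must be at least 1")
        self.order = order
        # (context, side) -> Counter of next symbols
        self.with_side: defaultdict = defaultdict(Counter)
        # context -> Counter of next symbols (used when the side-info key is unseen)
        self.context_only: defaultdict = defaultdict(Counter)

    def _context(self, history: Sequence[Hashable]) -> Tuple[Hashable, ...]:
        return tuple(history[-self.order:])

    def predict(self, history: Sequence[Hashable], side: Hashable = None) -> Optional[Hashable]:
        ctx = self._context(history)
        for table, key in ((self.with_side, (ctx, side)), (self.context_only, ctx)):
            counts = table.get(key)  # .get() does not insert into a defaultdict
            if counts:
                return counts.most_common(1)[0][0]
        return None  # nothing seen for this context yet

    def update(self, history: Sequence[Hashable], side: Hashable, symbol: Hashable) -> None:
        """O(order) work per observed symbol: two counter increments."""
        ctx = self._context(history)
        self.with_side[(ctx, side)][symbol] += 1
        self.context_only[ctx][symbol] += 1


model = OnlineContextPredictor(order=1)
# Side information here: the return value of the most recent call.
model.update(["open"], "ok", "read")
model.update(["open"], "err", "close")
model.update(["open"], "ok", "read")
print(model.predict(["open"], "err"))      # close
print(model.predict(["open"], "timeout"))  # read (unseen side info, backs off to context)
```

In an online loop, each step would call `predict` before the next system call arrives and `update` once it is observed, so per-step cost stays constant for a fixed `order`, while memory grows with the number of distinct keys.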
-
Lazy Gaussian Process Committee for Real-Time Online Regression
AAAI 2013
A significant problem of Gaussian processes (GPs) is their unfavorable scaling with large amounts of data. To overcome this issue, we present a novel GP approximation scheme for online regression. Our model is based on a combination of multiple GPs with random hyperparameters. The model is trained by incrementally allocating new examples to a selected subset of GPs. The selection is carried out efficiently by optimizing a submodular function. Experiments on real-world data sets showed that our method outperforms existing online GP regression methods in both accuracy and efficiency. The applicability of the proposed method is demonstrated by mouse-trajectory prediction in an Internet banking scenario.
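The committee idea in this abstract (several GPs with random hyperparameters, each new example allocated to a subset of members) can be sketched in plain Python. This is a hypothetical illustration, not the paper's algorithm: it replaces the submodular selection with a simple rule that sends each example to the `per_example` members with the highest predictive variance at that input, and it combines members by precision weighting. The names `GP`, `LazyCommittee`, `random_hyperparams`, and the hyperparameter ranges are assumptions introduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def rbf(a: float, b: float, lengthscale: float) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A: List[List[float]], v: Sequence[float]) -> List[float]:
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class GP:
    """Exact 1-D GP regressor with zero prior mean (re-solves on every query for clarity)."""

    def __init__(self, lengthscale: float, noise: float) -> None:
        self.lengthscale, self.noise = lengthscale, noise
        self.xs: List[float] = []
        self.ys: List[float] = []

    def add(self, x: float, y: float) -> None:
        self.xs.append(x)
        self.ys.append(y)

    def predict(self, x: float) -> Tuple[float, float]:
        """Posterior mean and latent-function variance at x."""
        if not self.xs:
            return 0.0, 1.0  # prior
        K = [[rbf(a, b, self.lengthscale) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.xs)] for i, a in enumerate(self.xs)]
        k = [rbf(x, a, self.lengthscale) for a in self.xs]
        alpha, v = solve(K, self.ys), solve(K, k)
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        var = 1.0 - sum(ki * vi for ki, vi in zip(k, v))
        return mean, max(var, 1e-12)  # clamp tiny negative values from round-off


def random_hyperparams(size: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Hypothetical ranges for (lengthscale, noise) drawn per committee member."""
    rng = random.Random(seed)
    return [(rng.uniform(0.5, 2.0), rng.uniform(1e-3, 1e-1)) for _ in range(size)]


class LazyCommittee:
    def __init__(self, hyperparams: Sequence[Tuple[float, float]], per_example: int) -> None:
        self.members = [GP(ls, noise) for ls, noise in hyperparams]
        self.per_example = per_example

    def add(self, x: float, y: float) -> None:
        """Allocate (x, y) to the members that are currently most uncertain at x."""
        ranked = sorted(self.members, key=lambda gp: gp.predict(x)[1], reverse=True)
        for gp in ranked[: self.per_example]:
            gp.add(x, y)

    def predict(self, x: float) -> Tuple[float, float]:
        """Precision-weighted combination of the members' predictions."""
        preds = [gp.predict(x) for gp in self.members]
        precision = sum(1.0 / var for _, var in preds)
        mean = sum(m / var for m, var in preds) / precision
        return mean, 1.0 / precision


committee = LazyCommittee(random_hyperparams(size=4), per_example=2)
for t in range(20):
    committee.add(0.25 * t, math.sin(0.25 * t))
print(committee.predict(2.0))
```

Because each member only stores the examples allocated to it, every member solves a smaller linear system than a single GP trained on all data would; a production version would also update a Cholesky factor incrementally instead of re-solving on every query.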
Honors & Awards
-
Finalist of Chunhui Venture Competition
Chinese Ministry of Education
We build an AI-backed marketplace connecting private investors with their best-matching quants and strategies. We are a party venue for quants' intellectual quest and exposition, as well as a rewarding feast that fulfills private investors' financial expectations. Techniques such as backtesting, recommendation, search, and chatbots are used heavily.
-
Runner-up in Quantopian Open Contest #3
Quantopian (see https://www.quantopian.com/leaderboard/3)
Developed an online portfolio balancing algorithm that achieved a 50% annual return in live paper trading. In April 2015, my algorithm ranked 2nd out of more than 300 algorithms in paper trading.
Details can be found here:
https://www.quantopian.com/leaderboard/3
-
31337 & Audience Award
Zalando
Developed Zketch during Zalando's hack week: a search engine that retrieves products matching a hand-drawn sketch query.
31337: Awarded for the geekiest project in terms of nasty technical challenges solved, extreme difficulty, use of low-level programming languages, extreme network hacking, etc.
Audience Award: After every presentation, we hold a "clapping session" and measure the noise level in dB. The team with the loudest applause wins.
-
Chinese Government Award for Outstanding Ph.D. Students Abroad
Chinese Ministry of Education
The award recognizes the academic excellence of Chinese Ph.D. students studying overseas. It is granted across all fields of study. In 2013, 40 Chinese Ph.D. students in Germany received the award.
http://www.in.tum.de/fuer-studierende/aktuelles/detail/newsarticle/chinesische-regierung-zeichnet-informatik-doktorand-aus.html
-
Student travel award
Association for the Advancement of Artificial Intelligence (AAAI), 2013
This award recognizes the paper "Lazy Gaussian Process Committee for Real-Time Online Regression".
-
Best paper award
ACM SIGKDD Workshop on Knowledge Discovery, Modeling, and Simulation 2011
This award recognizes the paper "Supervised Topic Transition Model for Detecting Malicious System Call Sequences".
The sponsor of this award is Science Applications International Corporation (SAIC).
-
ImagineCup Runner-up
Microsoft China
Participated in the ImagineCup software design competition (China region) with Xu Pu. Using C# and .NET, we developed a sound-based interface that helps visually impaired users gain better control of their PCs.
The award was presented by Bill Gates himself at Peking University in 2007.
Organizations
-
Association for the Advancement of Artificial Intelligence (AAAI)
Student member
More activity by Han Xiao
-
Can't we just use LLM for reranking? Just throw the 𝐪𝐮𝐞𝐫𝐲, 𝐝𝐨𝐜𝟏, 𝐝𝐨𝐜𝟐,...𝐝𝐨𝐜𝐍 into the context window and let the LLM figure out the…
Shared by Han Xiao
-
Jina AI just released a multilingual reranker model for RAG and retrieval. It's quite efficient, and performs well for English and beyond. Sadly, it…
Liked by Han Xiao
-
Today, we are releasing 𝗝𝗶𝗻𝗮 𝗥𝗲𝗿𝗮𝗻𝗸𝗲𝗿 𝘃𝟮 (jina-reranker-v2-base-multilingual), our latest and the most powerful neural reranker model…
Shared by Han Xiao
-
Thanks, Christian Haug for organizing this two-week event 👏 https://www.zuberlin.city/ It was fun engaging with the curious audience, especially…
Liked by Han Xiao
-
Can you find all three oopsies in the picture? I'm back in Berlin after the European AI Startup Matchday in Malmö 🇸🇪. It was a great opportunity to…
Liked by Han Xiao
-
Jina AI Open Sources Jina CLIP: A State-of-the-Art English Multimodal (Text-Image) Embedding Model Article: https://lnkd.in/gxWGcgaE Jina AI…
Liked by Han Xiao
-
Building MuRAG (Multimodal RAG)? We make 𝐉𝐢𝐧𝐚 𝐂𝐋𝐈𝐏 open source and available on Hugging Face! You can now use it via 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬…
Shared by Han Xiao