Sachin Kumar

Raleigh-Durham-Chapel Hill Area
7K followers 500+ connections

About

Executive Summary:
5+ Years’ experience in software development and data science…

Experience & Education

  • Chegg Inc.

Volunteer Experience

  • Accreditation Assistant

    Commonwealth Games Delhi 2010

    - 3 months

    Social Services

    Managed accreditation records and the issuance of accreditation cards to defence officials, transport authorities, and municipal or civil authorities involved in organising CWG Delhi 2010.

  • Volunteer

    Food Bank of Central & Eastern NC

    - Present 7 years

    Social Services

    Worked with fellow volunteers to prepare approximately 510 food boxes for delivery to the community, which will in turn provide around 12,000 meals.

Publications

  • Question Answering Systems for Legal Domain

    Symposium on Artificial Intelligence and Law 2021

    Question answering is a challenging problem in the legal domain because of the complex nature of legal text. To solve it effectively, the answers presented need to be relevant to the question asked and also diverse, so that they can guide legal researchers who pose ill-formed queries. Approaches such as factoid-based question answering and BM25-based retrieval systems have been proposed and implemented, but they have suffered in relevance, novelty, or both.
    In this presentation, I will present the approaches implemented at LexisNexis for answering factoid questions for a limited number of legal question types, such as statute of limitations and doctrines. I will also give a high-level overview of our Neural Information Retrieval based open legal domain question answering, which expands coverage to more legal question types and addresses not only factoid questions but also complex procedural questions.
    Symposium page: https://sites.google.com/view/sail-2021/invited-talks

  • Language Model Compression and implications to Search

    Proceedings of the 4th Annual RELX Search Summit

    BERT-like transformer language models are getting larger and more complex. Industry and research are training large general-purpose models that can perform a multitude of natural language tasks, such as mapping text into a semantic similarity space or recognizing query intent. These models are later fine-tuned for specific uses, which creates the need to operationalize several versions of them. As compute and cost grow out of proportion, operationalization is directly affected. As beneficiaries of this research and the models it produces, we need reliable model compression techniques to use these models effectively in downstream tasks. Current research proposes methods such as tree pruning, knowledge distillation, quantization, and parameter size reduction, each with different uses and downstream implications. We propose an analytical framework for measuring the quality of model compression, especially with respect to downstream tasks, and share the cost implications of some of these techniques as applied to Lexis Answers DCG.

  • Understanding User Query Intent and Target Terms in Legal Domain

    Springer, LNCS, vol 11608, Proceedings of the 24th International Conference on Applications of Natural Language to Information Systems, NLDB 2019

    Lexis Advance is a legal research service provided by LexisNexis that can respond to natural language queries. It includes a module called Lexis Answers, which implements advanced Natural Language Processing (NLP) capabilities to improve understanding of the intent of the user’s queries. Lexis Answers can respond to natural language questions concerning legal question types such as statute of limitations, elements of a claim, definition of legal terms, and others. Herein, we report on the successful use of advanced NLP approaches for detecting not only named entities but entire legal phrases, a skill previously requiring domain knowledge and human expertise. We have used a Conditional Random Fields (CRF) approach that employs hand-engineered features combined with word2vec embeddings trained on a legal corpus. Furthermore, to reduce our dependency on hand-engineered features, we have also implemented a deep learning architecture comprising a bidirectional Long Short-Term Memory (BiLSTM) network and a linear-chain CRF. Both approaches were benchmarked against a rule-based approach for different types of legal questions. We find that both CRF and BiLSTM-CRF can identify query intents and legal concepts with comparable precision but much higher recall and F-score than the baseline. The resulting models have been deployed in Lexis Answers as a critical improvement in our natural language query understanding.

  • Optimising Inbound IVR Applications

    National Conference on Computational Intelligence, Communication Network, and Smart Grid Proceedings

    Presented this paper at the National Conference on Computational Intelligence, Communication Network, and Smart Grid (NCCS-2016); it was published in the conference proceedings, ISBN: 978-93-85758-03-4.
    Abstract: IVR (Interactive Voice Response) has emerged as one of the most prominent ways to deliver proactive customer service. But as IVR has spread across many industry verticals catering to large customer bases, there has also been a surge in demand for more efficient IVR systems that can cope with growing portfolios and numbers of customers. Most studies focus on the functional aspects and leave aside the core technical aspects of IVR that can be tweaked to make an IVR application function at its full capacity. This paper discusses the detailed technical and functional aspects, along with the methods and tweaks to apply at the application level, functional level, and ASR level, in order to optimize IVR with DTMF (dual-tone multi-frequency) and speech recognition abilities to its full potential.

Patents

  • Transparent Iterative Multi-Concept Semantic Search

    Issued US-11694033-B2

    An iterative multi-concept search methodology with continuous feedback from users engaging with the legal concepts presented.

  • Systems and methods for providing answers to a query

    Filed US20210216576A1

    Systems and methods for open domain question-answering are disclosed. In one embodiment, a method of providing answers to a question includes retrieving, by a computing device, a plurality of passages relevant to a search query, generating a plurality of question-passage pairs, wherein each question-passage pair includes the search query and an individual passage of the plurality of passages, and determining, using a computer model, a probability that a passage of each question-passage pair of at least some of the plurality of question-passage pairs is an answer to a question posed by the search query. The method also includes displaying, on an electronic display, a selected passage of a question-passage pair having a highest probability that the passage is the answer to the question posed by the search query.
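
    The retrieve, pair, score, and select flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `best_passage` and `overlap_score` are hypothetical names, and the word-overlap scorer is a toy stand-in for the computer model, which the abstract does not specify.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a learned model: fraction of query words found in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def best_passage(
    query: str,
    passages: List[str],
    score: Callable[[str, str], float] = overlap_score,
) -> Tuple[str, float]:
    """Build question-passage pairs, score each pair, and return the top passage."""
    pairs = [(query, passage) for passage in passages]
    scored = [(passage, score(q, passage)) for q, passage in pairs]
    return max(scored, key=lambda item: item[1])


retrieved = [
    "The statute of limitations for fraud is three years.",
    "Contracts require offer and acceptance.",
]
top, probability = best_passage("what is the statute of limitations for fraud", retrieved)
```

    In a real system the scorer would be a trained model returning a calibrated probability; the selection step stays the same.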


Courses

  • Artificial Intelligence

  • Automated Learning and Data Analysis

  • Data Driven Decision Making

  • Data Intensive Computing

  • Database Management Systems

  • Design and Analysis of Algorithms

  • Foundations of Data Science

  • Software Security

  • Visual Interfaces for Mobile Devices

Projects

  • Open Domain Question Answering on US Legal Text

    - Present

    - Developed models and various fine-tuning approaches using a BERT language model trained on US case law and secondary sources to implement question answering, allowing users to ask questions and get answers even on complex legal procedural questions.
    - Developed approaches for BERT-based model compression to save costs on embeddings usage in production.
    - Developed approaches for query reformulation to improve the quality and robustness of answers obtained for legal questions.
    - Developed approaches to reduce inference time for a large BERT model while also reducing model serving size.

  • Intelligent-Chatbot-with-NLP-Capability

    Implemented a chatbot as part of a hackathon competition; it serves as an online help desk for LexisNexis.
    - The chatbot identifies the intent of the user and retrieves relevant information from the knowledge base to answer queries related to the LexisNexis Online Help System.
    - Implemented a Bag of Words model and used Stanford's NLTK API for lemmatization and other categorization-related tasks.
    - Used Python for data processing and Flask for the web interface.

  • NER approaches for identifying Query Intent and Target Term

    -

    Developed machine learning and deep learning models for NER (Named-Entity Recognition) to identify search query intent and target terms for definition-intent queries, delivering Lexis Answers factoids into the LexisNexis Advance application. Models developed:
    i) A Bidirectional LSTM-CRF deep learning model with an F1-score of 96.2 for identifying definition-intent queries and associated target terms.
    ii) A CRF (Conditional Random Fields) model with an F1-score of 95.5 for identifying definition-intent queries and associated target terms.

    Specifically worked on:
    - Annotating the dataset and identifying positive and negative queries for identifying intent and target terms.
    - Writing the logic for IOB tagging of user queries.
    - Creating new features using a Word2vec model and a window of previous and next words, coupled with word stems and lemmas.
    - Implementing the CRF (Conditional Random Fields) model, with regularization and hyperparameter tuning to select the best model for predictions.
    - Implementing the Bidirectional LSTM-CRF deep learning model coupled with 200-dimensional word embeddings and character embeddings.
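
    As a rough illustration of the IOB tagging step, the sketch below labels one annotated span inside a tokenized query. The function name `iob_tags`, the `TERM` label, and the example query are assumptions made for illustration; the project's actual tag set is not given in this profile.

```python
from typing import List


def iob_tags(tokens: List[str], span_start: int, span_end: int, label: str) -> List[str]:
    """Tag tokens[span_start:span_end] as one entity in IOB format; all others get 'O'."""
    tags = ["O"] * len(tokens)
    for i in range(span_start, span_end):
        prefix = "B-" if i == span_start else "I-"  # B- opens the span, I- continues it
        tags[i] = prefix + label
    return tags


tokens = "define adverse possession".split()
tags = iob_tags(tokens, 1, 3, "TERM")  # hypothetical label for the target term
```

    Token and tag sequences of this shape are the usual training input for both CRF and BiLSTM-CRF sequence labelers.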

  • Finding Sister Cities with Natural Language Processing

    -

    A natural language processing based approach to finding sister cities, that is, cities that are similar on user-specified criteria, by combining a category-based weighted approach with the relevance of newspaper articles about each city to the user's criteria.
    Activities done as part of the project:
    i) Gathered data for 16,000 US cities from various structured and unstructured sources, using web scraping for certain unstructured sources.
    ii) Trained a word2vec model on a corpus of newspaper articles to generate word2vec embeddings.
    iii) Visualised the embeddings using the t-SNE algorithm.
    iv) Scraped Google News for URLs, then scraped the data from those URLs in real time for city search terms, to measure the relevance of the respective cities.
    v) Implemented a dual embedding space model to rank the relevance of newspaper articles, in order to measure the relevance of the search query to the respective cities.

  • Forecasting US Monthly Retail Trade and Food Services Sales Revenue

    -

    Developed an ARIMA model for time-series forecasting of monthly trade revenue, using the United States monthly retail trade and food services data provided by the United States Census Bureau.
    Activities done:
    i) Data cleaning: removed missing revenue entries and aggregated the revenue of different retail categories for a particular date to make cumulative projections for that date.
    ii) Data exploration: checked the data for stationarity and made it stationary by differencing and decomposition.
    iii) Data modeling: explored the data to find optimal order parameters for autocorrelation, then used those values to build the ARIMA model and compare actual and forecasted values.
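
    The differencing step mentioned above can be sketched in a few lines. This is a minimal illustration with made-up numbers, not the project's code or data; `difference` is a hypothetical helper name.

```python
from typing import List


def difference(series: List[float], lag: int = 1) -> List[float]:
    """Return the lagged differences series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Illustrative values only: a series with an accelerating upward trend.
revenue = [100.0, 110.0, 125.0, 145.0]
first_diff = difference(revenue)       # removes the level
second_diff = difference(first_diff)   # removes the remaining linear trend
```

    The number of times differencing is applied before the series looks stationary corresponds to the d order of an ARIMA(p, d, q) model.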

  • Churn Prediction on KKBox Music Dataset using Random Forest Classifier

    -

    In this project, worked on predicting whether a user will churn after their subscription expires, through the following activities:
    1) Exploratory data analysis to explore current trends and eliminate highly correlated features.
    2) Feature engineering to create new features highly correlated with the target variable.
    3) Modelled the explanatory variables using a Random Forest Classifier, with a test accuracy of 94.2% and a log loss of 0.10.

    The goal of this project was to predict customer churn; the best results came from the Random Forest Classifier, with a log loss of 0.10.
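
    For readers unfamiliar with the log loss metric quoted above, here is a minimal sketch of binary log loss computed from predicted churn probabilities. The function and the sample values are illustrative assumptions and are unrelated to the project's data.

```python
import math
from typing import List


def log_loss(y_true: List[int], y_prob: List[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Two confident, correct predictions: one churner, one non-churner.
loss = log_loss([1, 0], [0.9, 0.1])
```

    Lower is better: a model that assigns probability 0.5 to every user scores about 0.693, and confident wrong predictions are penalised heavily.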

  • Search Algorithms Implementation for cities on US and Romania Map

    -

    Implemented search algorithms in Java as follows:
    i) Informed Search:
    Developed code to search a map of the US with A*, Greedy Search, and Uniform Cost Search implementations.
    ii) Uninformed Search:
    Developed code to search a map of Romania with DFS and BFS implementations.
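
    To illustrate how two of these strategies differ, the sketch below runs BFS and Uniform Cost Search on a small fragment of the textbook Romania road map. It is written in Python rather than the project's Java, and the graph fragment, function names, and structure are assumptions for illustration, not the project's code.

```python
import heapq
from collections import deque

# Fragment of the textbook Romania road map (distances in km).
GRAPH = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind": {"Arad": 75},
    "Timisoara": {"Arad": 118},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}


def bfs(start, goal):
    """Breadth-first search: returns a path with the fewest road segments."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in GRAPH[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


def uniform_cost(start, goal):
    """Uniform cost search: returns (total distance, path) for the cheapest route."""
    frontier = [(0, [start])]
    expanded = set()
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in expanded:
            continue  # already expanded via a cheaper route
        expanded.add(node)
        for nxt, step in GRAPH[node].items():
            heapq.heappush(frontier, (cost + step, path + [nxt]))
    return None


fewest_hops = bfs("Arad", "Bucharest")
cheapest = uniform_cost("Arad", "Bucharest")
```

    On this fragment BFS goes through Fagaras (three segments, 450 km), while Uniform Cost Search finds the longer-in-hops but shorter-in-distance route through Rimnicu Vilcea and Pitesti (418 km).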

  • Facebook-Movies-Data-Miner-and-Analyzer

    -

    Mined data from official Facebook movie pages using Facebook's Graph API, then analysed it to generate the following insights:

    i) Plotted the top ten movies by total likes received, relative to each other.

    ii) Plotted comparative reactions to each movie's last 6 Facebook posts in terms of likes, shares, and comments.

    iii) Plotted the engagement of each movie's Facebook fanbase with the last 6 posts in terms of relative likes, relative shares, and relative comments.

    iv) Computed the average engagement for the top 10 movies in terms of average comments, likes, and shares per fan.

  • Development and Analytics of Walgreens Retail Customer Experience IVR

    -


    Analytics Component:
    - Developed and implemented an improved algorithm in the Walgreens Retail IVR to predict inbound callers' calling intent and offer them a more personalized user experience accordingly.
    - Performed analytics by processing speech recognition logs to tune IVR voice recognition parameters for Walgreens IVRs.

    Core Development:
    - Developed an IVR application with self-service menus for Walgreens customers, supporting both speech and DTMF recognition.
    - Developed an IVR and web services solution to manage Consent and Preference for a customer by providing preference options on IVR channels. This specifically included implementing a new triggering mechanism to abide by customer directives and enriching notifications to address experience gaps and lower our cost to fill.
    Other related Projects:

    Walgreens Balance Rewards Refresh - Loyalty Program IVR

    Project description
    Developed the Walgreens Balance Rewards Refresh IVR, an Interactive Voice Response application that serves Walgreens customers enrolled as loyalty members. It helps them add reward points for any transactions left unclaimed, hear their last 5 transactions, and check their points balance. Callers who are not enrolled in the Balance Rewards program get an overview and an option to enroll, through a directed-dialog speech user interface and DTMF.
    Also integrated Interactions Watson ASR into the existing RCE IVR as a POC, for a comparative analysis of the recognition accuracy of Watson ASR against the Nuance ASR currently in use.

  • Walgreens Contact Center Applications

    -

    Developed new functionalities in various Walgreens Contact Center Applications:

    i) CMITS
    Developed new functionality to support outbound campaigns, regulated by time zones and quota allocation, via CMITS (Customer Management and Issue Tracking System, also described as the Customer Management Integrated Telephony System), which is meant for use by call center executives.
    Also performed data wrangling and data transformation to load data for Walgreens outbound campaigns run by contact center agents using CMITS.

    ii) CCR
    Developed new functionalities to support visualization of more detailed contact center data in the form of reports, generated via a UI with specified input selection parameters.

  • Walgreens Intelligent Call Routing using NLP

    -

    This project aims to implement a natural language solution that allows callers to self-identify their need rather than navigate a series of pre-configured menu options that may or may not lead to the solution that best fits their needs. Integration with a Human Assisted Understanding Platform is also in scope.
    Updated the caller experience so that callers can answer an open-ended question, improving customer experience and containment.

  • Automated Barcode Server Restoration Batch

    -

    Developed a batch job to automate recovery from barcode server port failures for the Walgreens Faxing application, automating a support activity that was previously done manually and saving an estimated $20,000 per year in person effort.

  • Save Water Save Earth - an Azure based Cloud Application

    -

    Developed the frontend, using HTML, CSS, and the Google Maps API to leverage geo-spatial information, for the Microsoft Azure cloud-based application “Save Water Save Earth”. The application supported a community-driven initiative to save water with mass involvement: it assessed participants' water savings by comparing their water consumption statistics and rewarded the top water savers for their efforts.

    Other creators
    • Sachin Tiwari
  • National Rural Employment (Project Selected for IBM TGMC)

    -

    This is a J2EE-based web application with three user roles: consumer, administrator, and manager. Consumers can view their employment and earning history under the Government of India's national rural employment scheme and view other employment opportunities in their area. Administrators can maintain consumer records, aggregate consumer feedback, and print reports. Managers have all administrator functionality plus the ability to answer user queries and add new administrators.

Honors & Awards

  • 2023 Chegg Global Hackathon

    Chegg

    Winner of Chegg's Global Hackathon, with the idea and implementation of an AI-based exam prep solution that incorporated elements of cognitive learning theory, generative AI, and personalization.
    Competed in this hackathon alongside 450 colleagues across 80 teams spanning Chegg offices in the US, Canada, Europe, Israel, and India.

  • 2020 LexisNexis Inclusion and diversity hackathon

    -

    Won 1st place at the 2020 LexisNexis Inclusion and Diversity Hackathon. As part of our winning solution, we built an API with integrations in Outlook, Confluence, and Teams to identify non-inclusive terms and suggest possible inclusive replacements. Another great outcome of the hackathon is that our 1st prize of $5000 will go to our team's chosen charity, Black Girls Code.

  • 2019 LexisNexis Global Hackathon

    LexisNexis

    Won 2nd place at the LexisNexis Global Hackathon, competing against teams from other geographies and presenting our solution in front of the LexisNexis executive team, including our CEO Michael Walsh. Our solution improved the search relevance of legal queries by predicting query intent using an ensemble of deep learning and machine learning models.

  • 2019 LexisNexis Query Intent Hackathon

    LexisNexis

    Winner of the Query Intent Hackathon organized at LexisNexis. Our deep learning and machine learning models for classification and further sub-classification of legal search queries not only gave the best F1 scores but were also production ready: lightweight, less compute intensive, and fast at prediction time.

  • 2019 LexisNexis Cybersecurity Hackathon

    LexisNexis

    Won 3rd place at the 2019 LexisNexis Cybersecurity Hackathon, organised as a Capture The Flag event with the objective of capturing the flag key by completing various security challenges involving the nuances of cryptography and web application security.

  • 2018 LexisNexis Rise to Code Hackathon Winner

    LexisNexis

    Won the 2018 LexisNexis Rise to Code Hackathon with my team for developing a working prototype of semantic search using NLP, machine learning, and ElasticSearch.

  • Best Team Award

    TCS

    Awarded for exceptional contribution in delivering critical IVRs to the customer on a very tight schedule, reducing the customer's call volume by 11%, which translated into substantial savings for the customer.

  • On The Spot Award

    TCS

    Awarded the On The Spot Award for conceptualizing, designing, and building a Windows script that reduced ticket counts and person hours spent by almost 30% per month, adding significant value to the project; the client also expressed verbal appreciation.

Languages

  • English

    Native or bilingual proficiency

  • Hindi

    Native or bilingual proficiency
