Statistical Modeling

Statistical Tools We Use for Data Analysis and Statistical Modeling
  • Open Source R: a free software environment for statistical computing and graphics.
  • Scilab: free and open source software for engineers & scientists, with a long history (first release in 1994) and a growing community (100,000 downloads every month worldwide).
  • SAS: the leader in business analytics software and services, and the largest independent vendor in the business intelligence market.
  • STATA: statistical software for data science.

Machine Learning & Deep Learning

Libraries/Frameworks We Use
  • Scikit-learn: a free machine learning library for Python offering simple, efficient tools for predictive data analysis. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy (see the brief sketch after this list).
  • PyTorch: An open source machine learning framework that accelerates the path from research prototyping to production deployment.
  • TensorFlow: an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and developers easily build and deploy ML-powered applications.
  • XGBoost: an optimized, distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately.
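
As a quick illustration of the scikit-learn workflow described above, here is a minimal sketch that trains a random forest classifier on the Iris dataset bundled with the library and reports held-out accuracy; the dataset and model choice are illustrative only, and the same fit/predict pattern applies to the other estimators listed.

    # Minimal scikit-learn workflow: load data, split, fit, evaluate.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)  # small toy dataset shipped with scikit-learn
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)            # train on the training split
    predictions = model.predict(X_test)    # predict on held-out data

    print("Test accuracy:", accuracy_score(y_test, predictions))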

CDS Development

We Use Medical Terminologies/Standards to Develop FHIR compliant CDS tools
  • FHIR Standards: Fast Healthcare Interoperability Resources (FHIR, pronounced "fire") defines a set of "Resources" that represent granular clinical concepts (see the example resource after this list).
  • SNOMED CT: the most comprehensive, multilingual clinical healthcare terminology in the world.
  • RxNorm: provides normalized names for clinical drugs and links those names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, and the Gold Standard Drug Database.
  • LOINC: The international standard for identifying health measurements, observations, and documents.
  • NDF-RT/MED-RT: MED-RT is a replacement and successor to NDF-RT. Both terminologies are formal ontological representations of medication terminology, pharmacologic classifications, and asserted authoritative relationships between them. MED-RT includes native pharmacologic classification concepts (e.g. mechanisms of action (MoA), physiologic effects (PE), Established Pharmacologic Class (EPC)) and all relationships asserted between concepts in any namespace.
  • NDC: Drug products are identified and reported using a unique, three-segment number, called the National Drug Code (NDC), which serves as a universal product identifier for drugs.
  • CVX: The CDC's National Center for Immunization and Respiratory Diseases (NCIRD) developed and maintains the CVX (vaccine administered) code set. It includes both active and inactive vaccines available in the US. CVX codes for inactive vaccines allow transmission of historical immunization records.
  • ICD: the foundation for the identification of health trends and statistics globally and the international standard for reporting diseases and health conditions. It is the diagnostic classification standard for all clinical and research purposes, and it defines the universe of diseases, disorders, injuries, and other related health conditions in a comprehensive, hierarchical fashion.
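
To make the FHIR and LOINC entries above concrete, the sketch below assembles a minimal FHIR Observation resource as a plain Python dictionary, using the LOINC code 29463-7 ("Body weight") to identify the measurement. The patient reference, date, and value are hypothetical, and real CDS tooling would typically go through a FHIR client or validation library rather than raw dictionaries.

    import json

    # A minimal FHIR Observation resource expressed as a Python dict.
    # LOINC code 29463-7 ("Body weight") identifies the measurement;
    # the patient reference, date, and value are purely illustrative.
    observation = {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {
                    "system": "http://loinc.org",
                    "code": "29463-7",
                    "display": "Body weight",
                }
            ]
        },
        "subject": {"reference": "Patient/example"},  # hypothetical patient id
        "effectiveDateTime": "2023-01-15",
        "valueQuantity": {
            "value": 72.5,
            "unit": "kg",
            "system": "http://unitsofmeasure.org",
            "code": "kg",
        },
    }

    print(json.dumps(observation, indent=2))  # JSON ready to send to a FHIR server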

Natural Language Processing

NLP Tools We Use
  • Stanford CoreNLP: a suite of natural language analysis tools from the Stanford NLP Group that provides tokenization, part-of-speech tagging, named entity recognition, syntactic parsing, and coreference resolution for raw text.
  • Python NLTK: a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries (see the tokenization sketch after this list).
  • cTAKES: a natural language processing system for extracting information from clinical free text in electronic medical records.
  • Python spaCy: Industrial-Strength Natural Language Processing in Python.
  • PyTorch and TensorFlow: the deep learning frameworks described under Machine Learning & Deep Learning above, which we also use to build and deploy neural NLP models.
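
A small example of the kind of text processing these libraries support: the sketch below tokenizes and part-of-speech tags a sentence with NLTK and then runs spaCy named entity recognition. It assumes the NLTK resources have been downloaded (resource names can vary slightly across NLTK versions) and that the en_core_web_sm spaCy model is installed; the sentence itself is made up.

    import nltk
    import spacy

    # One-time downloads for the NLTK resources used below (tokenizer + POS tagger).
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    text = "The patient was prescribed 500 mg of metformin at Boston Medical Center."

    # NLTK: tokenize the sentence and tag each token with a part of speech.
    tokens = nltk.word_tokenize(text)
    print(nltk.pos_tag(tokens))

    # spaCy: run the small English pipeline and list the named entities it finds.
    # Assumes the model was installed with: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    print([(ent.text, ent.label_) for ent in doc.ents])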

Professional Training

Data Analysis, Research Design, and Programming
  • Data Analysis: R, STATA, SAS, and Scilab.
  • Programming Languages: Python, Java, and SQL.
  • Research Design

"It is health that is real wealth, and not pieces of gold and silver. "

- Mahatma Gandhi

Some of Our Featured Works

Research projects we are proud to showcase.