up:: [[DBA806 - Applied AI Innovation]]
tags:: #source/course #on/AI
people:: [[Dakshinamurthy Kolluru]]
# DBA806 M4 - AI in Model Development
Dr. Dakshinamurthy V. Kolluru, the Provost of UGDX Institutes, presented on Applied AI Innovation, covering various aspects of AI adoption and advancements in processing unstructured data. The agenda included motivation, overcoming challenges in processing unstructured data, the paradigm shift in building modern AI applications, and use cases of innovation.
**Speaker Background:**
- Dr. Dakshinamurthy V Kolluru, Provost, UGDX Institutes.
- Holds Ph.D. & M.S. in Material Science & Engineering from Carnegie Mellon Univ., USA.
- B.E. in Metallurgical Eng., NIT Trichy, India.
**Word Embeddings:**
- Challenges in feeding words into computers.
- Introduction to [[One-Hot Encoding]] and its limitations.
    - i.e., all words are treated as equidistant, so the encoding carries no notion of similarity.
- [[Term Document Matrix]] and its sparsity issues.
- Lack of meaningful representation when words are absent in documents.
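As a minimal illustration (my own sketch, not the instructor's code), one-hot encoding a tiny vocabulary shows why every pair of distinct words ends up equally far apart:

```python
import numpy as np

# A tiny, hypothetical vocabulary; any word list behaves the same way.
vocab = ["king", "queen", "apple", "uranus"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Vector with a 1 at the word's position and 0s elsewhere."""
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

# Every pair of distinct words sits at the same distance (sqrt(2)),
# so the encoding carries no information about semantic similarity.
print(np.linalg.norm(one_hot("king") - one_hot("queen")))   # 1.414...
print(np.linalg.norm(one_hot("king") - one_hot("uranus")))  # 1.414...
```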
**Artificial Neural Networks (ANN):**
- Overview of [[Artificial Neural Networks (ANN)]] and its role in feature engineering.
- Demonstration of ANNs on text using word vectors.
- Introduction to Google Word2Vec and modern-day Word Vectors.
**Encoding Various Data Types:**
- Images described through RGB channels and tensors.
- Audio treated as images and vectorized.
- Encoding of categorical attributes and the role of decoders explained.
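A rough sketch of the encodings listed above (the file paths are placeholders, and the use of a mel spectrogram for the audio-as-image step is my assumption, since the lecture does not name a specific transform):

```python
import numpy as np
from PIL import Image
import librosa  # assumed available, used only for the audio-as-image illustration

# An image is a 3-D tensor: height x width x (R, G, B) channels.
img = np.array(Image.open("photo.jpg").convert("RGB"))  # placeholder path
print(img.shape)  # e.g. (480, 640, 3)

# Audio can be treated like an image by converting the waveform into a
# spectrogram, a 2-D picture of frequency content over time.
waveform, sample_rate = librosa.load("clip.wav", sr=None)  # placeholder path
spectrogram = librosa.feature.melspectrogram(y=waveform, sr=sample_rate)
print(spectrogram.shape)  # (n_mel_bands, n_time_frames)
```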
**Building Models with Unstructured Data:**
- Introduction to tools like "attention tools" and [[Mode-Change-Tool (MCT)]].
- Evolution in handling categorical and unstructured data.
- Paradigm shift in building models from simple data to complex data.
**Modern-day Building Blocks:**
- Overview of *Encoders, Decoders, MCTs, Attention Models, and ML Algorithms*.
- Transformation of various data types into vectors.
#### Key Discussions from Lecture
The instructor provides motivation for the session, discussing the evolution of AI and its current impact. He points to the rapid increase in data generation and to the role of IoT, GPU technology, and improved data transmission speed.
**Key Points:**
- AI's transformation over the years.
- Data generation shift from humans to machines.
- IoT, GPU technology, and improved data transmission speed.
The instructor introduces the main topic: the role of *unstructured data in AI innovation*. He explains the significance of processing unstructured data, mentioning text, audio, and video, and announces his focus on a specific AI innovation related to the processing of unstructured data.
The professor discusses the key technologies driving AI innovation, including IoT, GPU technology, and advancements in data transmission. He emphasizes the *exponential growth in data generation*.
**Key Technological Drivers of AI Innovation:**
1. [[Internet of Things (IoT)]]
2. [[Graphics Processing Unit (GPU)]] Technology
3. Advancements in Data Transmission Speed
The speaker discusses the *challenge of feeding the emotional and contextual significance of words into a computer*. He traces the evolution of word representation methods over a 40- to 50-year period, starting with assigning numbers to words in dictionary order. The limitations of this approach are noted, such as the lack of context and emotional meaning.
The speaker then introduces the concept of [[One-Hot Encoding]] representation, where each word is represented by a vector with a one in its corresponding position and zeros elsewhere. This method is *criticized for lacking meaningful distances between words and treating all words as equidistant*.
Next, the [[Term Document Matrix]] is introduced, a powerful representation method from the 1990s. It involves creating a matrix with documents as columns and unique words as rows, with entries recording word presence or counts. The sparsity of this matrix is discussed, and the speaker expresses concern about the *lack of meaningful representation when words are absent in documents*.
The limitations of sparsity are further explained, emphasizing the difficulty in understanding dissimilarity or context when words are not present in documents. Despite its popularity, the term document matrix is criticized for not working as well as desired.
The speaker concludes by leading a word association game to demonstrate how our brains associate words with concepts and suggesting that understanding these associations may lead to better word representation methods.
**Key Points:**
* Assignment of numbers to words based on dictionary order lacks emotional and contextual meaning.
* One-Hot Encoding representation treats all words as equidistant, making it insufficient for conveying meaning.
* The Term Document Matrix represents words based on their presence or counts in documents, but its sparsity poses challenges.
* The sparsity issue limits the ability to understand dissimilarity or context when words are absent in documents.
* Despite its popularity, the term document matrix does not work as effectively as desired for word representation.
* The speaker suggests that understanding how our brains associate words with concepts may lead to improved word representation methods.
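A small sketch of the term-document matrix using scikit-learn (the toy documents are my own; the lecture describes the structure, not this code):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three toy documents, purely illustrative.
docs = [
    "the king ruled the kingdom",
    "the queen ruled wisely",
    "astronomers observed uranus",
]

vectorizer = CountVectorizer()
doc_term = vectorizer.fit_transform(docs)   # documents x unique words (sparse)
term_doc = doc_term.T                       # transpose: unique words as rows, documents as columns

print(vectorizer.get_feature_names_out())
print(term_doc.toarray())
# Most entries are zero. A word that never appears in a document contributes
# only a zero, which says nothing about how dissimilar they actually are.
```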
The text discusses the concept of how our brains organize and represent words and concepts, particularly focusing on how certain words are associated with each other in our minds. It suggests that *our brains efficiently store words based on concepts* such as gender, wealth, and war. The speaker highlights that while words like "king" and "queen" may overlap in concept representation in our brains, other words like "Uranus" may not share similar associations. The text emphasizes the *challenge of practically creating a concept space* and discusses the historical struggle in the field of AI and machine learning to solve this problem.
The text also underscores the importance of [[Artificial Neural Networks (ANN)]] in addressing the challenges of feature engineering and concept representation, particularly in unstructured data like text and images, and highlights their role as powerful tools in modern machine learning and AI research.
Using the example sentence "Once upon a time, there was a king…", the speaker discusses the process of word representation using neural networks, particularly focusing on the [[Word2Vec]] model developed by Google. He explains two approaches to predicting the next word: one based on the preceding words and the other using the contextual words around the target word. The Word2Vec model aims to represent words as vectors, creating a powerful feature representation.
**Key points:**
- Words are organized in our brains based on concepts like gender, wealth, and war.
- The concepts associated with words like "king" and "queen" may overlap, while others like "Uranus" may not.
- Our brains efficiently store words and allow us to recognize similarities and dissimilarities.
- There has been a historical struggle in AI and machine learning to create a practical concept space.
- The text introduces [[Artificial Neural Networks (ANN)]] as a solution to the problem of [[Feature Engineering]], particularly for unstructured data.
- Feature engineering involves creating useful features from existing data.
- Humans have historically struggled to generate features for unstructured data like text and images.
- ANNs use nodes and connections to process information and learn from data.
- ANNs can *automatically create features from input data*, enabling accurate predictions with simple models; in effect, they act as automatic feature engineers, building meaningful representations from raw input.
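A minimal PyTorch sketch of the "auto feature engineer" idea (the layer sizes and random inputs are illustrative, not from the lecture): the hidden layer's activations are the learned features, and a simple linear layer on top makes the prediction.

```python
import torch
import torch.nn as nn

# Two-layer network: raw inputs -> learned features -> prediction.
model = nn.Sequential(
    nn.Linear(10, 32),   # 10 raw input values -> 32 learned features
    nn.ReLU(),
    nn.Linear(32, 1),    # a simple model on top of the learned features
)

x = torch.randn(4, 10)                   # a batch of 4 raw input rows
hidden_features = model[1](model[0](x))  # hidden-layer activations = auto-engineered features
prediction = model(x)
print(hidden_features.shape, prediction.shape)  # torch.Size([4, 32]) torch.Size([4, 1])
```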
**On Word2Vec:**
- Two approaches to word prediction: based on previous words or contextual words around the target word.
- Word2Vec model represents words as vectors for better prediction.
- Google utilized trillions of word pairs from books and articles to create a neural net with 300 nodes, achieving a powerful feature representation.
- The model showcased interesting analogies, such as "king - man + woman = queen," but also revealed biases, like associating "doctor - father + mother = nurse."
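A hedged sketch of the analogy arithmetic above, assuming gensim and its downloadable pretrained Google News vectors (roughly 1.6 GB on first load):

```python
import gensim.downloader as api

# Pretrained 300-dimensional Word2Vec vectors trained on Google News text.
vectors = api.load("word2vec-google-news-300")

# The classic analogy: king - man + woman ≈ queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# The same arithmetic also surfaces the biases the speaker mentions:
# doctor - father + mother tends to rank "nurse" near the top.
print(vectors.most_similar(positive=["doctor", "mother"], negative=["father"], topn=3))
```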
The speaker delves into the application of Word2Vec in [[Natural Language Processing (NLP)]] and the creation of word vectors. Google's approach involved using a thousand-dimensional random representation for each word, which was then reduced to 300 nodes for satisfactory results. The speaker highlights the marketing success of Google's Word2Vec, *emphasizing its ability to capture relationships between words and concepts*.
**Key Points:**
- Google's Word2Vec utilized a random thousand-dimensional representation for each word.
- A neural net with 300 nodes was created, becoming a powerful feature representation known as word-to-vector ([[Word2Vec]]).
- Google's extensive use of diverse text sources resulted in a robust neural network, demonstrating its potential in natural language processing.
The speaker transitions to discussing the representation of words in modern language models, where *tokens are used instead of words*. [[Tokens (NLP)]], created by *dividing words into prefixes, suffixes, and root words, reduce the vocabulary size and handle new words effectively*. Current language models like GPT-3 use tokens represented in 1,024 dimensions, providing a more efficient and adaptive approach.
**Key Points:**
- Modern language models use tokens to represent words, divided into prefixes, suffixes, and root words.
- Tokens offer advantages such as a smaller vocabulary size and effective handling of new words.
- Current models like GPT-3 use 1,024-dimensional vectors to represent each token.
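A small illustration of tokenization, using OpenAI's tiktoken library as a stand-in (the lecture does not name a specific tokenizer, and the exact sub-word split depends on the vocabulary used):

```python
import tiktoken  # a GPT-style byte-pair-encoding tokenizer, used here for illustration

enc = tiktoken.get_encoding("cl100k_base")

# A long or unfamiliar word is split into familiar sub-word pieces (roughly:
# prefixes, roots, and suffixes), so the vocabulary stays small and genuinely
# new words can still be represented.
tokens = enc.encode("untranslatability")
print(tokens)                             # a handful of token ids
print([enc.decode([t]) for t in tokens])  # the sub-word pieces themselves
```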
In the final part, the speaker touches on the application of neural networks in [[Image Processing]]. They explain that while humans perceive images naturally, *computers see them as three-dimensional tensors with channels for red, green, and blue*. Neural networks process these images, extracting features at different levels, ultimately achieving high-level representations that distinguish objects like dogs and cats.
**Key Points:**
- Image processing involves representing images as three-dimensional tensors with red, green, and blue channels.
- Neural networks process images, extracting features at various levels to distinguish objects.
- The speaker emphasizes the *ability of neural networks to autonomously learn powerful features from images*, leading to a revolution in image understanding.
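As a rough sketch of "high-level representations" from images, a pretrained network can be used as a feature extractor (ResNet-18 is my own illustrative choice, not one named in the lecture; the image path is a placeholder):

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained ResNet-18 with its classification head removed, so it outputs
# a 512-dimensional learned representation instead of class scores.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),   # H x W x RGB image -> 3 x 224 x 224 tensor
])

img = preprocess(Image.open("dog.jpg").convert("RGB")).unsqueeze(0)  # placeholder path
with torch.no_grad():
    features = resnet(img)
print(features.shape)  # torch.Size([1, 512])
```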
**Vectorization**
The speaker discusses the concept of [[Vectorization]] in machine learning, where various data types such as *text, image, audio, and categorical attributes are transformed into vectors for easy processing by neural networks*. The speaker emphasizes the importance of encoding and decoding processes in this transformation. They provide examples, such as a matrimonial site using image features for matchmaking and a retail store predicting sales using categorical attribute vectors.
**Key Points:**
1. **Vectorization in Machine Learning:**
- Different data types like text, image, audio, and categorical attributes are transformed into vectors for neural network processing.
- No specific rule for vector size; trial and error often determines the suitable dimension.
2. **Encoding and Decoding:**
- Encoding involves converting data into vectors (encoders).
- Decoding reverses the process, reconstructing the original data from vectors (decoders).
3. **Example: Matrimonial Site**
- Describes an incident where a matrimonial site used image features to match brides and grooms.
- The speaker attempted to mimic the process by extracting features from celebrity images.
4. **Innovation in Categorical Attributes:**
- [[One-Hot Encoding]] used to represent categorical attributes numerically.
- Example of a retail store predicting sales using vectors derived from categorical attributes.
5. **Use of Attention Tool:**
- Attention tool is a neural net used to combine information from various data types into a single vector.
- It helps in consolidating text, image, and audio data for richer information.
6. **Mode Change Tools:**
    - A [[Mode-Change-Tool (MCT)]] transforms one type of data into another, such as converting an image into an audio-like vector.
- Mention of media or mode change tools for specific applications.
7. **Significance of Innovations:**
- The ability to represent diverse data types as vectors facilitates easier model development.
- Applications range from face recognition to predicting sales and diagnosing medical conditions.
8. **Decoders in Neural Networks:**
- Decoders are used to reconstruct original data from encoded vectors.
    - Example of predicting sales using a two-layer neural network with one-hot encoding (a sketch follows after this list).
9. **Attention Tool Application:**
- Attention tool helps in combining information from multiple sources for more accurate predictions.
- Notable example of a competition where attention was used to predict outlet sales in a retail store.
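A minimal sketch of the sales-prediction example in item 8 (the column names, rows, and sales figures below are fabricated placeholders, not the competition data from the lecture): one-hot encode the categorical attributes and fit a two-layer network.

```python
import pandas as pd
import torch
import torch.nn as nn

# Toy categorical attributes for a retail outlet.
df = pd.DataFrame({
    "outlet_size": ["small", "medium", "large", "medium"],
    "city_tier":   ["tier1", "tier2", "tier1", "tier3"],
})
X = torch.tensor(pd.get_dummies(df).astype("float32").values)  # one-hot encoded inputs
y = torch.tensor([[120.0], [340.0], [560.0], [300.0]])         # made-up sales figures

# Two-layer network: the hidden layer learns a dense vector for the
# categories, the output layer predicts sales.
model = nn.Sequential(nn.Linear(X.shape[1], 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for _ in range(200):
    optimizer.zero_grad()
    loss_fn(model(X), y).backward()
    optimizer.step()

print(model(X).detach())  # fitted sales predictions for the four rows
```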
The speaker discusses the ability to convert between various forms of data, such as text, images, and vectors, using neural networks. The process involves using a *text encoder to convert a text description into a vector*, then using a neural network to *generate an image based on that vector*. This image, in turn, *produces the same vector as the original text description when fed back into an encoder*, completing the cycle.
The presentation emphasizes that this process is not complex; it mostly involves chaining neural networks together efficiently. The shift in the approach to machine learning is highlighted, *noting that handling unstructured data, like text descriptions, has become more manageable than handling structured data*.
The speaker emphasizes the significant transformation in building models, *indicating a 180-degree change in the way machine learners approach problems*. In the past, complex models were applied to simple data, while now, with the ability to handle more complex data elegantly, simpler models suffice.
In the latter part of the presentation, questions from the audience are addressed, covering topics like word vectors, transformer models, and the process of drawing bounding boxes for object identification in video. The speaker concludes by introducing the concept of *building powerful AI applications using modular building blocks, such as encoders, decoders, and trainers*.
**Key Points:**
- Ability to convert text, images, and vectors using neural networks.
- Not a complex process; mostly a matter of chaining neural networks together efficiently.
- Shift in machine learning approach, handling unstructured data better than structured.
- 180-degree change in the way machine learners approach problems.
- Transformation in building models, simpler models now suffice for complex data.
- Encoders, decoders, and trainers are essential components in building AI applications.
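To make the first step of that cycle concrete, here is a hedged sketch of a text encoder (the sentence-transformers model named below is my own choice, not one mentioned in the lecture):

```python
from sentence_transformers import SentenceTransformer  # assumed available

# A text encoder maps a description to a fixed-length vector. An image
# generator trained against the same kind of vector could then produce a
# matching picture, and an image encoder would map that picture back to a
# similar vector, closing the text -> vector -> image -> vector cycle.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
vector = encoder.encode("a golden retriever playing in the snow")
print(vector.shape)  # (384,) for this particular model
```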
The speaker discusses the innovation in AI and the building block mindset, emphasizing the *simplicity of creating applications once certain building blocks are in place*. Two practical applications are highlighted, one involving *fraud detection using mouse movement patterns as a unique identifier*, and the other addressing *sentiment attribution by treating it as a language translation problem*.
**Key Points:**
1. **Innovation in AI:**
- Building block mindset simplifies the creation of applications.
- Demonstrates the ease of building applications with the availability of specific building blocks.
2. **Fraud Detection Application:**
- Splunk, a data management company, uses mouse movement patterns for fraud detection.
    - Instead of traditional methods, they capture mouse movements, render them as an image, and convert that image into a vector.
    - An image-encoder-plus-classifier model determines the authenticity of the user from the mouse-movement vector.
    - Achieved *state-of-the-art accuracy in fraud detection with roughly eight lines of code* (see the sketch after this list).
3. **Sentiment Attribution Application:**
- Sentiment attribution problem involves identifying sentiments for different attributes in a statement.
- Traditional machine learning methods were struggling with low accuracy and required extensive manual data labeling.
- The speaker suggests treating it as a language translation problem.
- Hired interns to generate translated text pairs for different attributes and trained an off-the-shelf language translation model.
- Achieved a significant *improvement in accuracy from 30% to 78% overnight*.
4. **ChatGPT and Data Usage:**
- Acknowledges the power of ChatGPT but notes the reluctance to put sensitive data into it.
    - Suggests the possibility of training a ChatGPT-style model from the ground up with a smaller dataset (10,000 to 20,000 samples).
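A hedged sketch of the mouse-movement idea from item 2 above (the rendering, the tiny untrained network, and the random trajectory are my own illustrative choices, not Splunk's actual pipeline):

```python
import numpy as np
import torch
import torch.nn as nn

def trajectory_to_image(points, size=64):
    """Render a sequence of (x, y) mouse positions in [0, 1] as a binary image."""
    img = np.zeros((size, size), dtype=np.float32)
    for x, y in points:
        img[int(y * (size - 1)), int(x * (size - 1))] = 1.0
    return img

# Fabricated trajectory; a real system would use captured mouse events.
points = np.random.rand(200, 2)
image = torch.tensor(trajectory_to_image(points)).unsqueeze(0).unsqueeze(0)  # 1 x 1 x 64 x 64

# Small image encoder plus classifier head: outputs the probability that the
# session belongs to the genuine user (meaningless until trained on labels).
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
    nn.Flatten(), nn.Linear(8 * 4 * 4, 1), nn.Sigmoid(),
)
print(model(image))
```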