DBA806 M3 - A Computer Scientist’s Guide to AI - FW.VISION

up:: [[DBA806 - Applied AI Innovation]]
tags:: #source/course #on/AI
people:: [[Praphul Chandra]]

# DBA806 M3 - A Computer Scientist’s Guide to AI

#### Key Discussions in Lecture

[Session 3 - A Computer Scientist’s Guide to AI Capabilities - Dr Praphul Chandra - YouTube](https://www.youtube.com/watch?v=b9sZD4efm94)

The speaker emphasized the *differentiation between data science and AI*, stating that *data science applies machine learning algorithms to structured data*, while *AI typically works with unstructured data* like text, image, audio, and video. The session provided examples of real-world applications of structured and unstructured data analysis, particularly focusing on text data.

The text discusses various applications of computer vision and audio analytics in real-world scenarios.

**Computer Vision Applications:**

1. **Store Optimization:** Tracking customer movement in stores to optimize product placement.
   - Utilizes tracking of customer movement for optimal product placement.
2. **Disaster Response:** Using drones to survey disaster-affected areas for survivors, damages, and managing search and rescue operations.
   - Drone use for disaster response to identify survivors and assess damages.
3. **Environment Monitoring:** Installing cameras in forests to monitor wildlife movement, track deforestation, and monitor pollution in cities.
   - Camera installations for wildlife monitoring, deforestation tracking, and pollution source monitoring.
4. **Content Retrieval - Image Search:** Utilizing object detection algorithms for image search, as seen in Google image search.
   - Object detection in image search using algorithms.
5. **Anomaly Detection:** Identifying unusual or unexpected objects or events in images or videos.
   - Algorithmic anomaly detection in various scenarios.
6. **Face Recognition:** Recognizing faces in images or videos for applications like access control, security, and social media tagging.
   - Face recognition for access control, security, and social media tagging.
7. **Video Summarization:** Creating shorter, information-rich videos from longer ones for applications in surveillance, security, and social media.
   - Algorithmic video summarization for surveillance, security, and social media.

**Audio Analytics Applications:**

1. **Speech Recognition:** Converting spoken language (audio waveform) into text, used in voice assistants, medical/legal transcription, subtitles, and more.
   - Converting spoken language to text in applications such as voice assistants and transcription.
2. **Speaker Recognition:** Identifying and verifying individuals based on their voice, applied in phone banking, voice assistants, law enforcement, and home security.
   - Verifying individuals based on their voice for applications in phone banking, law enforcement, and home security.
3. **Sound Recognition:** Identifying non-speech sounds, including music, for various applications such as UX customization, vehicle control, and home security.
   - Recognition of non-speech sounds, like music, for applications in UX customization and vehicle control.

**Key Points:**

- Computer vision involves tracking customer movement, disaster response with drones, environment monitoring, [[Image Search]], anomaly detection, facial recognition, and video summarization.
- [[Anomaly Detection]] identifies unusual events or objects in images or videos.
- [[Facial Recognition]] is used for access control, security, and social media tagging.
- [[Video Summarization]] creates shorter, informative videos for surveillance and social media.
- Audio analytics includes [[Speech Recognition]], [[Speaker Recognition]], and [[Sound Recognition]].
- Speech recognition is used in voice assistants, transcription, subtitles, and more.
- Speaker recognition verifies individuals based on their voice for applications like phone banking and law enforcement.
- Sound recognition identifies non-speech sounds, like music, for UX customization and vehicle control.

**Voice Deep Fakes and Speaker Recognition Challenges**

- Deep fake technology can use less than a minute of someone's voice sample to create realistic imitations.
- This poses a challenge for speaker recognition systems.
- Generative AI advancements in creating realistic imitations (images, videos, text, and audio) contribute to a *cat-and-mouse game with detection and recognition methods*.
- The *ongoing dynamic is likened to the field of cybersecurity*, with continuous improvement in both generative AI and detection techniques.

**Key Points: Sound Analysis and Applications**

- Sound analysis involves identifying and classifying sounds from audio recordings.
- *Biophonic, geophonic, and anthropophonic sounds are the common categories: animal sounds, non-biological environmental sounds, and human-created sounds, respectively.*
- Applications include detecting machine malfunctions in factories, [[Healthcare Monitoring]], home security, vehicle health monitoring, disaster response, [[Wildlife Monitoring]], and security applications like tracking drones.
- [[Geonic Pattern Analysis|Geophonic pattern analysis]] may potentially help in early earthquake detection, but it requires further exploration.

In this class, the instructor emphasizes a foundational view of the world based on the relational data model. The goal is to understand that, despite the variety of data formats, they all ultimately converge to this model when brought to the application level. The earlier discussion of different data formats was aimed at highlighting the diversity in storage, processing, and engineering aspects, but they all lead to a common data model. The class introduces the terminology of attributes, features, columns, and dimensions: the vertical elements (columns) in the data model are referred to as attributes, features, or dimensions, while the horizontal ones are called rows.
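
To make the "everything converges to rows and columns" idea concrete, here is a minimal sketch that turns a few text documents into a relational table of word counts (a bag-of-words representation). The function name `bag_of_words_table` and the sample documents are illustrative assumptions, not something taken from the lecture.

```python
from collections import Counter


def bag_of_words_table(documents):
    """Turn raw text documents into a rows-by-columns table of word counts."""
    tokenized = [doc.lower().split() for doc in documents]
    # Columns (attributes/features): the sorted vocabulary across all documents.
    columns = sorted({word for tokens in tokenized for word in tokens})
    # Rows: one per document, holding the count of each vocabulary word.
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in columns])
    return columns, rows


# Hypothetical sample documents, used only for illustration.
documents = ["the cat sat", "the dog sat", "the cat saw the dog"]
columns, rows = bag_of_words_table(documents)

print(columns)
for row in rows:
    print(row)
```

Each document becomes one row and each distinct word becomes one column, so unstructured text ends up in the same tabular shape as a spreadsheet or database table.
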
The discussion also touches on Big Data, where a large number of rows, n, gives a tall and thin matrix, and on high-dimensional data, where a large number of columns, p, gives a wide one. The focus then shifts to the computational view of the world: machine learning, AI, deep learning, and data science. The instructor introduces unsupervised learning, describing it as the task of inferring a function that uncovers hidden patterns in data. Clustering is discussed as a subset of unsupervised learning in which the goal is to group similar rows together. The data is represented as a scatter plot in which each column becomes a dimension and each row becomes a point; clustering algorithms identify groups of similar rows and assign each group the same color or label. The class concludes by posing thought experiments on applying clustering to different types of data, such as text documents, images, and audio signals. Anomaly detection, another unsupervised learning problem, is briefly mentioned as a method for identifying unusual elements in the data.

**Key Points:**

- The foundational view of the world is based on the relational data model.
- Different data formats lead to the same relational data model at the application level.
- Terminology includes attributes, features, columns, dimensions, rows, and Big Data vs. high-dimensional data.
- The focus shifts to the computational view: machine learning, AI, deep learning, and data science.
- Unsupervised learning involves inferring a function to discover hidden patterns in data.
- Clustering, a subset of unsupervised learning, groups similar rows together based on the data's scatter plot representation.
- Scatter plots use columns as dimensions and rows as data points.
- Thought experiments are proposed for applying clustering to different data types: text, images, and audio signals.
- Anomaly detection is briefly mentioned as another unsupervised learning problem, identifying unusual elements in the data.

**Summary:** This part of the lecture covers three unsupervised learning problems: clustering, anomaly detection, and recommender systems.

**Key Points:**

1. **Clustering for Anomaly Detection:**
   - Use clustering algorithms to identify clusters within data.
   - Detect anomalies as data points that do not fit into any cluster.
   - Applications include finding unique text documents, images, or audio signals in a dataset.
2. **Recommender Systems:**
   - Recommender systems involve completing a sparse matrix where rows represent users, columns represent products, and cells contain user ratings.
   - Predict missing values to recommend products to users or to recommend users for products.
   - Different approaches include user-based collaborative filtering, item-based collaborative filtering, and non-negative matrix factorization.
   - Applications include personalized content recommendations for e-commerce platforms or social media feeds.
3. **Thought Experiments:**
   - Consider scenarios where rows are text documents, images, or audio signals, and clustering is used to find unique patterns.
   - Examples include clustering news articles, product reviews, tweets, legal documents, or customer support tickets.
4. **Applications of Clustering in Different Domains:**
   - Text Documents: Cluster news articles, product reviews, tweets, legal documents, or customer support tickets.
   - Images: Cluster images for image search, segment parts of images for object detection, or analyze customer behavior in retail store videos.
   - Audio: Use clustering for speaker diarization, separating speakers in an audio recording, like in movies.
5. **Relevance to Real-World Problem Solving:**
   - The focus is on understanding the types of problems that can be addressed using machine learning (ML) and artificial intelligence (AI) algorithms.
   - Emphasis on feature engineering to bring diverse data formats into a relational data format for algorithmic implementation.
6. **Variable Techniques in Anomaly Detection:**
   - Anomaly detection algorithms vary in their minimum data point requirements; it depends on the specific algorithm in use.
   - Anomaly detection is a subset of unsupervised learning, which in turn falls under the broader category of machine learning.
7. **Combination of Techniques:**
   - While the text doesn't explicitly mention combining clustering, anomaly detection, and recommender systems, it suggests that these are modular approaches that can be combined based on the problem at hand.
   - These techniques serve as "Lego blocks" that can be assembled to create solutions tailored to specific real-world challenges.
8. **Speaker Diarization in Audio:**
   - Speaker diarization is a common use case for clustering in audio data.
   - It involves separating speakers in an audio recording, such as identifying different speakers in a movie dialogue.
9. **Evolution in Data Science and AI:**
   - The text emphasizes the shift from a "common data view" to a "computational view" using feature engineering, making algorithms applicable to diverse data formats.

**Key Points on Use Cases:**

- Closed captioning or subtitles can be enhanced by *clustering audio waveforms based on speaker identity*.
- Clustering audio segments can be used for attribution or [[Diarization]], identifying who is speaking in a recording.
- Clustering is applicable to songs for *genre identification and recommendation systems*.
- Podcasts can be clustered based on content for recommendation purposes.
- Speech samples can be clustered to identify accents or dialects in audio signals.
- Health applications include *tracking vocal characteristic changes over time* for early detection of conditions like Parkinson's.
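
The idea of clustering first and then treating left-over points as anomalies can be sketched as follows. The lecture does not name a specific algorithm, so this example swaps in a simple distance-threshold grouping (single-linkage style) purely for illustration; the function names, the `eps` and `min_size` parameters, and the sample points are all assumptions.

```python
import math


def threshold_clusters(points, eps):
    """Group points connected by chains of neighbours within distance eps."""
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        # Flood-fill outward from an unlabelled point.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def find_anomalies(points, labels, min_size):
    """Treat members of clusters smaller than min_size as left-over points."""
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    return [p for p, label in zip(points, labels) if sizes[label] < min_size]


# Hypothetical 2-D feature rows: two tight groups and one stray point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
          (9.0, 0.5)]
labels = threshold_clusters(points, eps=1.0)
anomalies = find_anomalies(points, labels, min_size=2)

print(labels)     # one cluster label per row
print(anomalies)  # rows that did not fit into any sizeable cluster
```

The same two steps apply to text, image, or audio data once feature engineering has turned each item into a row of numbers; only the distance measure and thresholds would need to change.
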
#### Summary of Slide Deck

[AI Capabilities](https://cdn.upgrad.com/uploads/production/2a053cf4-2d33-42f9-93d6-1e378ced402d/3_AComputerScientistsGuideToAICapabilities_Part1.pdf)

Dr. Praphul Chandra provides an overview of foundational concepts in artificial intelligence, focusing on the underpinnings of [[Generalized Pre-trained Transformers (GPTs)|Generative Pre-trained Transformers (GPTs)]]. The deck starts with the nature of data, emphasizing its binary representation in digital computers. The discussion then shifts to [[Feature Engineering]], a crucial aspect of machine learning, with a focus on processing and converting diverse data types like text, images, and audio into a common relational format. It covers text features such as [[Bag-of-Words]], [[TF-IDF]], and [[Sentiment Analysis]], as well as image and audio features like pixels, histograms, and audio characteristics.

A comprehensive view of the relational data model is presented, encompassing data frames, matrices, spreadsheets, and database tables. The large-scale nature of data, both in terms of volume ([[Big Data]]) and dimensionality, is acknowledged, with a specific mention of high-dimensional data like text, images, and video.

The computational perspective on machine learning distinguishes between unsupervised learning and supervised learning. [[Unsupervised Learning]] involves tasks like clustering, anomaly detection, and recommender systems, while [[Supervised Learning]] focuses on function approximation, regression, and classification. The deck concludes with a discussion of text, image, and audio clustering, showcasing applications in various domains. Supervised learning is explored through [[Regression]] and [[Classification Problems|Classification]] tasks, with practical applications including face recognition and speaker recognition.

#### Text Features

* [[Bag-of-Words]] | Frequency of each word. Ignores grammar and word order.
* [[TF-IDF]] | Frequency of each word relative to its frequency across a corpus
* [[N-Grams]] | Use n-grams instead of words. Attempts to capture word order
* [[PoS Tags]] | Grammatical category of each word, i.e. noun, verb, adjective
* [[NER Tags]] | Identify entities in the document, e.g. names, locations, orgs
* [[Sentiment Analysis]] | Sentiment scores based on word-sentiment lexicons
* [[Readability Score]] | Measures of text complexity & readability
* [[Word Embeddings]] | Represent words as dense vectors in a semantic space

#### Image Features

* Pixels | Raw pixel representation. Common in traditional CV
* [[Colour Histogram]] | Represents the distribution of colours in an image
* HoG | [[Histogram of Gradients (HoG)|Histogram of Oriented Gradients (HoG)]] captures local gradients in image intensity
* SIFT | [[Scale Invariant Features (SIFT)|Scale-Invariant Feature Transform (SIFT)]] features are robust to scale, rotation, and illumination changes
* BoVW | [[Bag of Visual Words (BoVW)]]: histogram of visual words obtained from clustering local image features.
* Region-based CNN features | Derived by a deep learning network with multi-layer transformations
* [[Feature Pyramids]] | Hierarchical representations of features at different scales

#### Audio Features

* [[MFCC]] | Mel-frequency cepstral coefficients: a compact description of the power spectrum across frequencies, scaled to human ear sensitivity
* [[Chroma]] | Energy distribution of pitches; useful for music analysis
* [[Zero Crossing Rate]] | Rate at which the signal changes sign; characterizes percussiveness
* [[RMS]] | Measures intensity or power of an audio signal; characterizes loudness
* [[Formants]] | Resonant frequencies in the vocal tract. Crucial for speech analysis, speaker recognition, gender classification.
* [[Fundamental Frequency]] | Rate at which vocal cords vibrate. Crucial for speaker characterization.
* [[Time-Frequency Representations]] | Time + frequency representation, e.g. wavelets, short-time Fourier transform

#### Unsupervised Learning

* **Clustering**: Finding “areas” in space where data is concentrated.
* **Anomaly Detection**: Data points left over after clustering.
* **Recommender Systems**: Recommend high-predicted-value items to users, or users to items
    * Predict missing values based on user-user similarity
    * Predict missing values based on item-item similarity
    * Predict missing values using matrix completion (latent space)
* **Text Clustering**
    * Clustering news articles to organize them by topic, improving UX
    * Clustering product reviews to identify common issues and feedback
    * Clustering tweets to identify trending topics or discussions
    * Clustering legal documents for case analysis, legal research
    * Clustering customer support tickets to identify common issues
* **Images, Video Clustering**
    * Clustering images in storage to aid similar image retrieval
    * Clustering *segments in satellite images to identify land use patterns, monitor environmental changes, and detect natural disasters*.
    * Clustering similar segments within images to aid object recognition
    * Clustering video frames to recognize and categorize human activities in surveillance, sports analysis etc.
* **Audio Clustering**
    * Clustering audio into segments based on speaker identity, i.e. Speaker Diarization
    * Clustering audio tracks based on acoustic features to bucket them in genres.
    * Clustering podcasts based on content to aid content recommendation
    * Clustering speech samples to recognize different accents or dialects.
    * Clustering speech patterns to monitor changes in vocal characteristics, e.g. early detection of Parkinson's disease.

#### Supervised Learning

* Find how the value of the dependent variable depends on the value of others
* Find how the outcome is related to the features
* Find how the output depends on the input
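
As a minimal sketch of "finding how the output depends on the input", the example below fits a straight line to a handful of (input, output) pairs using ordinary least squares. The slides do not prescribe this particular method; the function names `fit_line` and `predict` and the sample data are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the learned function to a new input."""
    return slope * x + intercept


# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

print(slope, intercept)
print(predict(5.0, slope, intercept))
```

This is the regression flavour of supervised learning, where the output is a number. Classification follows the same pattern of learning a function from labelled examples, except that the output is a category such as a person's identity in face or speaker recognition.
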