Hello,

I want to create an AI model to learn about AI/ML, so I have scraped some data from Threads and Instagram. Now I am wondering how I can use this dataset to make an AI model or do something useful with it. (BTW, I don’t know anything about AI/ML. I did an internship as a Data Analyst, so I know a little about linear regression etc., but nothing advanced.)

I am really curious to explore this space :)

  • lurch (he/him)@sh.itjust.works · 8 months ago

    Creating an AI Model: A Beginner’s Guide

    Introduction

    Creating an AI model involves several steps, and if you’re new to the field it helps to tackle them one at a time. Let’s break the process down into actionable steps:

    1. Data Preprocessing:

      • Clean and preprocess your dataset.
      • Handle missing values, duplicates, and format the data appropriately.
    2. Define Your Problem:

      • Decide what task your AI model should perform (classification, regression, etc.).
      • Collect labeled data if the task needs it (e.g., posts tagged positive or negative for sentiment analysis).
    3. Choose an AI/ML Approach:

      • Start with simpler models before diving into deep learning.
      • Common approaches:
        • Linear Regression: Predict continuous values.
        • Classification: Assign labels to data points.
        • Clustering: Group similar data points.
        • Decision Trees: Simple and interpretable; they split the data on feature thresholds.
        • Random Forests: Ensemble of decision trees.
        • Neural Networks: Deep learning models.
    4. Feature Engineering:

      • Extract relevant features from your data.
      • Use techniques like TF-IDF or word embeddings for text data.
      • For images, consider pre-trained CNNs.
    5. Split Your Data:

      • Divide your dataset into training and validation/test sets.
    6. Train Your Model:

      • Use libraries like scikit-learn (for traditional ML) or TensorFlow/Keras (for deep learning).
      • Start with a simple model and iterate.
    7. Evaluate and Tune:

      • Use appropriate evaluation metrics (accuracy, precision, recall, F1-score, etc.).
      • If performance is low, consider hyperparameter tuning.
    8. Deployment:

      • Deploy your model (web app, API, etc.).
    9. Learn Continuously:

      • AI/ML is evolving; keep learning and stay updated.
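    Steps 1, 5 and 7 can be sketched in plain Python to show what the libraries do under the hood. This is a minimal sketch that assumes your scraped posts are a list of caption strings. The function names are hypothetical and do not come from any library. In practice, scikit-learn offers `train_test_split` (in `sklearn.model_selection`) and `precision_score`, `recall_score` and `f1_score` (in `sklearn.metrics`) for the same jobs.

```python
import random
import re


def clean_posts(posts):
    """Step 1 (hypothetical helper): normalize whitespace and case,
    skip missing/empty posts, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for post in posts:
        if post is None:  # a missing value in the scraped data
            continue
        text = re.sub(r"\s+", " ", post).strip().lower()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def split_train_test(items, test_fraction=0.2, seed=42):
    """Step 5 (hypothetical helper): shuffle a copy with a fixed seed
    and hold out a fraction of the items for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Step 7: precision, recall and F1 for one class of a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


raw = ["Great  post!", "great post!", None, "   ", "Loved the new feature"]
print(clean_posts(raw))  # ['great post!', 'loved the new feature']
train, test = split_train_test(range(10))
print(len(train), len(test))  # 8 2
print(precision_recall_f1([1, 0, 1, 1], [1, 0, 0, 1]))  # precision 1.0, recall ~0.667, F1 ~0.8
```

    The fixed seed makes the split reproducible, so changes in your metrics come from changes to the model and not from a different random split.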

    Remember, patience and persistence are key! Start small, learn, and gradually build your expertise. Good luck! 😊


    If you have any specific questions or need further guidance, feel free to ask! 🚀

    For additional resources, explore tutorials and videos on web scraping and AI model training. Happy learning! 🌟

    Sources:
      • Web scraping and AI model training (Microsoft Learn)
      • Building custom models with AI Builder (Microsoft Learn)
      • Web scraping for data models (Towards Data Science)

  • simplymath@lemmy.world · 8 months ago

    I would ignore the people who say you should deploy someone else’s model, as that will teach you next to nothing about how this stuff works.

    I would start with an older model and framework (e.g. scikit-learn) and go through all the preprocessing, prediction, and evaluation steps using a model that’s fairly simple to understand. Since you already know about linear regression, start with some of these linear models.
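    To make the "preprocess, predict, evaluate" loop concrete, here is ordinary least squares for a single feature in plain Python, checked on held-out points with mean squared error. It is a minimal sketch on made-up toy numbers, not real Threads/Instagram data, and the helper names are hypothetical. scikit-learn’s `LinearRegression` (in `sklearn.linear_model`) fits the same kind of model with `.fit()` and `.predict()` and handles many features.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = w * x + b with one feature:
    w = cov(x, y) / var(x), b = mean(y) - w * mean(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(w, b, xs):
    """Apply the fitted line to new inputs."""
    return [w * x + b for x in xs]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented for illustration: fit on one part, evaluate on held-out points.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
test_x, test_y = [5, 6], [11, 13.5]

w, b = fit_simple_linear_regression(train_x, train_y)
print(w, b)  # 2.0 1.0
print(mean_squared_error(test_y, predict(w, b, test_x)))  # 0.125
```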

    Then, and only then, would I worry about neural networks and deep learning, since the main differences are a non-linear activation function applied after each weighted sum and a much larger, more complicated set of weights (called model parameters, or coefficients, in linear regression terms).

    Here is an example

    Source: PhD in neural networks
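    A quick way to see that difference: each unit in a neural network layer is a linear model, like the linear regression you already know, followed by a non-linear activation. This plain-Python sketch uses hypothetical helper names and hand-picked weights rather than trained ones. It builds a one-hidden-layer network that computes |x|, a function no single linear model can represent. Swapping ReLU for the identity collapses the network back to a linear function.

```python
def linear(weights, bias, inputs):
    """Weighted sum plus bias: the same form as a linear regression prediction."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def relu(z):
    """Non-linear activation: passes positive values, zeroes out negatives."""
    return max(0.0, z)


def identity(z):
    """No activation at all, so every unit stays linear."""
    return z


def tiny_network(inputs, hidden_params, output_params, activation=relu):
    """One hidden layer: each hidden unit is a linear model followed by
    the activation; the output unit is a linear model over the hidden units."""
    hidden = [activation(linear(w, b, inputs)) for w, b in hidden_params]
    w_out, b_out = output_params
    return linear(w_out, b_out, hidden)


# Hand-picked weights (not learned): the hidden units compute relu(x) and relu(-x).
hidden_params = [([1.0], 0.0), ([-1.0], 0.0)]
output_params = ([1.0, 1.0], 0.0)

for x in (-2.0, 0.0, 3.0):
    with_relu = tiny_network([x], hidden_params, output_params)
    without = tiny_network([x], hidden_params, output_params, activation=identity)
    print(x, with_relu, without)  # with ReLU: |x|; with identity: x + (-x) = 0
```

    Training a real network means finding such weights automatically (via gradient descent) instead of picking them by hand.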

    • andrew0@lemmy.dbzer0.com · 8 months ago

      You’re right. I read past the “I want to learn ML” and went straight to “do something useful with the data”.

      If the goal is to understand how modern LLMs work, it’s also good to read up on RNNs and LSTMs. For neural-network fundamentals, 3Blue1Brown does an amazing job, and has also posted an in-depth video about transformers. I’d watch that next, then implement a simple transformer in PyTorch (perhaps using its existing building blocks).

      You could argue that it’s important to design everything from scratch first, but it’s easier to first go high level, see how the network behaves, and then attempt to implement it yourself based on the paper. It is up to OP how comfortable he is with the topic though 😁
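      The core operation inside a transformer block is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. Here is a from-scratch sketch using plain Python lists, meant only to show the arithmetic. The helper names are hypothetical. PyTorch’s existing building blocks for this include `torch.nn.MultiheadAttention`, `torch.nn.TransformerEncoderLayer`, and (since PyTorch 2.0) `torch.nn.functional.scaled_dot_product_attention`.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query vector, weight every value vector by
    softmax(q . k / sqrt(d_k)) over all keys and sum them."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Toy 2-token example: the query matches the first key far better than the second,
# so the output is pulled almost entirely toward the first value vector.
queries = [[4.0, 0.0]]
keys = [[4.0, 0.0], [0.0, 4.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_dot_product_attention(queries, keys, values))
```

    A real transformer layer adds learned projections that produce Q, K and V from the token embeddings, uses several heads, and wraps attention with residual connections, normalization and a feed-forward block.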

  • LeroyJenkins@lemmy.world · 8 months ago

    That’s a great starting point! Your scraped data from Threads and Instagram can be a valuable resource for exploring AI/ML. Here’s a general roadmap to get you started:

    • Understand Your Data: Before diving into AI/ML models, it’s crucial to understand your data. Analyze the content you scraped from Threads and Instagram. What format is it in (text, images, videos)? What kind of information does it contain (captions, comments, user data)?

    • Choose an AI/ML Approach: Based on your data and goals, you can explore different AI/ML techniques. Here are some options to consider:

      • Text Analysis: If your data is text-heavy, you can use natural language processing (NLP) to analyze sentiment, topics, or emerging trends.
      • Image Recognition: If you have a lot of images, you can use computer vision to identify objects or scenes, or to classify images by their content.

    • Start Simple: Begin with well-established algorithms like linear regression or decision trees. These can provide valuable insights without requiring deep learning expertise.

    • Utilize Online Resources: There are plenty of online tutorials and courses that can introduce you to AI/ML concepts. Platforms like Google Colab offer free computing resources to experiment with code. Remember, this is an ongoing learning journey. Start with small steps, explore different resources, and don’t be afraid to experiment!
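    As a small example of the text-analysis route, TF-IDF scores a word highly when it is frequent in one post but rare across the others. That makes it a simple way to surface distinctive terms or rough topics. Below is a plain-Python sketch using raw term frequency × log(N / document frequency). The captions are invented and the helper names are hypothetical. scikit-learn’s `TfidfVectorizer` uses a smoothed IDF and normalization by default, so its numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters, digits, hashtags and mentions."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document, where
    score = (count in doc / doc length) * log(N / number of docs containing term)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k=3):
    """Highest-scoring terms first; ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Invented captions standing in for scraped posts.
captions = [
    "Sunset at the beach #travel",
    "Beach day with friends #travel",
    "New coffee recipe at home",
]
for caption, doc_scores in zip(captions, tf_idf(captions)):
    print(caption, "->", top_terms(doc_scores))
```

    Common words like "the" can still rank high in short posts, which is why real pipelines usually remove stop words first.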

  • andrew0@lemmy.dbzer0.com · 8 months ago

    Depending on how much compute you have available, you can look into fine-tuning models from Hugging Face (e.g. Llama 3, or a smaller Phi model). Look into LoRA (Low-Rank Adaptation), and try to learn how the model you choose calculates the loss.
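    The idea behind LoRA fits in a few lines. The pretrained weight matrix W (d × k) stays frozen, and you train two small matrices, B (d × r) and A (r × k), with rank r much smaller than d and k. The layer then uses W + (α / r)·B·A. This plain-Python sketch uses hypothetical helper names and tiny matrices to show the update and the parameter savings. For real models, Hugging Face’s `peft` library handles this for you.

```python
def matmul(a, b):
    """Plain-list matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(W, B, A, alpha, r):
    """W' = W + (alpha / r) * (B @ A). W is frozen; only B and A are trained."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))] for i in range(len(W))]


def trainable_params(d, k, r):
    """Trainable parameters for full fine-tuning of one d x k matrix vs. a rank-r LoRA update."""
    return d * k, r * (d + k)


# Tiny example with d = k = 2 and r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]
B = [[0.0], [0.0]]  # B starts at zero, so training starts from W unchanged
print(lora_effective_weight(W, B, A, alpha=2, r=1))  # [[1.0, 0.0], [0.0, 1.0]]

B = [[1.0], [2.0]]  # pretend B has been updated by training
print(lora_effective_weight(W, B, A, alpha=2, r=1))  # [[2.0, -1.0], [2.0, -1.0]]

print(trainable_params(4096, 4096, 8))  # (16777216, 65536)
```

    For a 4096 × 4096 projection, a rank-8 update trains 256 times fewer parameters than updating the whole matrix, which is why LoRA fits on modest hardware.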

    There are various training objectives. Masked language models (e.g. BERT) are trained by replacing random input tokens with a mask token and predicting the originals, while causal models such as Llama and Phi learn to predict the next token. I won’t go into too much detail, because it’s a lot to explain, and I suggest you read an article on this (link1 or link2)
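    To make the two objectives concrete, here is a plain-Python sketch of how training examples are built. The first is BERT-style masking, where a random subset of tokens is replaced by a mask token and only those positions are scored. The second is the causal next-token setup that decoder models such as Llama and Phi use. The sketch also includes the cross-entropy loss for one prediction. The token ids, `MASK_ID`, and helper names are hypothetical. The label value -100 follows the PyTorch/Hugging Face convention for positions the loss should ignore. Real BERT pre-training also leaves some selected tokens unchanged or swaps in random ones, which this sketch omits.

```python
import math
import random

MASK_ID = 0    # hypothetical id of the [MASK] token in some vocabulary
IGNORE = -100  # label meaning "no loss here" (PyTorch CrossEntropyLoss default ignore_index)


def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    """Masked-LM example: replace each token with MASK_ID with probability mask_prob.
    Labels keep the original id only at masked positions."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE)
    return inputs, labels


def causal_lm_pairs(token_ids):
    """Causal-LM example: the target at position t is the token at position t + 1."""
    return token_ids[:-1], token_ids[1:]


def cross_entropy(probs, target_id):
    """Loss for one prediction: negative log of the probability given to the correct token."""
    return -math.log(probs[target_id])


tokens = [17, 42, 8, 99, 5]  # made-up token ids
print(mask_tokens(tokens, mask_prob=0.4, seed=1))
print(causal_lm_pairs(tokens))  # ([17, 42, 8, 99], [42, 8, 99, 5])
print(cross_entropy([0.1, 0.7, 0.2], target_id=1))  # about 0.357
```

    During training, the model’s predicted distribution at each scored position goes through this cross-entropy, and the average over positions is the loss that fine-tuning (with or without LoRA) minimizes.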