Microsoft Podcast Copilot

In this blog, we’re going to talk about an exciting code demonstration that happened at the Build 2023 event. Microsoft’s Chief Technology Officer, Kevin Scott, showcased the architecture of a fantastic automated social media content creation and posting tool called Podcast Copilot.

Before we dive into the details, let’s first mention that Kevin Scott is also the host of a podcast called “Behind the Tech.” Now, imagine you want to create a social media post to promote a new episode of the podcast. Well, the Podcast Copilot is here to make your life easier! It uses some really cool machine learning models working together to help you with that.

Here’s how it all works:

  1. First, we have the Whisper model. It takes the audio file of the podcast and converts it into written text, giving us a transcript of the podcast.
  2. Next, we have the Dolly 2 model. It analyzes the transcript and extracts the name of the guest who appeared on the podcast. Pretty neat, right?
  3. Now, we need a bio for the guest. This is where the Bing Search Grounding API comes in. It searches the internet and retrieves a biography for the guest.
  4. With the transcript and the guest’s bio in hand, the powerful GPT-4 model steps in. It generates a captivating social media post that promotes the podcast episode.
  5. But we’re not done yet! We use GPT-4 once again, this time to create a relevant prompt for DALL-E, another awesome model. DALL-E then generates a cool image to accompany the social media post.
  6. Before posting, you get a chance to review the content. If everything looks good, a LinkedIn plugin is used to post the social media copy and image to LinkedIn.

It’s important to note that during the demo, Whisper and Dolly 2 were run locally. However, the Bing Search Grounding API is available on Azure. Additionally, GPT-4, DALL-E 2, and the plugins-capable model were deployed using the Azure OpenAI service.

Just a quick heads-up: as of the Build event in May 2023, the DALL-E models are still in private preview. If you want access to the DALL-E model for image generation, you need to request it through the form at https://aka.ms/oai/access. In question #22 of the form, you can ask for access to the DALL-E models. Also, the plugins-capable models are not yet available to the public, but stay tuned for updates! Below is the social media post Kevin Scott created from this project.

Setup

This project requires creating an Azure OpenAI resource to run several cloud-based models.

You can find the GitHub repository here: https://github.com/microsoft/PodcastCopilot

You will also need to create a Bing search resource at https://portal.azure.com/#create/Microsoft.BingSearch.

Next, update the PodcastSocialMediaCopilot.py file with your settings.

  • Update bing_subscription_key with the API key of your Bing resource on Azure.
  • Update openai_api_base with the name of your Azure OpenAI resource; this value should look like this: “https://YOUR_AOAI_RESOURCE_NAME.openai.azure.com/
  • Update openai_api_key with the corresponding API key for your Azure OpenAI resource.
  • Update gpt4_deployment_name with the name of your model deployment for GPT-4 in your Azure OpenAI resource.
  • If your model deployments for gpt-4, dalle, and the plugins-capable model are all on the same Azure OpenAI resource, you’re all set! If not, you can override the individual endpoints and keys for the resources for the various model deployments using the variables gpt4_endpointgpt4_api_keydalle_endpointdalle_api_keyplugin_model_url, and plugin_model_api_key.
  • Optionally, you can also update the podcast_url and podcast_audio_file to reflect your own podcast.

Finally, set up your environment and run the code using the following commands:

pip install -r requirements.txt
python PodcastSocialMediaCopilot.py

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.