How Does Copilot Work?
Microsoft 365 Copilot is an AI-driven assistant integrated into most aspects of the Microsoft infrastructure. It is designed to enhance user productivity and streamline workflows, whether in Word, Outlook or PowerPoint. It is Microsoft's latest intelligent virtual assistant, bringing the power of GenAI to the Office suite by putting a ChatGPT-like prompt directly in your Microsoft Office documents. It is a key service that is being heavily promoted by Microsoft.
But how can we discover Copilot interactions? What do they look like, and are there any key differences that we need to know about? This article discusses how Copilot data is preserved by Microsoft Purview, the format of that data, and best practices for collecting it.
This work was done with the help of ProSearch’s Microsoft 365 Advisory Services, who created the datasets needed for testing.
How does Copilot work?
Copilot is an additional service available in Microsoft 365. Once licensed, it appears in every Microsoft 365 application (Word, Excel, PowerPoint, Teams and Outlook) as a contextual menu beside the cursor or as part of the application’s Ribbon interface. Clicking the Copilot icon opens a pop-up window or a document sidebar. Both contain an input box that accepts prompts in the same way as ChatGPT. In Teams, the prompt window is simply a conversation with the Copilot bot.

Copilot in Word, as seen as a Contextual Prompt and side bar
Both prompt entry points work in the same way. They accept AI prompts and supporting files, and the results are either written into the document or returned as a reply in the sidebar conversation. Answers may come from the general LLM or draw on information in the local Microsoft 365 tenant. A prompt in Copilot in Word like “Summarise my last email conversation” will result in Copilot fetching that email from Outlook and summarising it. A prompt like “What is WorkStream” in PowerPoint will result in Copilot fetching that data from, and linking to, documents found in SharePoint.

Copilot in Word pulling in documents from ProSearch Way to answer about WorkStream
A conversation with Copilot, therefore, may have more than just text associated with it. It may have linked files or URLs. Prompts may generate entire PowerPoint presentations, images or other content. There is much more to this, and interested parties can read more on the Microsoft Copilot site.
Exports
Copilot interactions and outputs are preserved and can be found via Purview eDiscovery Standard or Premium. Copilot interactions are stored in the mailbox of the user who interacted with the system, regardless of whether the interaction took place in Outlook, Excel, PowerPoint or Word. As such, collecting the data follows a very similar process to that of Teams or email collections. The only thing that changes is the data type category selected when creating the collection.
To direct a collection to extract only Copilot data, simply select Type equals any of “Copilot interactions” during the collection process. Conversely, if Copilot interactions are out of scope for the collection, the Type condition can be inverted with the NOT operator.
Format
Even though Copilot has its own collection category (Copilot interactions), the underlying data looks very similar to Teams data. Purview treats interactions as conversations between the user and the AI system. Much like Teams data, if the tenant has an E3 discovery licence then each interaction is exported as a separate MSG and bundled in a PST mailbox. If the tenant has an E5 licence, the interactions can be exported as HTML transcripts. At first glance, Copilot data looks like a regular Teams dataset; however, tests have shown that there are important differences between the two. The following is a small, non-exhaustive list of the differences that we have identified so far.
MSG Format
If exports are done as a PST, Copilot interactions are stored as individual MSGs in the TeamsMessageData PST folder. This is the same location in which Teams chats are stored. However, the Copilot MSGs do not have the same format as their Teams counterparts. Unlike Teams data, Copilot responses are stored as HTML attachments within the MSGs, not in the email body as is the case for normal Teams data. The message body of Copilot MSGs is largely blank.
An example of this situation is shown below. The images show a user prompt and a Copilot response. Both are stored as individual emails. As can be seen, the Copilot response contains an attachment, Microsoft 365 chat.html. The actual Copilot result is found in this file and is illustrated in the final image.
HTML Format
In E5 environments, Copilot data is exported as HTML files with an associated CSV of metadata that describes the data in the delivery. On the face of it, the Copilot HTMLs look exactly like Teams transcripts, with interactions cascading linearly in the order in which they were originally generated. Below is the HTML transcript of the same conversation as in the section above. Note that the HTML supports Unicode characters.

The underlying HTML format of the Copilot messages is, however, different from that of their Teams counterparts. Key HTML blocks that provide vital metadata, such as the hidden CDATA tags, are not present for the Copilot messages but are found for user-created messages.
Another difference is how the Copilot HTMLs are described in the accompanying CSV. Copilot HTMLs have the Item_class value IPM.SkypeTeams.Message.Copilot.<AppName>, where AppName is the application in which the interaction happened. Examples include IPM.SkypeTeams.Message.Copilot.PowerPoint and IPM.SkypeTeams.Message.Copilot.Word or, in the case of Teams, IPM.SkypeTeams.Message.Copilot.BizChat. A comprehensive list can be found on the Microsoft site.
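The Item_class naming pattern gives processing teams a simple way to separate Copilot items from ordinary Teams items in a delivery. The following Python sketch is an illustration only, not ProSearch's or Microsoft's implementation. It assumes the metadata CSV has a column literally named Item_class, as described above; the function names and the File_ID column in the usage note are hypothetical.

```python
import csv
import io

# Prefix shared by all Copilot item classes, per the examples in this article.
COPILOT_PREFIX = "IPM.SkypeTeams.Message.Copilot."


def copilot_app(item_class):
    """Return the application name for a Copilot item class, else None."""
    if item_class and item_class.startswith(COPILOT_PREFIX):
        # Whatever follows the prefix is the application, e.g. "Word".
        return item_class[len(COPILOT_PREFIX):] or None
    return None


def split_rows(csv_text, column="Item_class"):
    """Split metadata rows into Copilot rows (tagged with app) and the rest.

    The column name is an assumption; adjust it to match the actual export.
    """
    copilot, other = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        app = copilot_app((row.get(column) or "").strip())
        if app is None:
            other.append(row)
        else:
            copilot.append((app, row))
    return copilot, other
```

Given a metadata CSV with, say, a File_ID column and an Item_class column, split_rows returns the Copilot rows paired with their application name (Word, PowerPoint, BizChat and so on) and leaves everything else in a second list, so the two sets can be routed to different processing paths.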
Response Times
The Copilot participant in HTML conversations is named according to which Copilot application was used. For example, the participant will be called Copilot in PowerPoint if Copilot was used in PowerPoint. Conversations with Copilot in Teams will have Microsoft 365 Chat as the participant, and so on.
The Copilot participant's responses are often immediate and may have the exact same timestamp as the prompt. Processing tools that parse and convert the data to other formats such as RSMF, as is done by ProSearch, may be presented with an ordering problem. Because the two messages share the same date and time, an RSMF may show a Copilot answer before the prompt that triggered it. Care must be taken to ensure that the second and millisecond values are taken into account to preserve the order of the messages.
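One way to guard against this ordering problem is to sort on the full-precision timestamp and apply an explicit tie-break when two messages are still identical. The Python sketch below is a minimal illustration under stated assumptions: the message dictionaries, the timestamp and from_copilot field names, and the rule that a user prompt precedes a Copilot reply on an exact tie are all hypothetical choices, not a description of any particular processing tool.

```python
from datetime import datetime


def order_messages(messages):
    """Return messages in conversation order.

    Each message is a dict with hypothetical keys:
      "timestamp"    - ISO 8601 string including fractional seconds
      "from_copilot" - True for a Copilot reply, False for a user prompt
    """
    def sort_key(indexed):
        position, msg = indexed
        # Keep the full timestamp, including seconds and milliseconds.
        when = datetime.fromisoformat(msg["timestamp"])
        # Assumed tie-break: on identical timestamps the prompt (rank 0)
        # sorts before the Copilot reply (rank 1).
        rank = 1 if msg["from_copilot"] else 0
        # The original position is the final tie-break so the sort is stable.
        return (when, rank, position)

    return [msg for _, msg in sorted(enumerate(messages), key=sort_key)]


# Usage: the reply was exported first but shares the prompt's timestamp.
conversation = [
    {"text": "answer", "from_copilot": True, "timestamp": "2024-05-01T10:15:30.120"},
    {"text": "prompt", "from_copilot": False, "timestamp": "2024-05-01T10:15:30.120"},
]
ordered = order_messages(conversation)
```

The tie-break only applies when timestamps match exactly; whenever the millisecond values differ, the timestamps alone decide the order.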
Conclusions
Copilot is a flagship Microsoft service being deployed in almost all areas of the Microsoft infrastructure. The popularity of GenAI and LLMs means that it will inevitably be used day to day. As such, it will be found in eDiscovery deliveries in the near future. The ability to handle that data accurately and efficiently is an important challenge for the eDiscovery community.
Tests show that Copilot data, although it has the appearance of a Teams dataset, has very important distinctions that set it apart from the Teams data format. It is recommended that clients collect Copilot data separately from their other collections to allow the processing team to treat it accordingly. Interested parties should contact the M365 Advisory Services for more information.

Damir Kahvedžić is a technology expert specializing in providing clients with technical assistance in eDiscovery and Forensics cases. He has a PhD in Cybercrime and Digital Forensics Investigations from the Centre for Cybercrime Investigation in UCD and holds a first-class Honours B.Sc in Computer Science. Experienced in the use of industry leading software, such as Relativity, EnCase, NUIX, Cellebrite, Clearwell, and Brainspace, Damir is also a PRINCE2 and PECB ISO 21500 qualified project manager. Damir has published both academic and technical papers at several international conferences and journals including the European Academy of Law, Digital Forensic Research Workshop (DFRWS), Journal of Digital Forensics and Law amongst others.





