Extractive summarization with LLMs using BERT

In today’s fast-paced world, we are bombarded with more information than we can handle. We are increasingly used to receiving more information in less time, which leads to frustration when we have to read extensive documents or books. This is where extractive summarization comes into play. To get to the heart of a text, the process extracts key sentences from an article, report, or page to give us a snapshot of its most important points.

For anyone who needs to understand large documents without reading every word, this is a game changer.

In this article, we explore the basics and applications of extractive summarization. We will examine the role of large language models (LLMs), especially BERT (Bidirectional Encoder Representations from Transformers), in improving the process. The article also includes practical instructions on using BERT for extractive summarization, demonstrating its usefulness in condensing large volumes of text into informative summaries.

Understanding extractive summarization

Extractive summarization is a prominent technique in the field of natural language processing (NLP) and text analysis. With it, key sentences or phrases are carefully selected from the source text and combined to create a concise and informative summary. This involves carefully sifting through the text to identify the most important elements and central ideas or arguments presented in the selected work.

Where abstractive summarization involves generating entirely new sentences that are often not present in the source material, extractive summarization stays with the original text. It does not alter or paraphrase, but extracts sentences exactly as they appear, retaining the original wording and structure. In this way, the summary remains faithful to the tone and content of the original material. Extractive summarization is especially useful in cases where accuracy of information and preservation of the author’s original intent are priorities.

It has many different uses, such as summarizing newspaper articles, academic papers or long reports. The process effectively conveys the message of the original content without potential biases or reinterpretations that can occur through paraphrasing.

How does extractive summarization use LLMs?

1. Parsing the text

This initial step involves breaking down the text into its essential elements, primarily sentences and phrases. The goal is to identify the basic units (sentences, in this context) that the algorithm will later evaluate for inclusion in the summary, much like dissecting a text to understand its structure and individual components.

For example, the model would analyze a four-sentence paragraph by breaking it down into its four component sentences, listed below (a minimal code sketch of this step follows the list).

  1. The Pyramids of Giza, built in ancient Egypt, stood majestically for millennia.
  2. They were built as tombs for pharaohs.
  3. The Great Pyramids are the most famous.
  4. These structures symbolize architectural splendor.
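
To make this concrete, here is a minimal Python sketch of the sentence-splitting step using a simple regular expression. It is purely illustrative; the BERT summarizer used later in this article handles sentence splitting internally with more robust tooling.

# Toy sketch of the parsing step: split a paragraph into sentence units.
import re

paragraph = (
    "The Pyramids of Giza, built in ancient Egypt, stood majestically for millennia. "
    "They were built as tombs for pharaohs. "
    "The Great Pyramids are the most famous. "
    "These structures symbolize architectural splendor."
)

# Split on sentence-ending punctuation followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())

for i, sentence in enumerate(sentences, start=1):
    print(f"{i}. {sentence}")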

2. Feature extraction

At this stage, the algorithm analyzes each sentence to identify characteristics or ‘features’ that might indicate its significance to the overall text. Common features include the frequency and repetition of keywords and phrases, the length of a sentence, its position in the text, and the presence of specific words or phrases that are central to the main theme of the text.

Below is an example of how an LLM might perform feature extraction for the first sentence: “The Pyramids of Giza, built in ancient Egypt, stood majestically for millennia.”

(Image: feature extraction for the example sentence)
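
Since the original illustration is not reproduced here, the following toy sketch shows what hand-crafted features (keyword frequency, sentence length, relative position) might look like for these sentences. The keyword set is an assumption chosen for this example; a BERT-based summarizer actually derives sentence representations from contextual embeddings rather than hand-crafted features like these.

# Toy sketch of hand-crafted sentence features; purely for intuition.
sentences = [
    "The Pyramids of Giza, built in ancient Egypt, stood majestically for millennia.",
    "They were built as tombs for pharaohs.",
    "The Great Pyramids are the most famous.",
    "These structures symbolize architectural splendor.",
]
keywords = {"pyramids", "egypt", "pharaohs"}  # assumed central terms for this text

def sentence_features(sentence, position, total):
    words = [w.strip(".,").lower() for w in sentence.split()]
    return {
        "length": len(words),                                       # number of words
        "position": position / (total - 1) if total > 1 else 0.0,   # relative position (0 = first)
        "keyword_hits": sum(1 for w in words if w in keywords),     # keyword presence
    }

for i, s in enumerate(sentences):
    print(s)
    print(sentence_features(s, i, len(sentences)))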

3. Scoring sentences

Each sentence is scored based on its features. This score reflects the perceived importance of the sentence in the context of the entire text. Sentences with a higher score are considered to carry more weight or relevance.

Simply put, this procedure rates each sentence according to its potential significance for summarizing the entire text.

(Image: sentence scoring for the example paragraph)
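
Continuing the toy example above (this reuses sentences and sentence_features from the previous sketch), the snippet below combines the hand-crafted features into a single score per sentence. The weights are assumptions chosen purely for illustration; a BERT-based summarizer instead ranks or clusters sentences using their contextual embeddings.

# Toy sketch of the scoring step, building on the feature sketch above.
def score_sentence(features):
    # Weights are assumed, purely for illustration.
    return (
        2.0 * features["keyword_hits"]        # reward central keywords
        + 0.1 * features["length"]            # slightly reward longer sentences
        + 1.0 * (1.0 - features["position"])  # favor earlier sentences
    )

scores = [
    score_sentence(sentence_features(s, i, len(sentences)))
    for i, s in enumerate(sentences)
]

for s, sc in zip(sentences, scores):
    print(f"{sc:.2f}  {s}")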

4. Selection and aggregation

The final stage involves selecting the sentences that received the highest scores and compiling them into a summary. When done carefully, this ensures that the summary remains coherent and represents the main ideas and themes of the original text as a whole.

To create an effective summary, the algorithm must balance the need to include the most important sentences, avoid redundancy, and ensure that the selected sentences provide a clear and comprehensive overview of the entire source text.

  • The Pyramids of Giza, built in ancient Egypt, stood majestically for millennia. They were built as tombs for pharaohs. These structures symbolize architectural splendor.

This example is extremely simple: it singles out three of the four sentences for the best overall summary. Reading one extra sentence doesn’t hurt, but what happens when the text is longer? Say, three paragraphs?
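
Before moving on to BERT, here is a minimal sketch of the selection-and-aggregation step, reusing the sentences and scores from the toy example above: it keeps the highest-scoring sentences and restores their original order so the summary stays coherent. Note that with the assumed weights, the selected subset may differ from the hand-picked example above.

# Toy sketch of selection and aggregation, building on the scores above.
num_sentences = 3  # desired summary length, chosen for illustration

# Take the indices of the highest-scoring sentences, then sort them back
# into document order so the summary reads coherently.
top_by_score = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
top_indices = sorted(top_by_score[:num_sentences])

summary = " ".join(sentences[i] for i in top_indices)
print(summary)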

How to run extractive summarization with the BERT LLM

Step 1: Installing and importing required packages

We will use a pre-trained BERT model. However, we will not use just any BERT model; instead, we’ll use the BERT Extractive Summarizer, which is tailored to the specialized task of extractive summarization.

# Install the bert-extractive-summarizer package and import its Summarizer class
!pip install bert-extractive-summarizer
from summarizer import Summarizer

Step 2: Initializing the summarization model

The Summarizer() function, imported from the summarizer package, is an extractive text summarization tool. It uses the BERT model to analyze and extract key sentences from a larger text. It aims to retain the most important information, providing a condensed version of the original content, and is typically used to condense long documents efficiently.

# Initialize the extractive summarizer (loads a pre-trained BERT model)
model = Summarizer()

Step 3: Importing our text

Here we will import any piece of text we want to test our model on. To test our extractive summarization model, we generated text using ChatGPT 3.5 with the prompt: “Provide a 3-paragraph summary of the history of GPUs and how they are used today.”

text = "The history of Graphics Processing Units (GPUs) dates back to the early 1980s when companies like IBM and Texas Instruments developed specialized graphics accelerators for rendering images and improving overall graphical performance. However, it was not until the late 1990s and early 2000s that GPUs gained prominence with the advent of 3D gaming and multimedia applications. NVIDIA's GeForce 256, released in 1999, is often considered the first GPU, as it integrated both 2D and 3D acceleration on a single chip. ATI (later acquired by AMD) also played a significant role in the development of GPUs during this period. The parallel architecture of GPUs, with thousands of cores, allows them to handle multiple computations simultaneously, making them well-suited for tasks that require massive parallelism. Today, GPUs have evolved far beyond their original graphics-centric purpose, now widely used for parallel processing tasks in various fields, such as scientific simulations, artificial intelligence, and machine learning.  Industries like finance, healthcare, and automotive engineering leverage GPUs for complex data analysis, medical imaging, and autonomous vehicle development, showcasing their versatility beyond traditional graphical applications. With advancements in technology, modern GPUs continue to push the boundaries of computational power, enabling breakthroughs in diverse fields through parallel computing. GPUs also remain integral to the gaming industry, providing immersive and realistic graphics for video games where high-performance GPUs enhance visual experiences and support demanding game graphics. As technology progresses, GPUs are expected to play an even more critical role in shaping the future of computing."

Here’s that text again, outside the code block:

“The history of graphics processing units (GPUs) dates back to the early 1980s when companies like IBM and Texas Instruments developed specialized graphics accelerators to render images and improve overall graphics performance. However, it wasn’t until the late 1990s and early 2000s that GPUs gained prominence with the advent of 3D gaming and multimedia applications. NVIDIA’s GeForce 256, released in 1999, is often considered the first GPU, as it integrated both 2D and 3D acceleration on a single chip. ATI (later acquired by AMD) also played a significant role in GPU development during this period.

The parallel architecture of GPUs, with thousands of cores, allows them to handle multiple computations simultaneously, making them suitable for tasks that require massive parallelism. Today, GPUs have evolved far beyond their original graphics-focused purpose and are now widely used for parallel processing tasks in a variety of fields, such as scientific simulations, artificial intelligence, and machine learning. Industries such as finance, healthcare, and automotive engineering are leveraging GPUs for complex data analysis, medical imaging, and autonomous vehicle development, demonstrating their versatility beyond traditional graphics applications.

With advances in technology, modern GPUs continue to push the boundaries of computing power, enabling breakthroughs in various fields through parallel computing. GPUs also remain an integral part of the gaming industry, providing immersive and realistic graphics for video games where high-performance GPUs enhance visual experiences and support demanding game graphics. As technology advances, GPUs are expected to play an even more critical role in shaping the future of computing.”

Step 4: Perform extractive summarization

Finally, we will run our summarization function. It requires two inputs: the text to summarize and the desired number of sentences in the summary. After processing, it generates an extractive summary, which we then display.

# Specifying the number of sentences in the summary
summary = model(text, num_sentences=4) 
print(summary)

Extractive summary output

The history of graphics processing units (GPUs) dates back to the early 1980s when companies such as IBM and Texas Instruments developed specialized graphics accelerators to render images and improve overall graphics performance. NVIDIA’s GeForce 256, released in 1999, is often considered the first GPU, as it integrated 2D and 3D acceleration on a single chip. Today, GPUs have evolved far beyond their original graphics-focused purpose and are now widely used for parallel processing tasks in a variety of fields, such as scientific simulations, artificial intelligence, and machine learning. As technology advances, GPUs are expected to play an even more critical role in shaping the future of computing.

Our model extracted the four most important sentences from the larger text to generate this summary!
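
The num_sentences argument controls how many sentences the summary keeps. For example, reusing the model and text defined in the steps above, you could request a tighter two-sentence summary:

# Reusing the model and text from above: request a shorter, two-sentence summary
short_summary = model(text, num_sentences=2)
print(short_summary)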

Challenges of extractive summarization using LLMs

Limitations of contextual understanding

While LLMs are adept at processing and generating language, their understanding of context, especially in longer texts, is limited. LLMs may miss subtle nuances or fail to recognize critical aspects of the text, leading to less accurate or relevant summaries. The more advanced the language model, the better the summary will be.

Bias in training data

LLMs learn from vast data sets collected from a variety of sources, including the Internet. These datasets may contain biases that the models could inadvertently learn and replicate in their summaries, leading to distorted or unfair representations.

Handling of specialized or technical language

While LLMs are generally trained on a wide range of general texts, they may not accurately capture the specialized or technical language of fields such as law, medicine, or other highly technical domains. This can be mitigated by training on more specialized and technical text; without such exposure, a lack of familiarity with specialized jargon can affect the quality of summaries in these areas.

Conclusion

Clearly, extractive summarization is more than just a practical tool; it is a growing need in our information-saturated age, in which we are inundated with walls of text every day. By harnessing the power of technologies like BERT, we can distill complex texts into digestible summaries, saving time and helping us better understand the material being condensed.

Whether for academic research, business insights, or simply staying informed in a technologically advanced world, extractive summarization is a practical way to navigate the sea of information that surrounds us. As natural language processing continues to evolve, tools like extractive summarization will become even more essential, helping us quickly find and understand the information that matters most in a world where every minute counts.
