Summarizing text using deep learning

With the evolution of the internet, we have been flooded with voluminous data from sources such as news articles, social media platforms, and blogs. Text summarization, in the field of natural language processing, is the technique of creating a concise and accurate summary of textual data that captures the essential details while remaining coherent with the source text.

Text summarization can be of two types, which are as follows:

  • Extractive summarization: This method extracts key sentences or phrases from the original source text without modifying them, which makes it the simpler of the two approaches (a minimal sketch follows this list).
  • Abstractive summarization: This method, on the other hand, learns a complex mapping between the context of the source text and the summary rather than merely copying words from the input to the output. A significant challenge with this approach is that it requires a lot of training data before the machine-generated summaries approach the quality of human-written ones.
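
To make the extractive approach concrete, here is a rough sketch that scores each sentence by the frequency of the words it contains and keeps the top-scoring ones. The function name, the frequency-based scoring scheme, and the `top_n` parameter are illustrative choices, not part of this recipe:

```python
import re
from collections import Counter
from heapq import nlargest

def extractive_summary(text, top_n=2):
    # Split into sentences and build word frequencies over the whole text.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    # Score each sentence by the total frequency of its words.
    # A real system would also drop stopwords and normalize by length.
    scores = {i: sum(freq[w] for w in re.findall(r'\w+', s.lower()))
              for i, s in enumerate(sentences)}
    # Keep the top_n sentences, restoring their original order.
    best = sorted(nlargest(top_n, scores, key=scores.get))
    return ' '.join(sentences[i] for i in best)
```

Because the output is assembled verbatim from the source sentences, such a summarizer never produces ungrammatical text, but it also cannot paraphrase or compress ideas the way the abstractive approach can.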

The encoder-decoder LSTM architecture has proven to work efficiently for sequence-to-sequence problems and is capable of handling variable-length inputs and outputs. In this recipe, we will slightly modify the standard one-hot encoder-decoder architecture that we used in the previous recipe by introducing a technique known as teacher forcing. Teacher forcing is a strategy that's often used for training recurrent networks: during training, the ground-truth output from the prior time step, rather than the model's own prediction, is fed as the input at the next time step. In this architecture, the encoder takes the input text and converts it into an internal representation of a fixed length, capturing the context of the input text. The decoder uses the internal representation generated by the encoder, as well as the sequence of words or phrases that have already been generated as the summary so far. Thus, the decoder has the flexibility to utilize the distributed representation of all the words that have been generated so far as an input to predict the next word.
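
Here is a minimal Keras sketch of such an encoder-decoder wired for teacher forcing. The vocabulary size, latent dimension, and variable names are assumptions for illustration; a real model would use the vocabulary and preprocessing of the recipe's own dataset:

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_tokens = 5000   # assumed vocabulary size
latent_dim = 256    # assumed size of the fixed-length internal representation

# Encoder: reads the one-hot encoded source text and keeps only its
# final hidden and cell states as the fixed-length context.
encoder_inputs = Input(shape=(None, num_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: at each step it receives the ground-truth previous summary
# token (teacher forcing) plus the encoder's context, and predicts the
# next token of the summary.
decoder_inputs = Input(shape=(None, num_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_outputs = Dense(num_tokens, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Training pairs the decoder input (the summary shifted right by one
# step) with the decoder target (the summary itself), so the network
# always conditions on the correct previous token during training.
```

Note that teacher forcing applies only to training; at inference time there is no ground truth to feed back, so the decoder is run one step at a time, feeding its own previous prediction in as the next input.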
