A quiz I did for the ddj track.
The following procedure was developed to provide a stable URL collection method that can be coupled with text extraction and text burning features. The method was used for this research proposal for SocialScienceOne. The main goal of the workflow is to generate a corpus of URLs pointing to online media content on all Swiss referenda from June 2017 to March 2018, including the text body of each page. The starting point for the URL collection is a list of the top Google searches for each initiative prior to the respective ballot. From there, the algorithm accesses each URL and extracts all potential links to other informational content on the referendum, that is, the hyperlinks included on those pages. To avoid including links to web ads or other unrelated content, we applied a keyword filter to the newly scraped URLs. The new URLs are then matched against the ones from the initial collection. Only the new entries are accessed in turn to extract all possible references, and so on, until no new references are found or a time limit has been reached.
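The collection loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used for the proposal: the names `snowball`, `extract_links`, `fetch`, `keywords` and `time_limit_s` are hypothetical, the keyword filter is assumed to be a simple substring match on the URL, and page retrieval is passed in as a function so that the sketch stays independent of any particular HTTP library.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return the absolute URLs linked from a page, without #fragments."""
    parser = LinkExtractor()
    parser.feed(html)
    return {urldefrag(urljoin(base_url, href)).url for href in parser.links}


def snowball(seed_urls, fetch, keywords, time_limit_s=60.0):
    """Follow links from the seed URLs until nothing new turns up.

    fetch:    hypothetical callable that takes a URL and returns its HTML.
    keywords: lowercase strings; a link is kept only if its URL contains
              at least one of them (assumed form of the keyword filter).
    """
    known = set(seed_urls)
    frontier = list(seed_urls)
    deadline = time.monotonic() + time_limit_s

    # The time limit is checked once per round, not per page.
    while frontier and time.monotonic() < deadline:
        candidates = set()
        for url in frontier:
            try:
                html = fetch(url)
            except Exception:
                continue  # skip pages that cannot be retrieved
            for link in extract_links(url, html):
                if any(keyword in link.lower() for keyword in keywords):
                    candidates.add(link)
        # Only URLs not seen before are visited in the next round.
        frontier = sorted(candidates - known)
        known.update(frontier)
    return known
```

In practice `fetch` could wrap `urllib.request.urlopen` or a similar client. Keeping it as a parameter also makes it easy to try the loop on a handful of saved pages before running it against live sites.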
This short contribution is meant as an example of what one can expect from the following articles in this project. The goal of this paper is to visualize the similarity between the five biggest parties in the Swiss parliament and six well-known and influential organizations. The underlying question of this article is: “Do we see similarities between actors that are commonly regarded as politically close to each other, or not?” Each actor's official press releases, taken from its homepage, were used to build a text corpus containing around 7’400 press releases from 2010 to April 2018. The results are quite interesting, as they show some surprising similarities.