Intern (f/m/d) Law - Framework conditions & risks of web scraping or similar (Jena) at Deutsches Zentrum für Luft- und Raumfahrt

Deutsches Zentrum für Luft- und Raumfahrt · Jena, Germany · On-site

At the Institute of Data Science in Jena, we are working on making the data backbone a reality for all DLR application areas (aviation, space, energy, transportation, security). To this end, we develop and research methods in interdisciplinary work with a focus on applications such as sustainable and circular processes, resilient supply chains, data-driven value chains or robust decision support. The methods developed in this way are applied in cooperation with other DLR institutes and external partners, either as part of joint projects or as part of technology transfer activities.

 

What to expect

The [email protected] project is developing a DLR search infrastructure consisting of modular components for the acquisition, processing and enrichment of web and research metadata. At the same time, it will integrate new technologies and innovations in the field of internet search and information management. This includes internal and external distributed web crawling, connection to specialised databases, for example in earth observation or research data management, as well as new methods for automatic information retrieval in very large data sets through the use of artificial intelligence (AI) methods.

 

Your tasks

To enrich the metadata of websites, the project classifies them based on text content. While pre-trained models for classification into broad categories, such as news or science, are already available, a fine-grained categorisation of websites with thematically similar content is also required. A concept for clustering websites has already been developed for this purpose, which will be further developed as part of the internship programme.

  • Conceptualisation, implementation and evaluation of a flexible approach for unsupervised, fine-grained classification of websites within a thematic domain
  • Use of software for focussed crawling of websites
  • Analysing the data in terms of information content, for example in the context of critical infrastructures and extreme weather events
  • Documentation of the implemented software and the results
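
To give a feel for the kind of task described above, here is a minimal sketch of unsupervised grouping of web pages by text content. It is not the project's existing clustering concept, which this posting does not describe. The choice of TF-IDF weighting, cosine similarity, a fixed similarity threshold and single-link grouping are all assumptions made for illustration, and the function names (`tfidf_vectors`, `cluster_pages`) and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (dict) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency of each term
    vectors = []
    for c in counts:
        # Smoothed IDF (assumption): log((1 + n) / (1 + df)) + 1
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster_pages(docs, threshold=0.2):
    """Single-link grouping: pages with similarity >= threshold are merged.

    Returns one integer cluster label per document. The threshold is an
    illustrative assumption and would need tuning on real data.
    """
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(docs))]


# Hypothetical page texts from one thematic domain
pages = [
    "Flood warning: heavy rain floods the river valley",
    "River flood levels rise after heavy rain",
    "Power grid outage affects substations and transformers",
    "Transformers and substations fail in power grid outage",
]
labels = cluster_pages(pages)
print(labels)
```

On real crawled pages, a sketch like this would likely need boilerplate removal, language handling and a more robust clustering method; evaluating such choices is part of what the tasks above describe.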

 

Your profile

  • Ongoing studies in computer science, data science or mathematics
  • Planned compulsory internship or final thesis
  • Very good knowledge of Python
  • Quick comprehension with a goal-oriented and independent way of working

We look forward to getting to know you!

If you have any questions about this position (Vacancy-ID 2824), please contact:

Jens Kersten 
Tel.: +49 3641 30960 122 
