Scraping data from PDF using R
I'd like to extract ski jumping data from this PDF: http://medias4.fis-ski.com/pdf/2019/JP/3088/2019JP3088RL.pdf

I'm interested in all of the data except bib, club, and date of birth. I tried the pdftools library:

    pdf_text("raw/data.pdf") %>% strsplit(split = "\n")

and got stuck there. The problem is that the points (gate compensation) column is sometimes empty and sometimes not, and I don't know how to handle that.

My desired output is something like this:

    Rank|Athlete       |Nation|(...)|Jump_1|Round_1|Jump_2|Round_2|Tot_points
    1   |KLIMOV Evgeniy|RUS   |(...)|127.5 |130    |131.5 |133.4  |263.4

Can anyone help?
r pdf web-scraping screen-scraping
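
One possible way to cope with a column that is sometimes empty is to cut each line at fixed character positions instead of splitting on whitespace, since pdf_text() tends to keep columns aligned with spaces. The sketch below is only an illustration under assumptions: the header names, column widths, and values are hypothetical stand-ins built with sprintf(), not the real layout of the FIS PDF, so the positions would have to be worked out from the actual text.

```r
# Stand-in for a few lines of pdf_text() output (hypothetical layout and values).
# sprintf() pads each field to a fixed width, mimicking space-aligned columns.
fmt <- "%-7s%-7s%-6s%-9s%s"
header <- sprintf(fmt, "Speed", "Dist", "Gate", "GatePts", "Points")
rows <- c(
  sprintf(fmt, "89.9", "127.5", "10", "",     "130.0"),  # gate points empty
  sprintf(fmt, "90.1", "131.5", "09", "-3.5", "133.4")   # gate points present
)

# Column start positions are taken from where each header word begins.
starts <- as.integer(gregexpr("[^ ]+", header)[[1]])
ends   <- c(starts[-1] - 1L, max(nchar(c(header, rows))))

# Cut every row at the same character positions, then trim the padding.
cut_row <- function(line) trimws(substring(line, starts, ends))
cells <- t(vapply(rows, cut_row, character(length(starts))))

result <- as.data.frame(cells, stringsAsFactors = FALSE)
names(result) <- strsplit(trimws(header), " +")[[1]]
rownames(result) <- NULL

# An empty cell becomes NA instead of shifting the later columns to the left.
result$GatePts[result$GatePts == ""] <- NA
result[] <- lapply(result, as.numeric)
print(result)
```

If the real columns turn out not to line up by character position, pdftools::pdf_data() might be worth trying instead, since it returns each word together with its x/y coordinates on the page.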