tag:blogger.com,1999:blog-178895882024-03-13T17:05:00.314+01:00chem-bla-icsEgon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.comBlogger1355125tag:blogger.com,1999:blog-17889588.post-89162536061939235522023-08-18T08:39:00.001+02:002023-08-18T08:39:27.362+02:00Last post here / the Freebie model online<p>This is my last post on blogger.com. At least, that is the plan. It has been a great 18 years. I would like to thank the owners of blogger.com, and later Google, for providing this service. I am continuing chem-bla-ics on a new domain: <a href="https://chem-bla-ics.linkedchemistry.info/">https://chem-bla-ics.linkedchemistry.info/</a></p><p>I, like so many others, struggle with choosing between open infrastructure and the <a href="https://en.wikipedia.org/wiki/Product_sample">freebie model</a>. Of course, we know these things come and go: Google Reader, FriendFeed, Twitter/X (see doi:<a href="https://doi.org/10.1038/d41586-023-02554-0">10.1038/d41586-023-02554-0</a>). My new blog still uses the freebie model: I am hosting it on GitHub. But, following the advice of a fellow cheminformatician, I now front it with a domain name I own.</p><p>See you at linkedchemistry.info!</p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-88947690808488682852023-08-12T12:40:00.002+02:002023-08-12T12:40:33.722+02:00Boiling points in Wikidata<p>Some days ago, I started adding boiling points to <a href="https://wikidata.org/">Wikidata</a>, referenced from <a href="https://scholia.toolforge.org/work/Q22236188">Basic Laboratory and Industrial Chemicals</a> (wikidata:Q22236188), <a href="https://scholia.toolforge.org/author/Q18609741">David R. Lide</a>'s 'a CRC quick reference handbook' from 1993 (well, the edition I have). 
But Wikidata <a href="https://www.wikidata.org/wiki/User_talk:Egon_Willighagen#Basic_laboratory_and_industrial_chemicals:_a_CRC_quick_reference_handbook_(Q22236188)">wants</a> the pressure (wikidata:P2077) at which the boiling point (wikidata:P2102) was measured. Rightfully so. But I had not added those yet, because it slows me down and can be automated with <a href="https://quickstatements.toolforge.org/">QuickStatements</a>.</p><p>I just need a few SPARQL queries to list the statements to which the qualifiers need to be added. Basically, all boiling points that have the book as a reference and do not have the pressure info. First, there are pressures with 'unknown value', which show up as blank nodes (the query service returns these as generated http:// IRIs, hence the FILTER below; by the time you read this, they are likely already fixed):</p><div style="text-align: left;"><span style="font-family: courier;">SELECT ?cmp ?bp ?pressure WHERE {<br /></span><span style="font-family: courier;"> ?cmp p:P2102 ?bpStatement .<br /></span><span style="font-family: courier;"> ?bpStatement prov:wasDerivedFrom/pr:P248 wd:Q22236188 ;<br /></span><span style="font-family: courier;"> ps:P2102 ?bp .<br /></span><span style="font-family: courier;"> ?bpStatement pq:P2077 ?pressure .<br /></span><span style="font-family: courier;"> FILTER (contains(str(?pressure), "http://"))<br /></span><span style="font-family: courier;">}</span></div><p>So, to get the list of statements that do not have any P2077 qualifier yet, and for which I want to write the QuickStatements, I use <a href="https://query.wikidata.org/#SELECT%20%3Fcmp%20WHERE%20%7B%0A%20%20%3Fcmp%20p%3AP2102%20%3FbpStatement%20.%0A%20%20%3FbpStatement%20prov%3AwasDerivedFrom%2Fpr%3AP248%20wd%3AQ22236188%20%3B%0A%20%20%20%20ps%3AP2102%20%3Fbp%20.%0A%20%20MINUS%20%7B%20%3FbpStatement%20pq%3AP2077%20%3Fpressure%20%7D%0A%7D">this query</a>:</p><p><span style="font-family: courier;">SELECT ?cmp WHERE {<br /></span><span style="font-family: courier;"> ?cmp p:P2102 ?bpStatement .<br /></span><span style="font-family: courier;"> 
?bpStatement prov:wasDerivedFrom/pr:P248 wd:Q22236188 ;<br /></span><span style="font-family: courier;"> ps:P2102 ?bp .<br /></span><span style="font-family: courier;"> </span><span style="font-family: courier;">MINUS { ?bpStatement pq:P2077 ?pressure }<br /></span><span style="font-family: courier;">}</span></p><p>At the time of writing, this lists 54 boiling points. </p><p>I can then let WDQS create CSV-styled QuickStatements with:</p><div style="text-align: left;"><span style="font-family: courier;">SELECT (SUBSTR(STR(?cmp),32) AS ?qid) ?P2102 ?qal2077 WHERE {<br /> ?cmp p:P2102 ?bpStatement .<br /> ?bpStatement prov:wasDerivedFrom/pr:P248 wd:Q22236188 ;<br /> ps:P2102 ?P2102 .<br /> MINUS { ?bpStatement pq:P2077 ?pressure }<br /> BIND ("101.325U21064807" AS ?qal2077)<br />}</span></div><div><br /></div><div>Here, the SPARQL variable names double as QuickStatements column headers. Finally, note the use of "U21064807": 21064807 is the numeric part of the Wikidata item for kilopascal (wikidata:Q21064807), so each qualifier reads as 101.325 kPa, which is standard atmospheric pressure.</div><div><br /></div><div>I also need to "add" the boiling point again, to make sure QuickStatements knows which statement to add the qualifier to. I think this can be done better, but I am not sure how to target statements directly. This is not foolproof: this approach ignores the situation where there are two statements with the (exact) same boiling point but different error margins. I will monitor for that and correct it manually where needed.</div>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-72181634981585828702023-08-08T08:12:00.006+02:002023-08-08T08:16:07.043+02:00History, provenance, detail<p style="text-align: justify;">Just a quick note: I just love the level of detail <a href="https://www.wikidata.org/">Wikidata</a> allows us to use. One of the marvels is the practice of 'named as', which can be used in statements for subjects and objects. 
The point here is that things are referred to in different ways, and these properties allow us to link the interpretation with the source. For example, <a href="https://scholia.toolforge.org/author/Q58978">Max Born</a>'s seminal work <a href="https://scholia.toolforge.org/work/Q55867811"><i>Zur Quantenmechanik</i></a> (doi:<a href="https://doi.org/10.1007/BF01328531">10.1007/BF01328531</a>) uses a very short notation to cite other literature, in footnotes; DOIs did not exist yet.</p><div style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYgnG1jhj2le5d3XHO_jJ33CoyzXOK9d4nvcj5xFcSViC7AfnZ1FMun_XwEQVwaSHgOVGkFKYEo580FbNYjFI5K7Vx-ZRJg43qLe8o42F_hp1MHtwp5fXD5jUyHwOJDLk_b9ygSlnEoZ0_WCEI_0R_fF8JRq7VYcQGf_sZG4LQ_vEUtwbgKF0C/s679/Screenshot_20230808_080912.png"><img border="0" data-original-height="192" data-original-width="679" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYgnG1jhj2le5d3XHO_jJ33CoyzXOK9d4nvcj5xFcSViC7AfnZ1FMun_XwEQVwaSHgOVGkFKYEo580FbNYjFI5K7Vx-ZRJg43qLe8o42F_hp1MHtwp5fXD5jUyHwOJDLk_b9ygSlnEoZ0_WCEI_0R_fF8JRq7VYcQGf_sZG4LQ_vEUtwbgKF0C/w640-h180/Screenshot_20230808_080912.png" width="640" /></a></div><p>So, in Wikidata, you can <a href="https://www.wikidata.org/wiki/Q55867811#P2860">capture this like this</a>:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVqwWvwbGS_L-nR32ta55SkpXKY1T_n_efySEBIKjvA-oxYzYsX8Dg0yDND1OP-wKD1TQPRpDaSIrR6-WD16KaB3uvcRacEwFgUxbHsYfvfezwqNYCcYn83EXIX6T2woxgJJtIbh1CT_SjsEtBX4xrW66_q1FRDWDAqfxOa7gcvbL0CqvzqlAH/s942/Screenshot_20230808_080433.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="411" data-original-width="942" height="280" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVqwWvwbGS_L-nR32ta55SkpXKY1T_n_efySEBIKjvA-oxYzYsX8Dg0yDND1OP-wKD1TQPRpDaSIrR6-WD16KaB3uvcRacEwFgUxbHsYfvfezwqNYCcYn83EXIX6T2woxgJJtIbh1CT_SjsEtBX4xrW66_q1FRDWDAqfxOa7gcvbL0CqvzqlAH/w640-h280/Screenshot_20230808_080433.png" width="640" /></a></div><br /><p><br /></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-69515669303302027242023-08-04T09:36:00.006+02:002023-08-15T12:39:13.902+02:00Blog planets: blogging about Debian, GNOME, Wikimedia, FSFE, and many moreI am still an avid user of <a href="https://en.wikipedia.org/wiki/Category:Web_syndication_formats">RSS/Atom feeds</a>. I use <a href="https://feedly.com/">Feedly</a> daily, partly because of their <a href="https://play.google.com/store/apps/details?id=com.devhd.feedly">easy-to-use app</a>. My blog is part of <a href="https://planetrdf.com/">Planet RDF</a>, a <i>blog planet</i>. Blog planets aggregate blogs from many people around a certain topic. It's like a forum, but open, free, and community-driven. 
It's exactly what the web should be.<div><br /></div><div>It turned out that planets do still exist, so I started a small corner on Wikidata: <a href="https://www.wikidata.org/wiki/Q121134938">Q121134938</a>, along with items for a number of existing blog planets:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://www.wikidata.org/wiki/Special:WhatLinksHere/Q121134938" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="322" data-original-width="411" height="314" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_m8HfHlKuiBTj8-vhVax9QrNoiny2-W8lOfPchYpBYwJd9yAvrMZlHFlOs6xEMkfs45WeZiOsDet28kUanhQjkzpB8CuYiJ2K0p2K6XQO6055OsJCuoX0JhIHON3amJPwABXMzYW-x6sFPn5IDUVZSNFU-YYlusS8XNfYKcJw5hiLro-reYCZ/w400-h314/Screenshot_20230804_092520.png" width="400" /></a></div><br /><div>The software used to run these planets is ancient, though. We need a new generation of software to replace things like <a href="https://en.wikipedia.org/wiki/Planet_(software)">Planet</a>. And I want something people can easily host on GitHub Pages, GitLab Pages, or the like.</div><div><br /></div><div>I created a minimal shape expression, but the Wikidata items for the planets still lack a lot of information that could be added. First, we can perhaps think of them as venues where people "publish" their work. Second, we can annotate the blog planets with 'main subject' for the topics they cover. Or we can list the people who are "authors" on the planet; most planets are very transparent about which blogs they aggregate.</div><div><br /></div><div>I would love to see where this goes. Who knows? 
Maybe we will see Postgenomic (see doi:<a href="https://doi.org/10.1186/1471-2105-8-487">10.1186/1471-2105-8-487</a>) and <a href="https://chem-bla-ics.blogspot.com/search?q=%22chemical+blogspace%22">Chemical blogspace</a> resurface :)</div><div><br /></div><div><br /></div>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-61647762830853720522023-07-27T11:24:00.000+02:002023-07-27T11:24:12.492+02:00Archiving and updating my blog<p>This blog is <a href="https://chem-bla-ics.blogspot.com/2005/10/chem-bla-ics.html">almost 18 years old</a> now. I have long wanted to migrate it to a version control system and, at the same time, have more control over things. Markdown would be awesome. In the past year, I learned a lot about the power of <a href="https://github.com/jekyll/minima">Jekyll</a>, and I needed to get more experienced with it to use it for more databases, as we now do for <a href="https://wikipathways.org/">WikiPathways</a>.</p><p>So, time to <a href="https://egonw.github.io/blog/">migrate</a> this blog :) This is probably a multiyear project, so feel free to continue reading it here. Why? Because I am starting with the old posts :) Along the way, I am fixing and improving things. I still have plenty on my todo list, but I am already happy to have learned <a href="https://fontawesome.com/">Font Awesome</a>, which makes it easy to annotate how I fixed broken links (or not). 
I now use three icons: a box for when I use the Internet Archive (they can use your donation); a 'recycle' icon when I found a new URL for the same page; and a broken link icon for other situations.</p><p>This is what it looks like:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRx4DPUhTqKVqFQTXgcqHaadXoe-LvkDdmPQyAsPPLr_FOL0J_BYe0EXfPJ_TUqxWOQ0N6QOD4YXb47TGrNFq-42cAr69P5sxklM-ZwFHs9KstFC5KczyEFgpxz3sv7ExWe9i6P4dg3KV9FltZctpfi3fQ1E-94Y-mKfN3EqQrJXKCAHfozodo/s763/Screenshot_20230727_111653.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="587" data-original-width="763" height="492" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRx4DPUhTqKVqFQTXgcqHaadXoe-LvkDdmPQyAsPPLr_FOL0J_BYe0EXfPJ_TUqxWOQ0N6QOD4YXb47TGrNFq-42cAr69P5sxklM-ZwFHs9KstFC5KczyEFgpxz3sv7ExWe9i6P4dg3KV9FltZctpfi3fQ1E-94Y-mKfN3EqQrJXKCAHfozodo/w640-h492/Screenshot_20230727_111653.png" width="640" /></a></div><br /><p><br /></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com4tag:blogger.com,1999:blog-17889588.post-47277425016748303542023-07-07T20:00:00.004+02:002023-07-08T09:02:56.036+02:00Universities and open infrastructures<p style="text-align: justify;">The role of a university is manifold. Being a place where people can find knowledge, and the track record of how that knowledge was reached, is often seen as part of that. Over the past decades, universities outsourced this role, for example to publishers. This is the subject of a lot of discussion, and I am happy to see that the <a href="https://www.universiteitenvannederland.nl/">Dutch Universities</a> are <a href="https://chem-bla-ics.blogspot.com/2023/07/journal-rankings.html">taking back control</a> <a href="https://www.openaire.eu/next-narcis-dutch-research-portal-on-openaire">fast now</a>. 
For example, <a href="https://mastodon.social/@Radboud_uni">Radboud University</a> (>1k followers) has already joined the Fediverse (Mastodon, etc.), making it independent of non-EU law and commercial interests. Scientific journals, Nobel Prize winners, etc. <a href="https://chem-bla-ics.blogspot.com/2022/11/finding-mastodon-accounts-with-wikidata.html">have already joined too</a>, btw.</p><p style="text-align: justify;"><a href="https://netzpolitik.org/2023/a-call-to-action-universities-of-the-world-into-the-fediverse/">This effort</a> is calling for more universities to move in the direction of open infrastructures. I am looking forward to seeing all Dutch Universities post news on Mastodon, post videos on PeerTube, etc. </p><p style="text-align: justify;">Would it not be awesome if the Fediverse became the new multidimensional knowledge dissemination and peer review system we have all been waiting for?</p><p style="text-align: justify;"><b>Update</b>: universities with a Mastodon account listed in Wikidata, on the world map: <span style="text-align: left;">https://w.wiki/6zR3</span></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-86810212758559777322023-07-06T11:56:00.005+02:002023-07-06T12:00:25.224+02:00Journal Rankings<p>I am pleased to learn that the <a href="https://www.universiteitenvannederland.nl/nl_NL/nieuws-detail/nieuwsbericht/915-p-nederlandse-universiteiten-gaan-voortaan-anders-om-met-rankings-p.html">Dutch Universities are starting to look at rankings in a more scientific way</a>. It is long overdue that we take scientific peer review of the indicators used in those rankings seriously, instead of hiding behind <a href="https://en.wikipedia.org/wiki/Fear,_uncertainty,_and_doubt">FUD</a> about a decline in the quality of research.</p><p>So, what defines the quality of a journal? Or better, of any scholarly dissemination channel? 
After all, some databases do better peer review than some journals. Sadly, I am not aware of literature that compares the quality of peer review in databases with that in scientific journals. Also long overdue, in my opinion.</p><p>I hope the <a href="https://osc-international.com/">Open Science community</a> will help shape these scholarly dissemination channels, journals included. Some ideas; the outlet:</p><p></p><ul style="text-align: left;"><li>encourages post-publication peer review</li><li>communicates the post-publication peer review</li><li>allows small fixes and clarifications to be made easily (no hiding behind the version-of-record)</li><li>ensures supp info / additional files undergo the same level of peer review</li><li>uses modern solutions for communication (like semantic web technologies)</li><li>has clear licenses for all aspects of the <a href="https://chem-bla-ics.blogspot.com/2023/07/qeios-open-dissemination-platform-for.html">research output</a></li><li>actively fights against visual-only representation and provides all data</li><li>guarantees that supp info / additional files are archived, just like the output itself</li><li>adopts, promotes, and requires community standards (including global, unique identifiers)</li></ul><div>Okay, these items are pretty broad. Many of them are part of <a href="https://doi.org/10.1162/dint_r_00024">FAIR</a>, but that should not surprise you, because FAIR is just applying traditional scholarly approaches, like properly keeping notebooks. It's just a bit more "digital" than we have been taught.</div><div><br /></div><div>Do we know how to do this? Yes, pretty much. This is not a technical exercise, but one of social change and, particularly, willingness. Basically, if you want to keep the current way of doing things, then you declare that you want unreproducible, low-quality research reporting. That's your academic freedom, of course. 
If I were a funder or a university, I would also expect a bit more in return for my money.</div><div><br /></div><div>Let me stress, glossy articles are fine! You do not have to stop that. Media appearances, keynotes, these are all also fine. They are, however, complementary. We should not continue the habit of fancy narratives as a replacement for quality research dissemination. Do both, if you must.</div><p></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-22300907140499133652023-07-02T11:02:00.001+02:002023-07-02T11:02:22.007+02:00Qeios, an open dissemination platform for research output<p>A bit over a year ago I was introduced to <a href="https://www.qeios.com/">Qeios</a> when I was asked to review an article by Michie, West, and Hastings: "<i>Creating ontological definitions for use in science</i>" (doi:<a href="https://doi.org/10.32388/YGIF9B.2">10.32388/YGIF9B.2</a>). I wrote up my thoughts after reading the paper, and the review was posted openly online and got a <a href="https://doi.org/10.32388/7MQYM4">DOI</a>. Not the first platform to do this (think F1000), but it is always nice to see some publishers taking publishing seriously. I have since reviewed <a href="https://www.qeios.com/read/ZJ4QDA">two</a> <a href="https://www.qeios.com/read/YCHHA7">more</a> papers.</p><p style="text-align: justify;">One of these latter two was not a traditional paper, but a different kind of <b>research output</b>: a definition, of "<i>Drive-by Curation</i>" (doi:<a href="https://doi.org/10.32388/KBX9VO">10.32388/KBX9VO</a>). As for this output type, collaboratively working on definitions is core to ontology development (e.g. see doi:<a href="https://doi.org/10.1186/s13326-015-0005-5">10.1186/s13326-015-0005-5</a>), and there is a clear need to discuss terminology. 
The <a href="https://www.h2020gracious.eu/">GRACIOUS</a> project in the <a href="https://www.nanosafetycluster.eu/">EU NanoSafety Cluster</a> also recognized this and set up a tool for it: their <a href="https://terminology-harmonizer.greendecision.eu/">Terminology Harmonizer</a> (doi:<a href="https://doi.org/10.1016/j.impact.2021.100366">10.1016/j.impact.2021.100366</a>).</p><p style="text-align: justify;">This GRACIOUS tool helps users much more than Qeios does. Unfortunately, and this is where these topics nicely come together (writing definitions, thinking about when one zeta potential is different from another zeta potential, and (drive-by) community curation), this work needs transparency. I understand why the tool requires a login, but landing on a login page is, for me, a recipe for a silent death, as it prevents people from learning without first making a (time) investment. That is what Qeios does differently: it is more FAIR.</p><p style="text-align: justify;">So, that brings me to my last point in this post. Jente Houweling and I wrote up a definition for "<i>Research Output Management</i>" (doi:<a href="https://doi.org/10.32388/ZNWI7T">10.32388/ZNWI7T</a>), based on our discussions about her research insights. See the screenshot below.</p><p style="text-align: justify;">It has been reviewed internally, and by one independent peer (doi:<a href="https://doi.org/10.32388/C3SJTN">10.32388/C3SJTN</a>). But we would love to hear your review too. Just follow the instructions online. 
We are looking forward to reading your thoughts and to refining our definition.</p><div style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBmgn6pFT2v9YwcOJlQuOyAeB_rvZOMSZiXbU0LpMVvt9FjqBoT3o9tXdqluKG42kR_YlD1LU25yLxsNS8LEdweTWfur88AWG-kA9-ZeNTQpgfy4gPL_os22KFHnsec0LfiBG8bUABmI4nOgOHsJc4JZpK7y1v6rl6oI3dYCrFChcvY2MP3MHF/s1200/Screenshot_20230702_110013.png" imageanchor="1"><img border="0" data-original-height="814" data-original-width="1200" height="434" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBmgn6pFT2v9YwcOJlQuOyAeB_rvZOMSZiXbU0LpMVvt9FjqBoT3o9tXdqluKG42kR_YlD1LU25yLxsNS8LEdweTWfur88AWG-kA9-ZeNTQpgfy4gPL_os22KFHnsec0LfiBG8bUABmI4nOgOHsJc4JZpK7y1v6rl6oI3dYCrFChcvY2MP3MHF/w640-h434/Screenshot_20230702_110013.png" width="640" /></a></div><br /><p style="text-align: justify;"><br /></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-36656910571552424482023-07-01T08:51:00.008+02:002023-07-02T08:49:56.688+02:00Twitter exits FAIR and is no longer a dissemination solution<p><b>Update</b>: Musk <a href="https://tweakers.net/nieuws/211364/musk-blokkeren-van-niet-ingelogde-gebruikers-op-twitter-is-tijdelijke-maatregel.html">said</a> this was a temporary measure. The problem was the scraping of content, you know, the content we openly share on Twitter. Maybe they could have solved this with APIs. Oh wait, they closed those behind a very expensive paywall. <b>Update 2</b>: Another rumor is that they forgot to make a deal with a cloud provider and were suddenly left with a fraction of the computing power.</p><p>And just like that, without a warning, Twitter changed policies again, and you now need a Twitter account and must be logged in to see public tweets: <a href="https://www.theverge.com/2023/6/30/23779764/twitter-blocks-unregistered-users-account-tweets">Twitter has started blocking unregistered users</a> (The Verge). 
Though I first learned about it via Mastodon, of course.</p><p>For example, this is what happens when you go to <a href="http://twitter.com/wikipathways">twitter.com/wikipathways</a>:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoQrhwd2oTxV7L_OY5INaBajfgp7SkCxGFmPDtmlod4KuUhFCywWIh4HaNT5LU68ONRcw2taqUgRcqpKY9ytfmUpwvsd3NP1hLcm7fsYU9-GrG11jmH0j-XoKzZ89aFAIoMqbQnbRV3Z2D9Z1JjwNa5HKyXDj1mtevBq1cAODSjJiDIa5JaFbT/s680/Screenshot_20230701_084547.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="680" data-original-width="632" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoQrhwd2oTxV7L_OY5INaBajfgp7SkCxGFmPDtmlod4KuUhFCywWIh4HaNT5LU68ONRcw2taqUgRcqpKY9ytfmUpwvsd3NP1hLcm7fsYU9-GrG11jmH0j-XoKzZ89aFAIoMqbQnbRV3Z2D9Z1JjwNa5HKyXDj1mtevBq1cAODSjJiDIa5JaFbT/s320/Screenshot_20230701_084547.png" width="297" /></a></div><br /><p>Fortunately, <a href="https://wikipathways.org/">WikiPathways</a> does have a <a href="https://fosstodon.org/@wikipathways">Mastodon account</a> that anyone can see without having a Mastodon account. You can even follow WikiPathways's account via <a href="https://fosstodon.org/@wikipathways.rss">its RSS feed</a>. Dissemination should not be paywalled.</p><p>Maybe Musk has been talking to Elsevier and Springer Nature.</p><p>Tip: <a href="https://chem-bla-ics.blogspot.com/2022/11/finding-mastodon-accounts-with-wikidata.html">Finding Mastodon accounts with Wikidata (a few SPARQL queries)</a></p><p></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-21223625269361480552023-06-11T21:04:00.005+02:002023-06-11T21:04:46.513+02:00Community activity #2: FAIRsharing<p>Some years ago we started the <a href="https://elixir-europe.org/communities/toxicology">ELIXIR Toxicology Community</a>. 
It has been an interesting journey, partly covered in <a href="https://f1000research.com/articles/10-1129/v1">this whitepaper</a>. We started from the interactions we already had in several projects, but particularly from the potential I see. This series of posts covers a number of things toxicology projects can do to benefit from ELIXIR solutions ("<a href="https://elixir-europe.org/services">services</a>"). The posts were first sent to the ELIXIR Toxicology Community mailing list (please join!).</p><p><b>History</b></p><p>In this post, let's look at <a href="https://fairsharing.org/">FAIRsharing</a>. It is "A curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies" [0,1].</p><p>The ELIXIR Toxicology Community (we) maintains the toxicology corner of this database, and members of our community have been adding toxicology-related databases and relevant standards. On the policy side, we are falling a bit short: <a href="https://fairsharing.org/Toxicology">fairsharing.org/Toxicology</a>. </p><p><b>Why adopt FAIRsharing</b></p><p>FAIRsharing is one place where metadata about your databases can be shared. It helps make your resources and research more FAIR and shows people how your work relates to other work (<a href="https://fairsharing.org/graph/3496">fairsharing.org/graph/3496</a>):</p><p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFfnBKCL0cun978nE8VbMOh-v2Szc6itY5IUL0PUevl6qaPcZJNlZ48o0hNRurIMTLTh4-jgvmLazZGGWIZgPG3vf_nEd-qG3WN0GRVfhbGlql6Qt_LokiEpcgrwU8dgz1dSrWkOD8mggkomuoy0rS3puyQsTXJeLGzTQoCCEQpLG3s5KI4A/s1427/Screenshot_20230611_210034.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img alt="Screenshot of the 'collects' graph of the FAIRsharing Toxicology Community." 
border="0" data-original-height="1028" data-original-width="1427" height="462" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFfnBKCL0cun978nE8VbMOh-v2Szc6itY5IUL0PUevl6qaPcZJNlZ48o0hNRurIMTLTh4-jgvmLazZGGWIZgPG3vf_nEd-qG3WN0GRVfhbGlql6Qt_LokiEpcgrwU8dgz1dSrWkOD8mggkomuoy0rS3puyQsTXJeLGzTQoCCEQpLG3s5KI4A/w640-h462/Screenshot_20230611_210034.png" width="640" /></a></p><p><b>What you can do</b></p><p>Get an account (with your ORCID or GitHub account) and add resources important to your research, your projects, and your work in general. (Data) policies and standards you are expected to comply with are particularly useful, and so are links between resources. For example, if a (project) database complies with an important policy or standard, it is worth having that show up.</p><p>Alternatively, join the ELIXIR Toxicology Community <a href="https://doi.org/10.1162/dint_r_00024">mailing list</a> and post the missing resource there, or use our issue tracker at <a href="https://github.com/elixir-europe/toxicology-community/issues/">github.com/elixir-europe/toxicology-community/issues/</a>.</p><p>Let's make toxicology more <a href="https://doi.org/10.1162/dint_r_00024">FAIR</a>.</p><p>0.<a href="https://www.nature.com/articles/s41587-019-0080-8">https://www.nature.com/articles/s41587-019-0080-8</a></p><p>1.<a href="https://scholia.toolforge.org/work/Q64084285">https://scholia.toolforge.org/work/Q64084285</a></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-23161585258779609452023-05-31T07:53:00.001+02:002023-05-31T07:53:35.554+02:00Information Retrieval versus ChatGPT<p>Last week, at a large (and relevant) Dutch research event, ChatGPT came up, along with the claim that it was going to change the world. The critiques came up too, but were effectively dismissed with "these methods get better very quickly". This is not untrue, but not really true either. I murmured "not even wrong". 
I know how hard it is to get computers to find meaningful patterns; I did a PhD on this in the early 21st century.</p><p>What strikes me is that ChatGPT is now pitched as an information retrieval (IR) system. Such a system tries to find information, that is, it "retrieves" information from a knowledge base. Like SQL or SPARQL. Or like Google Maps. IR is about reproducing existing knowledge.</p><p>Now, deep learning starts from a different premise: we can find the patterns and in this way compress an unlimited number of facts into a mathematical equation, a physical law. That way, you do not have to record whether the sun comes up every day. We predict it does. We do not have to record that rain drops will fall (that they do; when they do is actually something to record). At best, we would record when rain drops start "falling" to the sky. That is, we have the laws of gravitation.</p><p>But here lies the problem with systems like ChatGPT: they are only as good as the predictive patterns they learned. They do not retrieve information. They predict information. This is why it doesn't know about references. It lost the link between a prediction and the shelf on which the book was stored.</p><p>So, when it was mentioned at the research event last week that lawyers were starting to use it to cite existing work, I was skeptical: that would actually mean they had turned ChatGPT into an IR system. And I had already learned (*) that ChatGPT predicts references rather than looking them up. It's a prediction method, not an IR method. So how could it accurately cite court cases?</p><p>It didn't. It's all over the news now. It "hallucinated" legal citations.</p><p>Does this matter? I think it does. This is why I moved my research focus after my PhD back to IR, away from machine learning. Deep learning can only generalize the facts, so we had better start accurately recording facts. 
This is why I study interoperable and reusable knowledge bases like WikiPathways and Wikidata, and technologies like RDF in science, etc. Actually, this realization predates my machine learning work. I guess I already had this notion when I started the <i>Woordenboek Organische Chemie</i> back in the nineties.</p><p>Someone has to. I just hope the funding for this fundamental aspect of research doesn't run out. Information Retrieval will remain essential to science for a few decades more.</p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-80078539636017716992023-05-22T07:23:00.000+02:002023-05-22T07:23:10.123+02:00Paper: "The FAIR Cookbook - the essential resource for and by FAIR doers"<p>I think that if you want to make your knowledge FAIR, you should use an open license and RDF. Simple. Now, not everything is knowledge. A lot of data is, but a lot more is not; think raw data. Using RDF to explain a protein sequence is still something that makes me feel uneasy.</p><p>However, to make RDF, you first need to make assumptions explicit and decide on meaning. Making RDF is not easy. It's not hard either, just a lot of administration and scientific thinking. What did I measure? What model do I use to describe the chemistry? You know, my research job.</p><p>Moreover, not only data should be FAIR. All research output (worth communicating) should be FAIR.</p><p>In the past, Andra Waagmeester invited me to co-author a recipe that explains <a href="http://www.openphacts.org/specs/2013/WD-rdfguide-20131007/">the general steps of creating RDF</a>. This was during the Open PHACTS project and with Carina Haupt. Writing recipes is gaining traction. 
They are a bit like <a href="https://r-pkgs.org/vignettes.html">vignettes from the R world</a>.</p><p>In the past few years, the <a href="https://cordis.europa.eu/project/id/802750">FAIRplus project</a> created a <a href="https://faircookbook.elixir-europe.org/">FAIR Cookbook</a> with recipes, and I wrote a few. Actually, I still have a few to finish, for which I cannot find the time. In retrospect, I spent too much time perfecting the recipes instead of finishing them earlier. The FAIR Cookbook is now a professional venue with an editorial board. It is fully open source and welcomes your recipes. Oh, and it is now hosted as an ELIXIR service, which is great to see!</p><p>Finally, the paper <a href="https://doi.org/10.1038/s41597-023-02166-3">"The FAIR Cookbook - the essential resource for and by FAIR doers"</a> is out. Go read it :)</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><span style="margin-left: auto; margin-right: auto;"><a href="https://www.nature.com/articles/s41597-023-02166-3" target="_blank"><img border="0" data-original-height="681" data-original-width="1898" height="230" src="https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41597-023-02166-3/MediaObjects/41597_2023_2166_Fig2_HTML.png?as=webp" width="640" /></a></span></td></tr><tr><td class="tr-caption" style="text-align: center;"><a href="https://www.nature.com/articles/s41597-023-02166-3" target="_blank">Figure 2 from the article: "Citability of recipes and identification of<br />and credit for authors; an example is provided."</a></td></tr></tbody></table><p><br 
/></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-28757139001971809062023-05-12T21:56:00.004+02:002023-05-12T21:56:25.780+02:00Paper: "Extending inherited metabolic disorder diagnostics with biomarker interaction visualizations"<p style="text-align: justify;">When I joined the BiGCaT research group in 2012, I was particularly interested in the open science approach of WikiPathways. As a chemist by training and a researcher in cheminformatics, metabolites and their metabolic reactions especially caught my interest. I am happy that I have been able to fund Denise's research project. And thanks, Denise, for this very exciting research. I know it's just a first step, and far more translational steps are needed, but I for one am very excited about bridging molecular information to clinical outcomes.</p><p style="text-align: justify;">In this study, Denise explored how we can take advantage of molecular pathway databases to link biomarker information: <i>"Our framework integrates literature and expert knowledge into machine-readable pathway models, including relevant urine biomarkers and their interactions. The clinical data of 16 previously diagnosed patients with various pyrimidine and urea cycle disorders were visualized on the top 3 relevant pathways. 
Two expert laboratory scientists evaluated the resulting visualizations to derive a diagnosis" </i>(doi:<a href="https://doi.org/10.1186/s13023-023-02683-9">10.1186/s13023-023-02683-9</a>).</p><p style="text-align: justify;">Figure 4 shows what such a visualization of those biomarker interactions can look like:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn6FiGiT9KxpnRbhN-NDSbBDdo8tw7URnHGo1p3liOXQlf4BZLxE20qXGrC3HrLebgM1qDzf08i5rqXaaeahpVda3hNYgLDuzt-cg9gCBlspI3Yfa9aeSmDlRigHUb1--FBH8yNOpzYcu_hNR7d9zrbxso2ulJU13WXyuxU8f5v_ul3ZjgSw/s1960/13023_2023_2683_Fig4_HTML.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1272" data-original-width="1960" height="416" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn6FiGiT9KxpnRbhN-NDSbBDdo8tw7URnHGo1p3liOXQlf4BZLxE20qXGrC3HrLebgM1qDzf08i5rqXaaeahpVda3hNYgLDuzt-cg9gCBlspI3Yfa9aeSmDlRigHUb1--FBH8yNOpzYcu_hNR7d9zrbxso2ulJU13WXyuxU8f5v_ul3ZjgSw/w640-h416/13023_2023_2683_Fig4_HTML.png" width="640" /></a></div><p style="text-align: justify;">And I am hugely proud of the open science approach, with a <a href="https://bigcat-um.github.io/IMD-PUPY/">GitHub repo</a>, open source R code, and SPARQL queries. Thank you, Denise! And thanks to <a href="https://cris.maastrichtuniversity.nl/en/persons/laura-steinbusch">Dr Laura Steinbusch</a> for this nice collaboration! Further acknowledgements are in the article.</p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-40026703514786439102023-05-12T21:34:00.002+02:002023-05-12T21:57:08.479+02:00Community activity #1: Bioschemas annotation<div style="text-align: justify;"><span style="font-family: inherit;">Some years ago we started the <a href="https://elixir-europe.org/communities/toxicology">ELIXIR Toxicology Community</a>. 
It has been an interesting journey (partly covered in <a href="https://f1000research.com/articles/10-1129/v1">this whitepaper</a>). We started from the interactions we already had in several projects, but particularly from the potential I saw. This series of posts covers a number of things toxicology projects can do to benefit from ELIXIR solutions ("<a href="https://elixir-europe.org/services">services</a>"). The posts were first sent to the ELIXIR Toxicology Community mailing list (please <a href="https://signup.aai.lifescience-ri.eu/registrar/?vo=elixir&group=Community%3AUser+Communities%3AToxicology&targetexisting=https://www.elixir-europe.org/alreadyregistered&targetnew=https://www.elixir-europe.org/registration-successful">join</a>!). </span></div><div style="text-align: justify;"><span style="font-family: inherit;"><br /></span></div><div style="text-align: justify;"><span style="font-family: inherit;"><b>History</b></span></div><div style="text-align: justify;"><span style="font-family: inherit;"><br /></span></div><div style="text-align: left;"><span><div style="font-family: inherit; text-align: justify;">In this post, let's look at <a href="https://bioschemas.org/">Bioschemas</a> annotation. Our community has been collaborating with the Bioschemas project for a long time. The development of the ChemicalSubstance profile was started by our community at one of the earlier <a href="https://biohackathon-europe.org/">ELIXIR BioHackathon Europe</a> events. <span style="font-family: inherit;">The "<b>ChemicalSubstance</b>" (</span><a href="https://bioschemas.org/profiles/ChemicalSubstance/0.4-RELEASE" style="font-family: inherit;">ChemicalSubstance/0.4-RELEASE</a><span style="font-family: inherit;">) is the material equivalent of "<b>MolecularEntity</b>", which was already being developed. 
ChemicalSubstance is used in various places; see <a href="https://bioschemas.org/developer/liveDeploys#nav-profile">bioschemas.org/developer/liveDeploys</a> </span><span style="font-family: inherit;">and select the ChemicalSubstance profile.</span></div><div style="font-family: inherit; text-align: justify;"><span style="font-family: inherit;"><br /></span></div><div style="text-align: justify;">We have also been using Bioschemas annotation of training material; see <a href="https://www.dtls.nl/2018/07/19/toxicology-data-management-tutorials-automatically-collected-by-european-training-portal-tess/">https://www.dtls.nl/2018/07/19/toxicology-data-management-tutorials</a>. This is still being used in some of the projects I am involved in.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div>Third, multiple NanoSafety Cluster projects are very actively using "Dataset" annotation to make their datasets findable in Google's Dataset Search. For example, notice the mention of <a href="http://nanocommons.github.io">nanocommons.github.io</a> when searching for "toxicology nanomaterial": <a href="https://datasetsearch.research.google.com/search?query=toxicology%20nanomaterial">datasetsearch.research.google.com/search?query=toxicology%20nanomaterial</a>. That shows up because of the Bioschemas annotation on this NanoCommons overview of citable (i.e. 
with DOI) and openly licensed datasets: <a href="https://nanocommons.github.io/datasets/">nanocommons.github.io/datasets/</a>. We're also working with data platform developers in the NanoSafety Cluster to use such annotation, supporting the F in FAIR.</div><div><br /></div><div><b>Why adopt Bioschemas</b></div><div><br /></div><div>Bioschemas is a life sciences-oriented extension of <a href="http://schema.org">schema.org</a>, a platform originally set up by the major search engines, as is clear from its use in Google Dataset Search.</div><div><br /></div><div><b>What can you do</b></div><div><br /></div><div><div>Joining this effort helps make the research output of your toxicology project more FAIR. Bioschemas (and <a href="http://schema.org">schema.org</a>) annotation can be added to any HTML page. The Bioschemas project can assist. We want to organize workshops this year to work with toxicology projects for wider adoption. But please feel encouraged to ask questions on this mailing list prior to such activities. 
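As a minimal sketch of what adding such annotation to an HTML page can look like, the Java code below renders a schema.org Dataset description as a JSON-LD script element (type application/ld+json) that can be pasted into a page. It is an illustration under assumptions: the class and method names and the example values are hypothetical, it only emits a few core schema.org Dataset properties (name, description, url, license) as plain strings, and it does not check the Bioschemas Dataset profile, which defines the required and recommended properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that renders schema.org "Dataset" markup as JSON-LD for an HTML page. */
class DatasetJsonLd {

    /** Escapes a value for a JSON string literal that sits inside an HTML script element. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                // escaping '<' keeps a value containing "</script>" from closing the element early
                case '<' -> sb.append("\\u003c");
                default -> {
                    // other control characters are not allowed unescaped in JSON strings
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    /** Builds the script element; properties are written in the map's iteration order. */
    static String toScriptTag(Map<String, String> properties) {
        StringBuilder json = new StringBuilder();
        json.append("{\n  \"@context\": \"https://schema.org/\",\n  \"@type\": \"Dataset\"");
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            json.append(",\n  \"").append(escape(entry.getKey())).append("\": \"")
                .append(escape(entry.getValue())).append('"');
        }
        json.append("\n}");
        return "<script type=\"application/ld+json\">\n" + json + "\n</script>";
    }

    public static void main(String[] args) {
        // hypothetical example values; replace them with your own dataset metadata
        Map<String, String> dataset = new LinkedHashMap<>();
        dataset.put("name", "Example nanomaterial toxicity dataset");
        dataset.put("description", "Hypothetical dataset used to illustrate the markup.");
        dataset.put("url", "https://example.org/datasets/42");
        dataset.put("license", "https://creativecommons.org/licenses/by/4.0/");
        System.out.println(toScriptTag(dataset));
    }
}
```

A LinkedHashMap keeps the properties in the order they were added, which makes the generated markup easier to read and compare; a real page would likely use a JSON library and the full Bioschemas profile instead.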
I am particularly interested in hearing from people who want to set up websites with information and data about specific chemicals or nanomaterials.</div><div><br /></div><div>Please also feel encouraged to reply if you have already used <a href="http://schema.org">schema.org</a> or Bioschemas in any of your projects, to share your experiences.</div><div><br /></div><div>Let's make toxicology more <a href="https://doi.org/10.1162/dint_r_00024">FAIR</a>.</div></div></div></span></div>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-75724744910819763022023-04-02T09:11:00.006+02:002023-04-02T11:38:35.780+02:00CiTO updates #4: annotations in datasets<p style="text-align: justify;">Okay, <a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00683-2">the Pilot</a> <a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00684-1">is over</a>, ending with 17 papers, 16 of which have CiTO annotations (and so far 4 J.Cheminform. <a href="https://doi.org/10.1186/s13321-022-00656-x">papers</a> <a href="https://doi.org/10.1186/s13321-022-00673-w">after</a> <a href="https://doi.org/10.1186/s13321-022-00677-6">the</a> <a href="https://doi.org/10.1186/s13321-023-00701-3">pilot</a>), but my interest in the <a href="http://purl.org/spar/cito">Citation Typing Ontology</a> continues and we just need <a href="https://chem-bla-ics.blogspot.com/2023/02/citation-typing-progress-but-we-need.html">more adoption</a>.</p><p style="text-align: justify;"><b>Datasets as source of annotations</b></p><p style="text-align: justify;">So, here's a quick <a href="https://wikidata.org/">Wikidata</a> update. I have been using Wikidata as infrastructure to collect and share CiTO annotations (see also the "Scholia patch" posts listed below). 
Some time ago, I recovered my CiteULike CiTO annotations and made them <a href="https://scholia.toolforge.org/work/Q115470140">available on Zenodo</a> (doi:<a href="https://doi.org/10.5281/ZENODO.7368209">10.5281/zenodo.7368209</a>). </p><p style="text-align: justify;">And while thinking about datasets with CiTO annotations, I found two other datasets. One was from an article in Portuguese and one from an <a href="https://scholia.toolforge.org/work/Q117369886">article by Peroni et al.</a> with <a href="https://zenodo.org/record/6885109">this data file</a>. That data file is actually a zip, but inside the zip file is a CSV file with three interesting columns: <i>cited_doi</i>, <i>citing_doi</i>, and <i>intext_citation.intent</i>. There are many more columns and I can highly recommend browsing them. But these are the three I need to add data to Wikidata. The third column is free text, but it uses CiTO terms as labels, making it relatively easy to convert to <a href="https://w.wiki/62sR">citation intentions from Wikidata</a> (PS, thanks to <a href="https://www.wikidata.org/wiki/User:Fvtvr3r">Fvtvr3r</a> for adding more!).</p><p style="text-align: justify;">So, I had a cleaned file and started writing a Groovy Bioclipse script using <a href="https://doi.org/10.21105/joss.02558">Bacting</a>. It basically does a few things: extract all DOIs, check which ones are in Wikidata, analyze the <i>intext_citation.intent</i> column content, and then generate QuickStatements (see <a href="https://gist.github.com/egonw/f74fd3bc1f6361434b042a4cac2a8089">this gist</a>). 
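To make the last step, generating QuickStatements, a bit more concrete, here is a rough Java sketch. It is not a translation of the Groovy script in the gist. It assumes the CSV rows have already been parsed and that the DOI-to-item and intention-label-to-item lookups were done beforehand (for example with SPARQL against Wikidata); the class, record, and method names are hypothetical. It also assumes the intention is stored as a P3712 qualifier on "cites work" (P2860) statements, with the dataset as "stated in" (P248) reference; please check this against current Wikidata practice before running anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: turn CiTO-annotated citation rows into QuickStatements (V1) commands. */
class CitoQuickStatements {

    /** One CSV row: citing DOI, cited DOI, and the intent label (assumed: one label per row). */
    record CitationRow(String citingDoi, String citedDoi, String intent) {}

    /**
     * @param rows        the parsed CSV rows
     * @param doiToQid    upper-case DOI to Wikidata item, for articles already in Wikidata
     * @param intentToQid lower-case intent label to the Wikidata item for that CiTO intention
     * @param sourceQid   Wikidata item of the dataset, used as "stated in" (P248) reference
     */
    static List<String> toQuickStatements(List<CitationRow> rows, Map<String, String> doiToQid,
                                          Map<String, String> intentToQid, String sourceQid) {
        List<String> commands = new ArrayList<>();
        for (CitationRow row : rows) {
            // Wikidata stores DOIs in upper case, so normalize before the lookup
            String citing = doiToQid.get(row.citingDoi().trim().toUpperCase(Locale.ROOT));
            String cited = doiToQid.get(row.citedDoi().trim().toUpperCase(Locale.ROOT));
            String intention = intentToQid.get(row.intent().trim().toLowerCase(Locale.ROOT));
            // skip rows where an article or the intention has no Wikidata item (yet)
            if (citing == null || cited == null || intention == null) continue;
            // item, "cites work", cited item, then the qualifier pair and the reference pair
            commands.add(String.join("\t", citing, "P2860", cited, "P3712", intention, "S248", sourceQid));
        }
        return commands;
    }
}
```

Each mapped row becomes one tab-separated line; rows that cannot be mapped are skipped rather than guessed, which is one reason why fewer citations end up in Wikidata than there are lines in the input.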
Out of the 600 input lines, my script creates some 200 new CiTO-annotated citations in Wikidata between <a href="https://scholia.toolforge.org/work/Q117357537#statements">some 150 article pairs</a>:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirC42FpCu_XI5r7mpiaZmN7T4runwuxbHIEC5oGf9MXu2V88IA8QmNk32vD4YnmQCiZJmnJMkbJRnHlgszx31xdYvt-BleH77wWNWKY3OYHNCQitsVl_TI_oSvUFntKMfixJ2ZlITRjMXiHGS2tzLDicWbN_CnxbCuqEd3lO17ocIWwoILUg/s1146/Screenshot_20230402_084711.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="725" data-original-width="1146" height="404" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirC42FpCu_XI5r7mpiaZmN7T4runwuxbHIEC5oGf9MXu2V88IA8QmNk32vD4YnmQCiZJmnJMkbJRnHlgszx31xdYvt-BleH77wWNWKY3OYHNCQitsVl_TI_oSvUFntKMfixJ2ZlITRjMXiHGS2tzLDicWbN_CnxbCuqEd3lO17ocIWwoILUg/w640-h404/Screenshot_20230402_084711.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;">The ability to include CiTO annotations from datasets is another welcome boost for the CiTO statistics in Wikidata. <a href="https://w.wiki/6XQf">This SPARQL query</a> shows an overview of sources that support the CiTO intention annotation, but note that a claim with a CiTO intention may also have CrossRef, PubMed, and COCI as references. In those cases, these references primarily support the citation itself, not the intention.</div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;">There are <a href="https://scholar.social/@egonw/110124747053293502">now</a> (the <a href="https://scholia.toolforge.org/cito/#statistics">latest stats are here</a>) <b>1202 citation intention</b> annotations in Wikidata for 992 citations from <b>405 articles in 199 venues</b>. 
Of these, 27 articles have explicit annotations in the article itself and are found in 4 venues (two journals and two preprint servers). These annotated citations are to 510 articles in 190 different venues. <a href="https://github.com/WDscholia/scholia/pull/2271">This Scholia patch</a> will add a new statistic, the number of datasets providing citation intentions, of which there are (as discussed) <a href="https://scholia.toolforge.org/topic/Q115470140">currently</a> <a href="https://scholia.toolforge.org/work/Q117357537">two</a> in Wikidata. These two datasets provide intentions for the majority of articles and are depicted in yellow in the overview below. </div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrSL2g52_H8x-d4sLLwka-A_lHMDUBnpMXKWVrOgmF2vf7US_v779_r-EBDJpghAuhrflN_R6faLa77gaxJTugW7ReXnyMwiZtU8v1cZebNs9YlNgui3Qg1OsUYTRMlcVwAk9pjZum7zQHeUKs-7--DgUTFskXfyldl8hWe9A9mkHXzVOJrQ/s1105/Screenshot_20230402_085317.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="591" data-original-width="1105" height="342" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrSL2g52_H8x-d4sLLwka-A_lHMDUBnpMXKWVrOgmF2vf7US_v779_r-EBDJpghAuhrflN_R6faLa77gaxJTugW7ReXnyMwiZtU8v1cZebNs9YlNgui3Qg1OsUYTRMlcVwAk9pjZum7zQHeUKs-7--DgUTFskXfyldl8hWe9A9mkHXzVOJrQ/w640-h342/Screenshot_20230402_085317.png" width="640" /></a></div><br /><div class="separator" style="clear: both; text-align: left;"><br /></div>With an annotation in <a href="https://www.wikidata.org/wiki/Q27638524">a 1938 article by Alan Turing</a>! I ran into this article in November 2011, noting an apparent duplicate title in his article list. It turned out that an earlier article had a correction with the same title. 
I added <a href="https://www.wikidata.org/w/index.php?title=Q27638524&diff=1527020358&oldid=984628387&diffmode=source">this clarification</a>:<div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVOJNQgyM3VU5fsjykZEWsxukRXdpgiy_dly6nihXZv4Tql3a5n3xLZqtQxyI2OCHPnVKtDdWmWwDnMFRzh7p6ZrdAHh7bCyM-PSTL8gxmSLHCVVzYSjGnANcQCd394vZqkQ6nudO5J3_VV0BJ8cKaxP-cs4XWgxbhvgc0DSEuMuaKKP9qOw/s946/Screenshot_20230402_090600.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="195" data-original-width="946" height="132" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhVOJNQgyM3VU5fsjykZEWsxukRXdpgiy_dly6nihXZv4Tql3a5n3xLZqtQxyI2OCHPnVKtDdWmWwDnMFRzh7p6ZrdAHh7bCyM-PSTL8gxmSLHCVVzYSjGnANcQCd394vZqkQ6nudO5J3_VV0BJ8cKaxP-cs4XWgxbhvgc0DSEuMuaKKP9qOw/w640-h132/Screenshot_20230402_090600.png" width="640" /></a></div><br /><div>This is very trivial citation intention data that publishers could provide as open data.</div><div><br /></div><div>Okay, that will do for today. There are actually some really interesting things in the pipeline, but I will have to write about that later. I have some deadlines I should start looking at. 
Below is some extra reading.</div><div><p><b>Some more history</b></p><p></p><ul style="text-align: left;"><li>2021: <a href="https://chem-bla-ics.blogspot.com/2021/11/biohackathon-europe-2021-1-cito.html">BioHackathon Europe 2021 #1: CiTO annotations in BioHackrXiv</a></li><li>2021: <a href="https://chem-bla-ics.blogspot.com/2021/03/markdown-template-for-journal-of.html">Markdown template for the Journal of Cheminformatics with CiTO support</a></li><li>2020: <a href="https://chem-bla-ics.blogspot.com/2020/11/cito-updates-3-third-paper-in.html">CiTO updates #3: third paper in the collection and updated Scholia patch</a> </li><li>2020: <a href="https://chem-bla-ics.blogspot.com/2020/11/cito-updates-2-annotation-migration-to.html">CiTO updates #2: annotation migration to Wikidata and first Scholia patch</a></li><li>2020: <a href="https://chem-bla-ics.blogspot.com/2020/11/cito-updates-1-first-research-paper-in.html">CiTO updates #1: first research paper in the Journal of Cheminformatics with CiTO annotation published</a></li><li>July 2020: <a href="https://chem-bla-ics.blogspot.com/2020/07/new-editorial-adoption-of-citation.html">New Editorial: "Adoption of the Citation Typing Ontology by the Journal of Cheminformatics"</a></li><li>2015: <a href="https://chem-bla-ics.blogspot.com/2015/03/what-youre-doing-is-rather-desperate.html">"What You're Doing Is Rather Desperate"</a></li><li>2012: <a href="https://chem-bla-ics.blogspot.com/2012/02/cito-citeulike-publishing-innovation.html">CiTO / CiteULike: publishing innovation</a></li><li>2010: <a href="https://chem-bla-ics.blogspot.com/2010/10/citeulike-cito-use-case-1-wordles.html">CiteULike CiTO Use Case #1: Wordles</a></li><li>September 2010: <a href="https://chem-bla-ics.blogspot.com/2010/09/list-of-things-i-miss-in-citeulike.html">A list of things I miss in CiteULike</a></li></ul><p></p></div>Egon 
Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-12488839020780319202023-03-12T10:56:00.002+01:002023-03-12T10:58:23.103+01:00BridgeDb NWO grant update #7: wrapping up the project<p>I have received the request to write up the final reporting, and the paid practical work has been completed (we already said goodbye to Helena almost a month ago). After the <a href="https://chem-bla-ics.blogspot.com/2023/02/bridgedb-nwo-grant-update-6-second.html">hackathon last month</a>, we released <a href="https://doi.org/10.5281/zenodo.7678825">BridgeDb Webservice 2.1.0</a> and actually had this online for about a week.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgryYeP-pkBpB1V8lPJYQVoT8yXs1iXLSLogt9fdBzCzYYm-9PCvIOXVqH2u5Q97FNVoF29zFS2DUB8PiOSx3xP82MJnxenLrmm7U82lpJbDBst37CX4aH2MlzDaeMSLIa6-WxxVKOItZQsEMdpH8i6Zm3GZsQsTXxMlbIY6O69fxOmMNBHUg/s1132/Screenshot_20230312_103018.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="574" data-original-width="1132" height="324" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgryYeP-pkBpB1V8lPJYQVoT8yXs1iXLSLogt9fdBzCzYYm-9PCvIOXVqH2u5Q97FNVoF29zFS2DUB8PiOSx3xP82MJnxenLrmm7U82lpJbDBst37CX4aH2MlzDaeMSLIa6-WxxVKOItZQsEMdpH8i6Zm3GZsQsTXxMlbIY6O69fxOmMNBHUg/w640-h324/Screenshot_20230312_103018.png" width="640" /></a></div><br /><p>Unfortunately, this week we ran into a few regressions, and I restored the previous version, which solved those issues. <a href="https://github.com/bridgedb/BridgeDbWebservice/issues/8">Issues</a> <a href="https://github.com/bridgedb/BridgeDbWebservice/issues/9">were</a> created and solved this week(-end), resulting in <a href="https://doi.org/10.5281/zenodo.7722990">the 2.1.1 release</a>. </p><p>Along with the fixes, tests were created for the problems, as well as for other API methods. 
This was interesting in itself, because it requires firing up a BridgeDb Webservice in the background and actually loading a Derby file (we need some data to test with). Firing up the webservice is one thing (I'm just hoping the port is free when the test runs), but we also need to create two temporary files: the gdb.config file, which points to the Derby file, and the Derby file itself. Both are distributed in Java archive files (JARs), so they need to be saved to a temporary file first. That was <a href="https://github.com/bridgedb/BridgeDbWebservice/blob/main/src/test/java/org/bridgedb/webservicetesting/BridgeDbWebservice/RestletServerTest.java#L40-L63">doable</a> :)</p>
<pre> public static void startServer() throws IOException {
  // set up a test Derby file: copy the bundled .bridge resource out of the JAR into a temporary file
  File derbyFile = File.createTempFile("bdb", "bridge");
  derbyFile.deleteOnExit();
  String resource = "humancorona-2021-11-27.bridge";
  try (InputStream stream = RestletServerTest.class.getClassLoader().getResourceAsStream(resource);
       FileOutputStream derbyStream = new FileOutputStream(derbyFile)) {
    if (stream == null) throw new IOException("Test resource not found: " + resource);
    stream.transferTo(derbyStream);
  }
  // set up the GDB config file: a "*", a tab, and the absolute path of the Derby file
  File configFile = File.createTempFile("gdb", "config");
  configFile.deleteOnExit();
  try (BufferedOutputStream bufferStream = new BufferedOutputStream(new FileOutputStream(configFile))) {
    String configFileContent = "*\t" + derbyFile.getAbsolutePath();
    bufferStream.write(configFileContent.getBytes(java.nio.charset.StandardCharsets.UTF_8));
  }
  // set up the REST service, using the GDB config file created above
  RestletServerTest.server = new RestletServer();
  RestletServerTest.server.run(port, configFile, false, false);
 }
</pre>During the grant, one task was to set up better testing. We did, but not yet for the new webservice. In particular, code coverage reporting was not set up. I did that this week using the Codecov service, which is free for open source projects. That gives <a href="https://app.codecov.io/gh/bridgedb/BridgeDbWebservice">these results</a>:<div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii8RuDa2n7YuyCNoE7Mb5FKzShO5ScJfiF1s6h8qVAu1bjcrJcQOcffZvwDTG_HASwGtYD1MCnyidDgXyqXK2-JNzdUs2YCSrOw2Xwz3sLnhqxb_SUvMGZQv6o-Y-pc3IsA4KwZZnkuSh_OiNLW_DDpvk-c2IdYFTsK8uCmSVbWTtnefD5Mg/s1308/Screenshot_20230312_105151.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="874" data-original-width="1308" height="428" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii8RuDa2n7YuyCNoE7Mb5FKzShO5ScJfiF1s6h8qVAu1bjcrJcQOcffZvwDTG_HASwGtYD1MCnyidDgXyqXK2-JNzdUs2YCSrOw2Xwz3sLnhqxb_SUvMGZQv6o-Y-pc3IsA4KwZZnkuSh_OiNLW_DDpvk-c2IdYFTsK8uCmSVbWTtnefD5Mg/w640-h428/Screenshot_20230312_105151.png" width="640" /></a></div><br /><div>There clearly is work left to be done. The current testing focuses on the common functionality, and the new (alpha) JSON functionality is mostly not tested yet. This will change in the next few weeks.</div><div><br /></div><div>So, that leaves the reporting and writing the journal article. And cleaning up the lab, of course.</div>
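About hoping the port is free when the test runs: one common way to avoid depending on a fixed test port is to ask the operating system for one, by opening a ServerSocket on port 0 and reading back the port it was given. The sketch below shows only that step; the class name is hypothetical, and how it would be wired into the RestletServer test setup is left open. There is still a small race: another process could take the port between closing this socket and the webservice binding to it.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical test utility: find a TCP port that is unused at the moment of the call. */
class FreePort {

    /** Binding to port 0 lets the operating system pick an unused port; the socket is then closed. */
    static int find() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Free port for the test webservice: " + find());
    }
}
```

Because the port is released before the webservice starts, this reduces the chance of a clash rather than ruling it out; a retry on a bind failure would be the next step if that turns out to matter.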
<p><b>Previous updates</b></p><p></p><ul style="text-align: left;"><li><a href="https://chem-bla-ics.blogspot.com/2023/02/bridgedb-nwo-grant-update-6-second.html">BridgeDb NWO grant update #6: second hackathon</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/12/bridgedb-nwo-grant-update-5.html">BridgeDb NWO grant update #5: BioHackathon, Webservice, Bioregistry</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/07/bridgedb-nwo-grant-update-4.html">BridgeDb NWO grant update #4: Bioregistry.io and a BioSB workshop</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/05/bridgedb-nwo-grant-update-3-pandoras-box.html">BridgeDb NWO grant update #3: Pandora's box</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/04/bridgedb-nwo-grant-update-2-building-up.html">BridgeDb NWO grant update #2: building up momentum</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/03/bridgedb-nwo-grant-update-1-first-steps.html">BridgeDb NWO grant update #1: first steps</a></li></ul><p></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-78930893537667962322023-03-12T09:19:00.003+01:002023-03-12T09:19:31.355+01:00Paper: "PSnpBind-ML: predicting the effect of binding site mutations on protein-ligand binding affinity"<p style="text-align: justify;"><a href="https://scholar.google.com/citations?user=8ZmXyZcAAAAJ&hl=en&oi=ao">Ammar Ammar</a> in my group just published the second half of his cheminformatics study into what happens with binding affinities when proteins show amino acid changes, selected based on worldwide population statistics. His idea was that drugs should be designed not to be selective for a particular genotype. 
The first paper (see <a href="https://chem-bla-ics.blogspot.com/2022/05/new-psnpbind-database-of-mutated.html">this post</a>) tells the story of how to automate running thousands of docking experiments and explains how to put this knowledge base online, while this month's paper explains how machine learning can learn the patterns found in those docking experiments:</p><p></p><ul style="text-align: left;"><li style="text-align: justify;">2022: <a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00573-5">PSnpBind: a database of mutated binding site protein–ligand complexes constructed using a multithreaded virtual screening workflow</a></li><li style="text-align: justify;">2023: <a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00701-3">PSnpBind-ML: predicting the effect of binding site mutations on protein-ligand binding affinity</a></li></ul><div style="text-align: justify;">The idea in the PSnpBind-ML paper is simple. We can calculate binding affinities for many ligand-protein complexes. If we calculate enough of them, we can train a QSAR model that uses both ligand and protein information (to capture what is unique about each SNP) to predict that affinity. That is a lot more scalable than docking every combination. The 2023 article has the full details, and everything is open.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One thing that fascinated me when Ammar proposed this study is the notion that, in this way, for each ligand, we can see how stable the binding affinity is over the various protein variants. Are there proteins that are harder to target because of the variants? Do certain classes of chemical structures show a lot of binding differences over the worldwide protein diversity?</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Also, I was interested in whether it would work in the first place. 
I remember from my own PhD days (some 20 years ago now) that docking experiments had a fairly high prediction error. So, when I see this plot on the independent test set, I am intrigued: </div><div style="text-align: justify;"><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjedhSZq9e8OV7XYMETS5mnLX7k6I5fg5-0R25iBHOPVw9neHynFSyvMJm0wKHEr6MrzVgi-ZNkFjo5YokE9O1ViOV_kuk3TJtC1E9T8o_LTDWZEJffRBGwJWe8ZfUiWGfVA-HMQJoTrUMHNuKjcjPfVNSHLyDXLbxQPTDS8i5O7gZ8gUtGzw/s1770/13321_2023_701_Fig6_HTML.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="Figure 6: &quot;Test set observed versus predicted binding affinities to mutated proteins using two trained random forest models (one using measured wild-type protein-ligand binding affinity and the second using predicted wild-type protein-ligand binding affinity as input). A The model trained with measured wild-type binding affinity and tested using measured wild-type binding affinity. B The model trained with measured wild-type binding affinity and tested using predicted wild-type binding affinity. C The model trained with predicted wild-type binding affinity and tested using measured wild-type binding affinity. 
D The model trained with predicted wild-type binding affinity and tested using predicted wild-type binding affinity&quot;" border="0" data-original-height="1324" data-original-width="1770" height="299" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjedhSZq9e8OV7XYMETS5mnLX7k6I5fg5-0R25iBHOPVw9neHynFSyvMJm0wKHEr6MrzVgi-ZNkFjo5YokE9O1ViOV_kuk3TJtC1E9T8o_LTDWZEJffRBGwJWe8ZfUiWGfVA-HMQJoTrUMHNuKjcjPfVNSHLyDXLbxQPTDS8i5O7gZ8gUtGzw/w400-h299/13321_2023_701_Fig6_HTML.png" title="Test set observed versus predicted binding affinities to mutated proteins using two trained random forest models (one using measured wild-type protein-ligand binding affinity and the second using predicted wild-type protein-ligand binding affinity as input)." width="400" /></a></div><br /><br /></div><div style="text-align: justify;">So, what about the binding affinity variation? Ammar did not put this figure in the paper, but sent me a copy to put online here.</div><div style="text-align: justify;"><br /></div><div style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUMHPvGM099mmlPsJqc1d2HQlDMNE9dUjXOcno1JcWRwfNU-86ncHs3JyZi3FNHMdRAjg1YA6E8p9S0Id3dQaePhnsauqh1yyhnC-YfKG6MFqpk8TKdYVYakgNd2n1L8NI4obxFaoBAZvso9b9PEW_7jyNbtaWC2wra98-JJs7aCe67JMzrA/s840/snp-effect.png" imageanchor="1"><img border="0" data-original-height="840" data-original-width="840" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUMHPvGM099mmlPsJqc1d2HQlDMNE9dUjXOcno1JcWRwfNU-86ncHs3JyZi3FNHMdRAjg1YA6E8p9S0Id3dQaePhnsauqh1yyhnC-YfKG6MFqpk8TKdYVYakgNd2n1L8NI4obxFaoBAZvso9b9PEW_7jyNbtaWC2wra98-JJs7aCe67JMzrA/w400-h400/snp-effect.png" width="400" /></a></div><br /><div style="text-align: justify;">Here we see a boxplot (yeah, there are better alternatives, I know...) 
showing quite a bit of variation in the binding affinities for the various variants of human Pim-1 kinase (with crystal structures <a href="https://w3id.org/psnpbind/protein/2c3i">2C3I</a>, 3BGZ, etc.). These plots show that the variation is mostly high, but sometimes quite small indeed. I don't really see a pattern here.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">And totally in line with open science, each combination of docked ligand and mutated protein can be looked at online with <a href="https://jmol.sourceforge.net/">Jmol</a>, e.g. <a href="https://psnpbind.org/variant/169/ligand/CHEMBL101665">this one</a>:</div><div style="text-align: justify;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip0KjsQkg1BF8H9Wo8GE6fRJy4D_hWKLk7bM_Z4TCe9H4IWM4EKlGSoljSakB4oCiYQkGxrIhBWEBpyoO55vHWW9h4rHAmU7yklU80zfb0mxQjjoHvSIwAFLf4kpWq1M_muMOvYr8AKzTX-SJyPOV9oTYknH2ThGVrme4LmdBQDeLib5Ad5g/s1263/Screenshot_20230312_085210.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img alt="Screenshot of the PSnpBind database website with a ligand-protein binding for a variant of one of the proteins in the data set. It uses Jmol to visualize the location of the amino acid change and the bound ligand." border="0" data-original-height="1122" data-original-width="1263" height="568" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip0KjsQkg1BF8H9Wo8GE6fRJy4D_hWKLk7bM_Z4TCe9H4IWM4EKlGSoljSakB4oCiYQkGxrIhBWEBpyoO55vHWW9h4rHAmU7yklU80zfb0mxQjjoHvSIwAFLf4kpWq1M_muMOvYr8AKzTX-SJyPOV9oTYknH2ThGVrme4LmdBQDeLib5Ad5g/w640-h568/Screenshot_20230312_085210.png" width="640" /></a></div><br /><div style="text-align: justify;"><br /></div><div style="text-align: justify;">In this case, the amino acid change is right next to the ligand; Ammar deliberately selected binding site mutations. Of course, biology in reality is much more complex. 
And maybe the differences we see here are not even significant compared to other effects.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But one thing keeps me wondering, and I hope someone can explain this to me: in the past, I would see experimental data on ligand-protein binding that referred to the protein, but rarely to the specific protein variant. We would need a lot of experimental measurements of ligands binding to protein variants to validate this.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">But despite all this uncertainty about the biological and drug discovery implications, there are two other reasons why I am really happy about this story. First, the openness and the ability to share it FAIR-ly online (check his use of w3id, e.g. <span style="text-align: left;"><a href="https://w3id.org/psnpbind/protein/2c3i">https://w3id.org/psnpbind/protein/2c3i</a>)</span>, and, second, the fact that we can now do things at this scale. With all the ongoing deep learning discussions, the ability to inspect in detail what these models do and how they behave (the "explainable AI", if you like) is essential, and Ammar showed here how to do that.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Thinking back to the study on the pKa values of warfarin tautomers, which were all over the place, from very basic to very acidic, it is nice to see some data on the effect of SNPs on binding affinities. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">I am sure you have some thoughts on this work. We did ask someone about the idea before we started, and we were told it had limited use. Use the comment section, or, even better, write a reply blog post on your own platform, or send us an email. 
Looking forward to hearing from you.</div><p></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-60356991523305052822023-02-19T09:19:00.001+01:002023-02-19T09:19:13.329+01:00Why I free up time to give lectures (and about ChatGPT)<p style="text-align: justify;">This week a colleague whom I highly respect asked me why, if I was already so busy (regularly close to overworked), I gave talks and so often freed up my time for them. A valid question. The <a href="https://www.youtube.com/watch?v=UBVV8pch1dM">Drew-reaction</a> here is to say "it is part of scientific communication and dissemination". But does that hold when writing deliverables (also communication and dissemination) should take priority?</p><p style="text-align: justify;">So, here's my Gun-reaction. I think there are two aspects I take into account on top of the "this is what scholars do" and "I learned it like this": the need for debate and the need for human collaboration. Arguably, these are the same thing, but intuitively I think the first is actually more about deepening our understanding, while the second is more about gratification. Interestingly, the first is more about Gun while the second is more about Drew. The second is why so many people like ChatGPT, the immediate gratification: it fills our immediate needs for facts. ChatGPT, however, is Drew, not Gun: it associates and does not reason.</p><p style="text-align: justify;">So, what about the debate? Reading science books, watching Veritasium, these are communication and an attempt at dissemination. But without the sparring, without the debate. And we know from theory the importance of saying out loud what you think you know. Think of Feynman's claims about teaching.</p><p style="text-align: justify;">Interestingly, this is why I enjoy data curation: it requires me to teach others what I think I know. 
Annoyingly, it also makes me very aware of the tiniest mistakes people make. This has helped me (somewhat) as an editor, but at the same time I have found it very tiresome and frequently depressing.</p><p style="text-align: justify;">That brings me back to giving lectures and presentations. If I do my job well, I will get questions. I will be challenged, which demands that I activate my knowledge. This, of course, is the scientific debate. This is why there is so much to be said for open peer review of journal articles. Peer review also does wonders for open source (we have been employing peer review in the <a href="https://cdk.github.io/">Chemistry Development Kit</a> for almost two decades now).</p><p style="text-align: justify;">A lecture or a talk is, for me, an essential part of ensuring the quality of my research. Explaining to others what I know is part of my research. Absolutely worth making time for.</p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-28065055797633851952023-02-11T09:24:00.001+01:002023-02-11T09:26:32.611+01:00BridgeDb NWO grant update #6: second hackathon<p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDL4VMBhZaXNwBb-nm3KjX5mXtHZs7zi_XtsvpnzLIRRFfvlN5wMSr83RURNZOUbM94DNrL4WRfOfNCREM-0ZM9jyVmZ9WCHovjK9EyN_pnJO81_YRANO8xqSi6zhg4AKS7SA3VnkVgmn-RJDA7bmztbaqEoXMDnAIe4wneFdUhmn7hSwO2A/s930/Screenshot_20230211_092509.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img alt="Screenshot of the GitHub Actions history showing the history of Docker generation processes." 
border="0" data-original-height="814" data-original-width="930" height="280" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDL4VMBhZaXNwBb-nm3KjX5mXtHZs7zi_XtsvpnzLIRRFfvlN5wMSr83RURNZOUbM94DNrL4WRfOfNCREM-0ZM9jyVmZ9WCHovjK9EyN_pnJO81_YRANO8xqSi6zhg4AKS7SA3VnkVgmn-RJDA7bmztbaqEoXMDnAIe4wneFdUhmn7hSwO2A/w320-h280/Screenshot_20230211_092509.png" width="320" /></a></div>This week the <a href="https://github.com/bridgedb/nwo-hackathon-2023">2nd NWO Open Science BridgeDb grant hackathon</a> took place. In all honesty, I had hoped we could open it up to a much larger community, but in our defense, the grant team is small, and we were flooded with various viruses in the Netherlands. Second, we would need a lot of community feedback on additional identifier mapping needs, beyond support for the <a href="https://doi.org/10.1093/database/baac035">Simple Standard for Sharing Ontological Mappings (SSSOM)</a>. That, however, needs more coding, and we do not have the resources for that right now. Nevertheless, we had a great hackathon with people involved in the grant, including several people from other projects (aka "matching").<p></p><p><b>Projects</b></p><p>Before the meeting, several project ideas were written down, mostly related to remaining open tasks of the grant proposal (see <a href="https://chem-bla-ics.blogspot.com/2022/12/bridgedb-nwo-grant-update-5.html">Update #5</a>). On the first day, the <a href="https://github.com/bridgedb/nwo-hackathon-2023/issues/1">BridgeDb 3 Docker</a> and <a href="https://github.com/bridgedb/nwo-hackathon-2023/issues/2">BridgeDb Webservice JSON</a> support projects were merged, which made sense. The work of Helena, Marvin, Ozan, and Javi paid off. The new Docker image is on <a href="https://hub.docker.com/r/bigcatum/bridgedb">Docker Hub</a>, automatically built with GitHub Actions (see the screenshot at the top right). They overcame multiple small issues, like CORS support, port matching, dynamic configuration, etc. 
But this Docker image should be easy to deploy and allow projects like <a href="https://vhp4safety.nl/">VHP4Safety</a> and <a href="https://eosc.eu/">EOSC</a> to automagically keep up with the latest BridgeDb software and data.</p><p><a href="https://github.com/bridgedb/nwo-hackathon-2023/issues/3">Other</a> <a href="https://github.com/bridgedb/nwo-hackathon-2023/issues/4">projects</a> focused on ID mapping databases. I myself worked on the first nanomaterial ID mapping database, an idea that was first pitched back in 2013 in the eNanoMapper proposal. Until now, there was so little data and there were so few databases around that ID mapping was never really needed. This is, fortunately, changing. For the nanomaterial database, new releases of <a href="https://doi.org/10.5281/zenodo.7622139">BridgeDb Datasources</a> and <a href="https://zenodo.org/record/7622867">BridgeDb Java</a> were needed. Along the way, Tooba and Ammar worked out <a href="https://github.com/bridgedb/nwo-hackathon-2023/issues/5#issuecomment-1422416092">a short recipe</a> for inspecting the content of BridgeDb ID mapping databases, which technically are Apache Derby files.</p><p>At the end of the meeting, I updated the <a href="https://www.bioconductor.org/packages/devel/bioc/html/BridgeDbR.html">BridgeDbR package</a> (2.9.1) with the latest Java libraries and looked into the technical possibility of a PathVisio3 release with the latest BridgeDb. But we really need an <i>NWO Open Science</i> or <i>eScience Center</i> grant for PathVisio to continue the work started by the COVID19 ZonMw grant. <b>Funders that want to support important life sciences research are strongly encouraged to contact us and help us write the grant proposal that they want to fund.</b></p><p><b>Next</b></p><p>The grant funding is about to run out, and so is its contribution to our research software position. As such, our focus is now going to be on writing the final report. 
This hackathon greatly contributed to the results, and it was a wise decision to include the hackathons in the grant proposal.</p><p><b>Previous updates</b></p><p></p><ul style="text-align: left;"><li><a href="https://chem-bla-ics.blogspot.com/2022/12/bridgedb-nwo-grant-update-5.html">BridgeDb NWO grant update #5: BioHackathon, Webservice, Bioregistry</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/07/bridgedb-nwo-grant-update-4.html">BridgeDb NWO grant update #4: Bioregistry.io and a BioSB workshop</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/05/bridgedb-nwo-grant-update-3-pandoras-box.html">BridgeDb NWO grant update #3: Pandora's box</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/04/bridgedb-nwo-grant-update-2-building-up.html">BridgeDb NWO grant update #2: building up momentum</a></li><li><a href="https://chem-bla-ics.blogspot.com/2022/03/bridgedb-nwo-grant-update-1-first-steps.html">BridgeDb NWO grant update #1: first steps</a></li></ul><p></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-27879428166051840422023-02-05T11:01:00.004+01:002023-02-05T11:01:32.919+01:00Citation Typing: progress but we need more uptake<p>It is now almost thirteen years since Prof. Shotton wrote the article about CiTO, the Citation Typing Ontology (doi:<a href="https://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-1-S1-S6">10.1186/2041-1480-1-S1-S6</a>). 
For a long time, it was the only article with CiTO annotations in the article itself, explaining why each reference was cited; here is reference 8 from Shotton's article:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmnndNL7LUUP3e91yIpibX7uy5XzfXsa3ax-6j0sUEDJ5RlMP_oYvJZMES2fXN4y6Mh95ooYtYPWlU_wTxzniCoLaq4uVpixfs5ye7KFfaq6Sk3BpK1jU6s1Fc2vJQY1BXuMajpEgXW3TbN672pfPjKs_zY8UHugW7r6JnOveUfuZmEEx-nA/s766/Screenshot_20230205_102623.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="128" data-original-width="766" height="106" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmnndNL7LUUP3e91yIpibX7uy5XzfXsa3ax-6j0sUEDJ5RlMP_oYvJZMES2fXN4y6Mh95ooYtYPWlU_wTxzniCoLaq4uVpixfs5ye7KFfaq6Sk3BpK1jU6s1Fc2vJQY1BXuMajpEgXW3TbN672pfPjKs_zY8UHugW7r6JnOveUfuZmEEx-nA/w640-h106/Screenshot_20230205_102623.png" width="640" /></a></div><p>I <a href="https://chem-bla-ics.blogspot.com/2010/09/list-of-things-i-miss-in-citeulike.html?q=cito">wanted this</a>. I was <a href="https://chem-bla-ics.blogspot.com/2015/05/cdk-literature-6.html?q=cito">collecting reasons why people were citing</a> the <a href="https://cdk.github.io/">Chemistry Development Kit</a> articles. I started using it, and <a href="https://chem-bla-ics.blogspot.com/2012/02/cito-citeulike-publishing-innovation.html?q=cito">CiteULike added support</a>. Sadly, CiteULike got shut down at some point.</p><p>Fast forward to 2020, when we started a pilot in the <a href="https://jcheminf.biomedcentral.com/">Journal of Cheminformatics</a> to allow authors to annotate their citations, as in reference 8 above, with a compact notation (doi:<a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00448-1">10.1186/s13321-020-00448-1</a>). 
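</p><p>Citation intents such as the one in reference 8 above can also be written down as plain RDF triples with CiTO properties. The sketch below is only an illustration, not the compact notation used in the <i>Journal of Cheminformatics</i> pilot: it assumes that citing and cited works are identified by DOIs and writes each (citing article, CiTO property, cited article) statement as one line of Turtle. The DOIs are hypothetical placeholders, and the accepted properties are a small, deliberately incomplete subset of CiTO.</p>

```python
"""Minimal sketch: write citation intents as CiTO triples in Turtle.

Assumptions: citing and cited works are identified by DOIs, and only a
small, hand-picked subset of CiTO properties is accepted.
"""

CITO_NS = "http://purl.org/spar/cito/"

# A few properties from the CiTO ontology; this set is deliberately incomplete.
KNOWN_CITO_PROPERTIES = {
    "agreesWith",
    "citesAsAuthority",
    "citesAsDataSource",
    "discusses",
    "extends",
    "usesDataFrom",
    "usesMethodIn",
}


def doi_to_uri(doi: str) -> str:
    """Turn a bare DOI such as '10.5555/example' into an https://doi.org/ URI.

    DOIs with characters that are not allowed in IRIs would need escaping;
    this sketch does not handle that.
    """
    return "https://doi.org/" + doi.strip()


def cito_turtle(annotations) -> str:
    """Serialize (citing_doi, cito_property, cited_doi) tuples as Turtle text."""
    lines = [f"@prefix cito: <{CITO_NS}> ."]
    for citing, prop, cited in annotations:
        if prop not in KNOWN_CITO_PROPERTIES:
            raise ValueError(f"unsupported CiTO property: {prop}")
        lines.append(f"<{doi_to_uri(citing)}> cito:{prop} <{doi_to_uri(cited)}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical placeholder DOIs: the same cited work, cited for two reasons.
    example = [
        ("10.5555/citing.a", "usesMethodIn", "10.5555/cited.x"),
        ("10.5555/citing.b", "citesAsDataSource", "10.5555/cited.x"),
    ]
    print(cito_turtle(example))
```

<p>Running it prints a prefix declaration followed by one triple per citation. A real pipeline would more likely use an RDF library such as rdflib, which also takes care of escaping.</p><p>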
I have been collecting these explicit CiTO annotations (unlike the post-publication annotations I collected in CiteULike) in Wikidata (<a href="https://scholia.toolforge.org/cito/">summarized in Scholia</a>), and this is what it looks like in Wikidata for an article: </p><div style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuV87PyW1yh8mxSaZF-zV6NnxiZq6LQO1xt1EjLkG_3Q3iWD3YHEZTkfhORb4vKhuVXeVWT8hjU3nGLs-wssbSkyLKObN_HEGyHTr5FCy0lgjqv3rOGKLziDeHtQKJ-dq-e8oRW2lh3s010c9gR_q_e3sGzj0QgE4NMibl5eOOIzoKzIE4vA/s685/13321_2023_683_Fig1_HTML.png" imageanchor="1"><img border="0" data-original-height="394" data-original-width="685" height="368" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuV87PyW1yh8mxSaZF-zV6NnxiZq6LQO1xt1EjLkG_3Q3iWD3YHEZTkfhORb4vKhuVXeVWT8hjU3nGLs-wssbSkyLKObN_HEGyHTr5FCy0lgjqv3rOGKLziDeHtQKJ-dq-e8oRW2lh3s010c9gR_q_e3sGzj0QgE4NMibl5eOOIzoKzIE4vA/w640-h368/13321_2023_683_Fig1_HTML.png" width="640" /></a></div><p>This two-year pilot has now concluded (doi:<a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00684-1">10.1186/s13321-023-00684-1</a>), and I wrote a commentary on how authors used it during these two years: <i>"Two years of explicit CiTO annotations"</i> (doi:<a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00683-2">10.1186/s13321-023-00683-2</a>). I am happy to see that authors continue to annotate their articles! 
The histogram below shows the number of articles per year with explicit annotations; besides the <i>Journal of Cheminformatics</i>, you can find additional articles in two preprint repositories!</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwrS3YKVVIyRykKLxinWhEBwHcsaW0tiDNU58T-ZAhcS26VYcG1hCu2Gnk-mLQQCjahQdfsXw2hwuV_sX3d_Wm7Ww7MSdkmWf24X7wWuaKK9UqyjolWTdfjRuqqgHRPEbxDBs36Mzu4dFdIEZji7OB2xCvrzHjCrZ5erxsqoVqXGNI8sqypg/s1137/Screenshot_20230205_104029.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="614" data-original-width="1137" height="346" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjwrS3YKVVIyRykKLxinWhEBwHcsaW0tiDNU58T-ZAhcS26VYcG1hCu2Gnk-mLQQCjahQdfsXw2hwuV_sX3d_Wm7Ww7MSdkmWf24X7wWuaKK9UqyjolWTdfjRuqqgHRPEbxDBs36Mzu4dFdIEZji7OB2xCvrzHjCrZ5erxsqoVqXGNI8sqypg/w640-h346/Screenshot_20230205_104029.png" width="640" /></a></div><br /><p>Mind you, I know there are already <a href="https://biohackrxiv.org/">BioHackrXiv</a> preprints with CiTO annotations from 2023, but I am not keen on putting preprints in Wikidata. 
I know this because one of them is the preprint describing CiTO support in BioHackrXiv (doi:<a href="https://doi.org/10.37044/osf.io/6rjvc">10.37044/osf.io/6rjvc</a>):</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiai0dhTTqGuhDC7vdNkbRtWllG5nq30MRHk8qeRuE5lbb0RhvuKdQoAU60PKOodDCZeIxn567URDSsWuQ0uBX3sXRM6vJYT-Ygp6tChTLJUYr4ttk7xVEDtOl5qeFFXGp1rb4V-Fw_oxRWHnFT3m3xAPf65E3Kb31Kyys6QZ_hmz0qT4qJ2A/s1560/Screenshot_20230205_104638.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="960" data-original-width="1560" height="394" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiai0dhTTqGuhDC7vdNkbRtWllG5nq30MRHk8qeRuE5lbb0RhvuKdQoAU60PKOodDCZeIxn567URDSsWuQ0uBX3sXRM6vJYT-Ygp6tChTLJUYr4ttk7xVEDtOl5qeFFXGp1rb4V-Fw_oxRWHnFT3m3xAPf65E3Kb31Kyys6QZ_hmz0qT4qJ2A/w640-h394/Screenshot_20230205_104638.png" width="640" /></a></div><br /><p>So, we are making progress, but a lot more needs to happen. We need more journal editors to support CiTO annotation in submissions. For Springer Nature journals, this is technically easy, but the (publisher) editors need to monitor the typesetting to ensure the <i>pubnotes</i> do not get lost.</p><p>What else? Well, we need databases like <a href="https://pubmed.ncbi.nlm.nih.gov/">PubMed</a> and <a href="https://europepmc.org/">Europe PMC</a> to support this too. We need FAIR formats to support sharing post-publication CiTO annotations, like the ones I made in CiteULike, but also those made in literature studies, e.g. <a href="https://scholia.toolforge.org/work/Q116677084">this paper</a> by Duca <i>et al.</i></p><p>And we need support in tools like <a href="https://zotero.org/">Zotero</a> and EndNote. This is actually non-trivial, because the CiTO annotation is linked to the <i>citation</i>, not to the bibliographic information in the tool. 
So, it needs to be integrated at the level of the Word/Google Docs plugin.</p><p>I also realized that what I miss is an overview of datasets that use CiTO. Just <a href="https://scholia.toolforge.org/work/Q21198690#citations">the list of articles citing the original CiTO paper</a> does not seem to do justice to its use in databases.</p><p>I have high hopes that the story will continue. The wide adoption of Open Science has already taken more than two decades. I can wait a bit longer for the wide adoption of CiTO.</p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-66187621836357380352023-01-27T16:06:00.002+01:002023-01-27T16:24:56.837+01:00Scholia timeline<p></p><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyooy9Wr-r0zxHr_9CwZC-GpPn4uOSGzUremMWP142GZNP6GP0GOld1Esl2iaMOgCBa32kcVGo0j1kvJcL6WssqEmWq8P2i0OhUNNpmbAS2Hd2TtR1Zp2pYCoMPcmC5SlUCLaB_PePAgMMH83CtBc7WUeuvpEVo506KLfzeHXri6slJ6lKAQ/s4924/Scholia_work_profile_for_A_single_mutation_in_chikungunya_virus_affects_vector_specificity_and_epidemic_potential_-_screenshot_as_of_2018-09-04.png" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="4924" data-original-width="1908" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyooy9Wr-r0zxHr_9CwZC-GpPn4uOSGzUremMWP142GZNP6GP0GOld1Esl2iaMOgCBa32kcVGo0j1kvJcL6WssqEmWq8P2i0OhUNNpmbAS2Hd2TtR1Zp2pYCoMPcmC5SlUCLaB_PePAgMMH83CtBc7WUeuvpEVo506KLfzeHXri6slJ6lKAQ/w248-h640/Scholia_work_profile_for_A_single_mutation_in_chikungunya_virus_affects_vector_specificity_and_epidemic_potential_-_screenshot_as_of_2018-09-04.png" width="248" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><a 
href="https://commons.wikimedia.org/wiki/File:Scholia_work_profile_for_A_single_mutation_in_chikungunya_virus_affects_vector_specificity_and_epidemic_potential_-_screenshot_as_of_2018-09-04.png">Source</a>.</td></tr></tbody></table>Sometimes I think back to how <a href="https://scholia.toolforge.org/">Scholia</a> started, and I seem to remember a Twitter discussion. Twitter was a social platform that was unable to fight hate speech. I left it last year in favor of <a href="https://scholar.social/@egonw">Mastodon</a>.<p></p><p>Anyway, I did some digging today and found <a href="https://twitter.com/fnielsen/status/785008295505489920">this thread</a> from October 8-9, 2016. A few days earlier, Finn had created a profile based on data in Wikidata on his homepage, <a href="https://twitter.com/egonwillighagen/status/783190125882777600">which I was very happy about</a>. You can see how <a href="https://twitter.com/ReaderMeter/status/784810921029881856">Dario suggests</a> putting that webpage up on Toolforge. For completeness, this is <a href="https://github.com/WDscholia/scholia/commit/484104fdf60e4d8384b9816500f2826dbfe064ce.patch">the first commit</a>, from October 9.</p><p>This chat followed <a href="https://fosstodon.org/@fnielsen">@fnielsen</a>'s <a href="https://finnaarupnielsen.wordpress.com/2016/09/30/the-wikidata-scholarly-profile-page/">blog post</a> from September 2016 about the need for open infrastructure and a possible <a href="https://wikidata.org/">Wikidata</a> solution. And it was only half a year later that Scholia got <a href="https://www.nature.com/articles/nature.2017.21800">mentioned in Nature</a>.</p><p>BTW, at the time the focus was still on bibliographic information. We have since learned that the Wikidata platform cannot technically meet those needs, at least not at this moment. 
Instead, the focus is now much more on the literature that supports the knowledge in Wikidata and Wikipedia, and on making that literature as interoperable as possible.</p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-88441099007996082362023-01-15T12:47:00.010+01:002023-04-02T11:16:19.725+02:00Doing the "Open Science Challenge"<p></p><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDa0aniI2TIuXR9Rap9vX5XwuRrFVDf6P6sm-_p4pVuOt19l1iVvp_WRCKZx_uY0j4wQvnkCY79ucEAnhrjCAnIG4vIv18iey8XETUrmz5xxQMlaHoqtj0nvnOy_t34Xq6LUw2nQpSY4DfHB1ktfLP0Y-lCtSFgnLmOe-vfTuRmrDCj_ug6A/s844/Screenshot_20230115_122510.png" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="630" data-original-width="844" height="239" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDa0aniI2TIuXR9Rap9vX5XwuRrFVDf6P6sm-_p4pVuOt19l1iVvp_WRCKZx_uY0j4wQvnkCY79ucEAnhrjCAnIG4vIv18iey8XETUrmz5xxQMlaHoqtj0nvnOy_t34Xq6LUw2nQpSY4DfHB1ktfLP0Y-lCtSFgnLmOe-vfTuRmrDCj_ug6A/s320/Screenshot_20230115_122510.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Screenshot of the <a href="https://heidiseibold.ck.page/opensciencechallenge">sign-up page</a>.</td></tr></tbody></table>Triggered by the "reflections on your career" in the announcement, I decided to give the <i><a href="https://heidiseibold.ck.page/opensciencechallenge">Open Science Challenge</a></i> by <a href="https://fosstodon.org/@HeidiSeibold">Heidi Seibold</a> a try: "12 emails over the course of a month that are designed to help you on your Open Science journey."<p></p><p>I will post my replies to the various challenges here, linking to the first Mastodon post of each thread, so that you can follow the replies:</p><p></p><ul style="text-align: 
left;"><li>Day 1: <a href="https://akademienl.social/@egonw/109670641195409165">Why am I participating</a></li><li>Day 2: <a href="https://akademienl.social/@egonw/109680882466924235">Your Open Science peers</a></li><li>Day 3: <a href="https://akademienl.social/@egonw/109692571311027920">Write down all of your projects</a> and <a href="https://akademienl.social/@egonw/109692658433491610">put them in a (im)portant/(un)passionate matrix</a></li><li>Day 4: <a href="https://akademienl.social/@egonw/109725863715473896">Stop working on your CV</a></li><li>Day 5: <a href="https://akademienl.social/@egonw/109731992239207995">Open Materials</a></li><li>Day 6: <a href="https://akademienl.social/@egonw/109771672138102276">Open Code</a></li><li>Day 7: <a href="https://akademienl.social/@egonw/109891076560277054">Mindsets that hold you back</a></li><li>Day 8: <a href="https://akademienl.social/@egonw/109930364875787030">Science Communication</a></li><li>Day 9: <a href="https://akademienl.social/@egonw/109970899019771144">Social Change</a></li><li>Day 10: <a href="https://akademienl.social/@egonw/110010850838483320">Open Access</a></li><li>Day 11: <a href="https://akademienl.social/@egonw/110044760451238819">Ethics and Research</a></li><li>Day 12: <a href="https://akademienl.social/@egonw/110128344460387409">Wrap up</a></li></ul><p></p><p><br /></p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-71066370785377255542022-12-25T10:51:00.003+01:002022-12-25T10:51:28.090+01:00Paper: "Guiding the choice of informatics software and tools for lipidomics research applications"<p><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right;"><tbody><tr><td style="text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-wYPOwVfrWd303PGEO9dV_n81FiMlbUS22tZ8SWzrB0AFNFucg1iWa6WA2T4WNjoLfWnIK3iUpHraXbz99tRAp3fbuS9ABjh0S9avanpykxkCYyTcHTP3mUiZLQS_ohTTU7eX3ajLBpjswJUwEzA_7hyF32Ql19Pr-xJ3hEoiwGcOdo6GBw/s681/Screenshot_20221225_103520.png" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="522" data-original-width="681" height="245" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-wYPOwVfrWd303PGEO9dV_n81FiMlbUS22tZ8SWzrB0AFNFucg1iWa6WA2T4WNjoLfWnIK3iUpHraXbz99tRAp3fbuS9ABjh0S9avanpykxkCYyTcHTP3mUiZLQS_ohTTU7eX3ajLBpjswJUwEzA_7hyF32Ql19Pr-xJ3hEoiwGcOdo6GBw/s320/Screenshot_20221225_103520.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><br />Screenshot of <a href="https://www.lipidmaps.org/resources/tools?page=flow_chart">this LIPID MAPS webpage</a>.</td></tr></tbody></table>One of the outcomes of the <a href="https://www.epilipid.net/">EpiLipidNET</a> COST Action is a paper about the data analysis of experimental lipidomics data: <i>Guiding the choice of informatics software and tools for lipidomics research applications</i> (doi:<a href="https://doi.org/10.1038/s41592-022-01710-0">10.1038/s41592-022-01710-0</a>).</p><p>Our BiGCaT team wrote up <a href="https://bridgedb.github.io/">BridgeDb</a> for identifier mapping and <a href="https://wikipathways.org/">WikiPathways</a> for pathways/enrichment analysis. See also the <a href="https://lipids.wikipathways.org/">WikiPathways Lipids Portal</a>.</p><p>But I also wanted to map the tools from the article to <a href="https://elixir-europe.org/">ELIXIR</a> databases, particularly <a href="https://fairsharing.org/">FAIRsharing</a>, <a href="https://bio.tools">bio.tools</a>, and <a href="https://tess.elixir-europe.org/">TeSS</a>. I wish journals would simply require this as part of their push to make science more FAIR. 
While at it, I realized I could also add Wikidata item annotations and link to <a href="https://scholia.toolforge.org/">Scholia</a> (see also <a href="https://chem-bla-ics.blogspot.com/search?q=scholia">these blog posts</a>). In the process, I improved the links between the Wikidata items for the software and the journal articles describing the software and tools, including their citation networks.</p><p>I know it takes time, and I would have loved to have this curation done before publication, but I couldn't. So I just started adding <a href="https://github.com/egonw/lipidomics/blob/main/elixir_mappings.csv">the annotations in this GitHub repository</a>:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://github.com/egonw/lipidomics" imageanchor="1" style="margin-left: 1em; margin-right: 1em;" target="_blank"><img border="0" data-original-height="1170" data-original-width="1205" height="622" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxQf6Ccz5kUiZCsKntoKexXJOWfaRy_E2dEMYmbVjJijRdKPuvmW5HCSsaDBrlO6IeZYvdjG_oVTBKwKOgnjpd8XOYD_Agf25rC5KyqTRp32E9hMi8OaydNmulfo16fVLbZhCxfML0_Z42GmY4dZCsRS2ha7lEA44_HF2helehae606ll_8Q/w640-h622/Screenshot_20221225_104841.png" width="640" /></a></div>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-49503301250975990132022-12-11T08:43:00.003+01:002022-12-11T08:43:24.956+01:00BridgeDb NWO grant update #5: BioHackathon, Webservice, Bioregistry<p style="text-align: justify;"><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right;"><tbody><tr><td style="text-align: center;"><a href="https://fosstodon.org/@bridgedb" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;" target="_blank"><img border="0" data-original-height="491" data-original-width="321" height="320" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9sE-sBrH3AQ4-lMsJ61uG8nLKN9-5GGv8WWAaJ6W_HRJ-bUYd3OZthnqFzY9rKXx43SkXWWuOO2e-FYNmx4LwWcw_61p2CVniHYTsni_CIm4uLYsOLfNcP8VrFdHEb4QWnT6DTTn4lKLJgoyYDgevoyTinq9baWuoky4Tb4mecHvHvswA3w/s320/Screenshot_20221211_084226.png" width="209" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Our new Mastodon account.</td></tr></tbody></table>So, I had a lot of teaching, which, besides project deliverables, final reports, and a few project meetings, left me with little time to blog my monthly <i>BridgeDb NWO grant update</i>. But here goes, as a lot did happen in the background.</p><p style="text-align: justify;">First, some outreach:</p><p></p><ul style="text-align: left;"><li style="text-align: left;">2022-09-15, <a href="https://www.nwo.nl/en/open-science-practice-webinar-series">Open Science in Practice Webinar Series</a>: <a href="https://zenodo.org/record/7115296">BridgeDb and Wikidata: a powerful combination generating interoperable open research</a> (<a href="https://www.youtube.com/watch?v=dB1QFl9t5cQ">video</a>)</li><li style="text-align: left;">2022-10-20, UM Data Science Research Seminar: <a href="https://zenodo.org/record/7316412">Making research output FAIR with Wikidata</a></li></ul><p></p><p style="text-align: justify;"><b>Work Package 1</b></p><p style="text-align: justify;">In WP1, we continued working on the core BridgeDb library (after we split the BridgeDb Webservice out into a separate repository). Last time, we reported on <a href="https://chem-bla-ics.blogspot.com/2022/07/bridgedb-nwo-grant-update-4.html">Bioregistry support</a> (used on the new WikiPathways website). I am happy that the Bioregistry paper has now <a href="https://chem-bla-ics.blogspot.com/2022/12/paper-unifying-identification-of.html">been published</a>. 
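</p><p style="text-align: justify;">One thing a registry like the Bioregistry makes possible is converting compact identifiers (CURIEs) such as chebi:15377 into full URIs and back. The following is a minimal Python sketch of that conversion, not the code of BridgeDb or of the Bioregistry: the two-entry prefix map is a hand-written assumption, and the real registry covers many more prefixes, each with its own URI format.</p>

```python
"""Minimal sketch: expand and compress compact identifiers (CURIEs).

The prefix map is a tiny, hand-written excerpt for illustration only;
it is not taken from the Bioregistry, which defines many more prefixes.
"""

from typing import Optional

PREFIX_MAP = {
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_",
    "wikidata": "http://www.wikidata.org/entity/",
}


def expand_curie(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Turn a CURIE like 'wikidata:Q19845644' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    # Simplification: treat prefixes as case-insensitive ('CHEBI' == 'chebi').
    base = prefix_map.get(prefix.lower())
    if base is None:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return base + local_id


def compress_uri(uri: str, prefix_map: dict = PREFIX_MAP) -> Optional[str]:
    """Turn a full URI back into a CURIE, or return None if no prefix matches."""
    # Try the longest base URI first, in case one base is a prefix of another.
    by_length = sorted(prefix_map.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, base in by_length:
        if uri.startswith(base) and len(uri) > len(base):
            return f"{prefix}:{uri[len(base):]}"
    return None


if __name__ == "__main__":
    print(expand_curie("CHEBI:15377"))
    print(compress_uri("http://www.wikidata.org/entity/Q19845644"))
```

<p style="text-align: justify;">Matching the longest base URI first avoids ambiguity when one base URI is a prefix of another; lowercasing the prefix is a simplification of how registries normalize prefixes.</p><p style="text-align: justify;">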
BridgeDb Java <a href="https://github.com/bridgedb/BridgeDb/releases/tag/release_3.0.16">3.0.16</a>, <a href="https://github.com/bridgedb/BridgeDb/releases/tag/release_3.0.17">3.0.17</a>, and <a href="https://github.com/bridgedb/BridgeDb/releases/tag/release_3.0.18">3.0.18</a> have been released. No big changes, mostly additional features for <a href="https://new.wikipathways.org/">WikiPathways</a> and the upcoming <a href="https://github.com/PathVisio/libGPML/">libGPML</a> and <a href="https://github.com/PathVisio/pathvisio4-ant">PathVisio 4.0</a>. The latest release also comes with <a href="https://zenodo.org/record/7412853">updated BridgeDb Datasources</a>.</p><p style="text-align: justify;"><b>Work Package 2</b></p><p style="text-align: justify;">This is where the most work happened in the last few months. Helena has been working on the <a href="https://github.com/bridgedb/bridgedb-webservice">BridgeDb Webservice code</a>. This code was 10 years old and desperately needed an upgrade. In <a href="https://riojournal.com/article/83031/instance/7705342/">the proposal</a>, we mention JSON, compact identifiers (or the CURIEs from Bioregistry), and more FAIRness. After a few weeks of learning the REST library used and hacking, Helena got content negotiation working, and we are happy to report that the upcoming release will support JSON. Even better, it also solves the problem we had with the Docker image.</p><p style="text-align: justify;"><b>Work Package 3</b></p><p style="text-align: justify;">Mapping databases continue to be updated. For the metabolite identifier mapping database, Denise has been looking into wrapping it in a Docker image, and all future mapping databases will use schema 4 of the BridgeDb database schema (which supports primary/secondary identifier annotation). Denise, Martina, Tooba, and I participated in the <a href="https://elixir-europe.org/events/biohackathon-europe-2022">ELIXIR BioHackathon Europe</a>, where we worked on several projects. 
<a href="https://github.com/elixir-europe/biohackathon-projects-2022/tree/main/26">Project 26</a> looked into additional identifier mapping with Wikidata and PubChem, and into improving the interoperability of the Bioschemas MolecularEntity profile. We also spoke with the <a href="https://academic.oup.com/bioinformatics/article/38/17/4194/6633929">TogoID</a> team from Japan about interoperability and possible collaborations.</p><p style="text-align: justify;"><b>Next</b></p><p style="text-align: justify;">With a few months left until the end of the project, we will focus on wrapping things up. The webservice needs updated OpenAPI documentation, a proper release, and documentation on how anyone can easily run a local BridgeDb Webservice (after which we can also update the <a href="https://eosc.eu/">EOSC</a> instance). We also have an (ELIXIR) stakeholder-oriented workshop to organize.</p><p style="text-align: justify;"><span style="text-align: left;">Oh, and we created a Mastodon account: </span><a href="https://fosstodon.org/@bridgedb" style="text-align: left;">@bridgedb@fosstodon.org</a><span style="text-align: left;">!</span></p><div style="text-align: justify;"><b>Previous updates</b></div><div><ul><li style="text-align: left;"><a href="https://chem-bla-ics.blogspot.com/2022/07/bridgedb-nwo-grant-update-4.html">BridgeDb NWO grant update #4: Bioregistry.io and a BioSB workshop</a></li><li style="text-align: left;"><a href="https://chem-bla-ics.blogspot.com/2022/05/bridgedb-nwo-grant-update-3-pandoras-box.html">BridgeDb NWO grant update #3: Pandora's box</a></li><li style="text-align: left;"><a href="https://chem-bla-ics.blogspot.com/2022/04/bridgedb-nwo-grant-update-2-building-up.html">BridgeDb NWO grant update #2: building up momentum</a></li><li style="text-align: left;"><a href="https://chem-bla-ics.blogspot.com/2022/03/bridgedb-nwo-grant-update-1-first-steps.html">BridgeDb NWO grant update #1: first steps</a></li></ul></div>Egon 
Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0tag:blogger.com,1999:blog-17889588.post-54360540408784568942022-12-05T08:33:00.001+01:002022-12-05T08:39:39.271+01:00Paper: "Unifying the identification of biomedical entities with the Bioregistry"<p></p><table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQf8hAflJQIq47F_qf1hdYBkYLHKpM8IVwu95Pa1Z-b4SzaTBMzD1baMuYYts52CeFafknAn8jQcKCRg7OYvx0w9mz_cQU7Z098996LCSbwDlgMruXqEVNhwXYTSuby4db4Vta3x0LO2KezJBGAMI7TMAXA3qzXUbe8ajhlsYsQgcvSh4eYQ/s685/41597_2022_1807_Fig2_HTML.webp" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="579" data-original-width="685" height="270" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQf8hAflJQIq47F_qf1hdYBkYLHKpM8IVwu95Pa1Z-b4SzaTBMzD1baMuYYts52CeFafknAn8jQcKCRg7OYvx0w9mz_cQU7Z098996LCSbwDlgMruXqEVNhwXYTSuby4db4Vta3x0LO2KezJBGAMI7TMAXA3qzXUbe8ajhlsYsQgcvSh4eYQ/s320/41597_2022_1807_Fig2_HTML.webp" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Figure 2 from the article, showing Bioregistry<br />website screenshots. CC-BY.</td></tr></tbody></table>Identifiers are central to FAIR. Our <a href="https://www.maastrichtuniversity.nl/research/bioinformatics">BiGCaT research group</a> (see also <a href="https://scholia.toolforge.org/organization/Q19845644">this Scholia page</a>) studies how to answer biological questions, and for that, identifiers are essential to integrate experimental data (e.g. omics data) with existing knowledge (e.g. <a href="https://new.wikipathways.org/">biological pathway databases</a>). 
<a href="https://bridgedb.github.io/">BridgeDb</a> is, of course, our go-to tool here.<p></p><p>Since Open PHACTS (doi:<a href="https://doi.org/10.1016/j.drudis.2012.05.016">10.1016/j.drudis.2012.05.016</a>), BridgeDb has supported identifiers.org (doi:<a href="https://doi.org/10.1093/bioinformatics/btaa864">10.1093/bioinformatics/btaa864</a>). This support is currently also used in the <a href="https://rdf.wikipathways.org/">WikiPathways RDF</a> (doi:<a href="https://doi.org/10.1371/journal.pcbi.1004989">10.1371/journal.pcbi.1004989</a>).</p><p><a href="https://bioregistry.io/">Bioregistry</a> was recently started as a complement to identifiers.org. It is fully <a href="https://github.com/biopragmatics/bioregistry">open source</a>, hosted on GitHub, and automated, and it has modern support for <a href="https://www.w3.org/TR/2010/NOTE-curie-20101216/">CURIEs</a>, which makes it integrate better with RDF. It is also run like an open science project, not just an open source project. Oh, and it integrates ontology support. We already used it for the <a href="https://nanocommons.github.io/identifiers/">European Registry of Materials</a> identifier (see <a href="https://chem-bla-ics.blogspot.com/2022/09/nanomaterial-identifiers-erm-identifier.html">this post</a>, doi:<a href="https://doi.org/10.1186/s13321-022-00614-7">10.1186/s13321-022-00614-7</a>).</p><p>Because identifiers.org and Bioregistry work slightly differently, Helena and I added complementary support for Bioregistry to BridgeDb, which we already use on the <a href="https://new.wikipathways.org/">new WikiPathways website</a>. The ball got rolling when <a href="https://github.com/cthoyt">Charles</a> submitted a <a href="https://github.com/bridgedb/datasources/pull/27">patch to add Bioregistry prefixes</a>. This means that you can convert CURIEs to full IRIs, to identifiers.org URLs, etc. 
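To give a rough idea of what converting a CURIE into a full IRI involves, here is a minimal Python sketch. It is not the BridgeDb or Bioregistry API: the two-entry PREFIX_MAP, its URI format strings with a $1 placeholder, and the helper names split_curie, expand_curie, and resolver_url are hypothetical and only for illustration. It also assumes the Bioregistry resolver accepts URLs of the form https://bioregistry.io/prefix:identifier.

```python
"""Minimal sketch of CURIE expansion with a hand-written prefix map.

Not the BridgeDb or Bioregistry API: PREFIX_MAP and all helper names
below are hypothetical, for illustration only.
"""

# Hypothetical two-entry prefix map. Each URI format string uses "$1" as the
# placeholder for the local identifier (an assumed, Bioregistry-like style).
PREFIX_MAP = {
    "chebi": "http://purl.obolibrary.org/obo/CHEBI_$1",
    "ncbigene": "https://www.ncbi.nlm.nih.gov/gene/$1",
}


def split_curie(curie: str) -> tuple[str, str]:
    """Split 'prefix:identifier' at the first colon; lowercase the prefix."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix.lower(), identifier


def expand_curie(curie: str) -> str:
    """Expand a CURIE into a full IRI using the hypothetical PREFIX_MAP."""
    prefix, identifier = split_curie(curie)
    try:
        uri_format = PREFIX_MAP[prefix]
    except KeyError:
        raise ValueError(f"unknown prefix: {prefix!r}") from None
    return uri_format.replace("$1", identifier)


def resolver_url(curie: str) -> str:
    """Build a resolver URL, assuming the bioregistry.io/prefix:id pattern."""
    prefix, identifier = split_curie(curie)
    return f"https://bioregistry.io/{prefix}:{identifier}"


if __name__ == "__main__":
    print(expand_curie("CHEBI:15377"))  # → http://purl.obolibrary.org/obo/CHEBI_15377
    print(resolver_url("ncbigene:1956"))  # → https://bioregistry.io/ncbigene:1956
```

Lowercasing the prefix is meant to mimic normalized, lowercase prefixes (an assumption); a real implementation would load the full prefix map from the registry rather than hard-code two entries.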
You can even use this from <a href="https://github.com/egonw/bacting">Bacting</a> (doi:<a href="https://doi.org/10.21105/joss.02558">10.21105/joss.02558</a>) and <a href="https://pypi.org/project/pybacting/">pybacting</a>.</p><p>Anyway, the Bioregistry project published a paper and I am happy to have been part of it: <i>Unifying the identification of biomedical entities with the Bioregistry</i> (doi:<a href="https://doi.org/10.1038/s41597-022-01807-3">10.1038/s41597-022-01807-3</a>).</p>Egon Willighagenhttp://www.blogger.com/profile/07470952136305035540noreply@blogger.com0