Most people are shitcoiners - this shouldn't be surprising
However, the entire Sci-Hub collection is available through torrents, so a pipeline like this can be reproduced, shared, and published:
- Get a huge drive
- Iterate through 100TB of libgen torrents
- Normalize everything into markdown -> share this
- Index it all -> share this
- Train a small model on searching the index (with structured output) -> share this
- Inject the found references to a big model and let it reason -> share this
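
A rough sketch of the index -> search -> inject steps, just to show the shape of it. Everything here is an assumption for illustration: the two toy documents, the function names (`build_index`, `search`, `build_prompt`), and above all the plain keyword-count scoring, which only stands in for the small search model. The torrent download and the markdown conversion are skipped; the docs are assumed to be markdown strings already.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase alphanumeric tokens; markdown symbols are dropped.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # docs: {doc_id: markdown_text} -> inverted index {term: {doc_id: count}}
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

def search(index, query, top_k=3):
    # Score each document by summed term counts (a stand-in for the small model).
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common(top_k)]

def build_prompt(docs, doc_ids, question):
    # Inject the found references ahead of the question for the big model.
    refs = "\n\n".join(f"[{d}]\n{docs[d]}" for d in doc_ids)
    return f"References:\n{refs}\n\nQuestion: {question}"

# Hypothetical toy corpus, not real papers.
docs = {
    "paper-a": "# Graphene\nGraphene conductivity measured at low temperature.",
    "paper-b": "# Yeast\nYeast fermentation rates under varying sugar levels.",
}
index = build_index(docs)
hits = search(index, "graphene conductivity")
prompt = build_prompt(docs, hits, "What was measured?")
```

At 100TB the index would not fit in a Python dict, so a real version would need an on-disk search engine, but the data flow stays the same: index, retrieve, then hand the retrieved text to the big model.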
reply
Looks like a fun little (hmm) side project :)
Let us know how this works for you!
Also found the blog post describing what's what on https://sci-hub.box