
Will test it in the next few days revising a student's draft, comparing its output to my usual paid ChatGPT output...

Let us know how this works for you!


Also found the blog post describing what's what on https://sci-hub.box

Hear the good news: recent advances in artificial intelligence have enabled Sci-Hub to launch a robot that gives scientifically-grounded responses to questions. The robot starts by searching for relevant literature in the Sci-Hub database, then selects and reads the most recent studies, and composes the answer based on this information. The answer includes all the references, and each referenced article can be read on Sci-Hub with one click.

Unlike question-answering robots that were based upon the early generation of neural networks, the Sci-Hub bot does not hallucinate: it does not make up scientific facts or cite sources that do not exist. To support its statements, Sci-Bot uses articles from the Sci-Hub database. Questions can be asked in any language, and answers can be saved on the server and shared.

The alpha version only supports answering one question, and a more advanced variation that supports conversation mode is coming soon. The right column displays example questions that have been answered by the robot; click a question to see the generated answer.
reply

That's useful context.

Also

I forgot she's a bit of a shitcoiner... #981532

reply

Most people are shitcoiners - this shouldn't be surprising

However, the entire sci-hub collection is available through torrents. This can be reproduced, shared, published:

  1. Get a huge drive
  2. Iterate through 100TB of libgen torrents
  3. Normalize everything into markdown -> share this
  4. Index it all -> share this
  5. Train a small model on searching the index (with structured output) -> share this
  6. Inject the found references to a big model and let it reason -> share this
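
A minimal sketch of steps 4 to 6 on a toy corpus, assuming the papers are already normalized to markdown and keyed by an id. Everything here is hypothetical: the ids, the texts, and the helper names (`build_index`, `search`, `build_prompt`). A plain TF-IDF-style score stands in for the trained search model of step 5, and step 6 stops at building the prompt; no real model is called.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphanumeric runs; markdown symbols are dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Step 4: inverted index mapping term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            postings[term][doc_id] = tf
    return postings, len(docs)


def search(index, query, k=3):
    """Step 5 stand-in: rank documents by a simple TF-IDF-style score."""
    postings, n_docs = index
    scores = Counter()
    for term in set(tokenize(query)):
        hits = postings.get(term, {})
        if not hits:
            continue
        idf = math.log(1 + n_docs / len(hits))
        for doc_id, tf in hits.items():
            scores[doc_id] += (1 + math.log(tf)) * idf
    return [doc_id for doc_id, _ in scores.most_common(k)]


def build_prompt(question, docs, doc_ids):
    """Step 6: inject the retrieved references into a prompt for a big model."""
    refs = "\n\n".join(f"[{d}]\n{docs[d]}" for d in doc_ids)
    return (
        "Answer using only the references below and cite them by id.\n\n"
        f"{refs}\n\nQuestion: {question}"
    )


# Hypothetical toy corpus standing in for the normalized markdown of step 3.
docs = {
    "paper-a": "# Sleep and memory\nSleep deprivation impairs memory consolidation.",
    "paper-b": "# Graphene\nGraphene shows high electrical conductivity.",
    "paper-c": "# Memory in mice\nMemory consolidation in mice depends on sleep.",
}

index = build_index(docs)
hits = search(index, "sleep memory consolidation", k=2)
prompt = build_prompt("Does sleep affect memory?", docs, hits)
```

At real scale the in-memory dict would have to give way to an on-disk index, but the shape of the pipeline stays the same: index once, retrieve a handful of references per question, and let the big model see only those.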
reply

Looks like a fun little (hmm) side project :)

reply

I'm quite sure that this is one of the things you can let an LLM build for you. Run it against a small test corpus, check results, improve, repeat.

It doesn't have to take a lot of your time; the cost is mostly wall time, GPU ticks, CPU ticks, and network.
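
A rough sketch of what "check results" could look like on a small test corpus: hand-label a few queries with the papers they should return, then measure recall@k and list the misses. The corpus, the labels, and `toy_search` (a naive word-overlap ranker) are all made up for illustration; any search function with the same `(query, k)` signature could be plugged in instead.

```python
def recall_at_k(search_fn, labeled_queries, k=5):
    """Mean fraction of the relevant ids that appear in the top k results."""
    total = 0.0
    for query, relevant in labeled_queries:
        found = set(search_fn(query, k))
        total += len(found & relevant) / len(relevant)
    return total / len(labeled_queries)


def missed_queries(search_fn, labeled_queries, k=5):
    """Queries where at least one relevant id is missing from the top k."""
    return [
        query
        for query, relevant in labeled_queries
        if not relevant <= set(search_fn(query, k))
    ]


# Hypothetical test corpus and a deliberately naive ranker to evaluate.
corpus = {
    "a": "sleep deprivation impairs memory",
    "b": "graphene shows high conductivity",
    "c": "mice memory",
}


def toy_search(query, k):
    words = set(query.lower().split())
    ranked = sorted(
        corpus, key=lambda d: -len(words & set(corpus[d].lower().split()))
    )
    return ranked[:k]


# Hand-labeled (query, relevant ids) pairs.
labeled = [
    ("sleep memory", {"a"}),
    ("graphene conductivity", {"b"}),
    ("memory in rodents", {"c"}),  # vocabulary mismatch: "rodents" vs "mice"
]

score = recall_at_k(toy_search, labeled, k=1)
misses = missed_queries(toy_search, labeled, k=1)
```

The list of misses is the part that drives the "improve, repeat" loop: each one points at something concrete to fix, such as synonyms or tie-breaking.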

reply

Any sense of how much fraud is in there?

I mostly followed the Replication Crisis in the social sciences but I definitely recall hearing about lots of problems on the physical sciences side.

reply

Lots of crappy science, including in the physical sciences.

I would not go as far as to call it fraud. Fraud would imply intent.
Stupidity, yes; malice, mostly no. Most people don't know how to perform a proper statistical analysis. They don't know that they don't know.

reply

Stupidity is probably the preferable option. You at least wouldn’t expect the errors to tend in a particular direction.

I think medical science had one of the higher fraud rates. The most glaring strategy that I recall was authors basically just taking a previously published study and swapping the disease of interest in the text.

reply