The most bandwidth-efficient way of obtaining historical gasPrice values per tx

I would like to obtain the gasPrice of every transaction in every block. Is there a more bandwidth-efficient way than the naive approach of:

for each block B since genesis: # hitting infura to get next block
     for each tx in B: # hitting infura for next tx
         extract tx.gasPrice

Thank you.

Hi @aliatiia - welcome to the Infura community!

We’re working on checking into this, but in the meantime, what are you trying to achieve with this data?

Thanks. I am doing some EIP-1559-related analysis and would like to backtest a simulation that determines whether a hypothetical tx with a certain gasPrice would have been included in a block with high probability, given the gasPrice values of the other real txs in that block.
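To make the goal concrete, here is a minimal sketch of that kind of backtest in Python. The inclusion rule is an assumption made purely for illustration (a bid counts as "included" if it is at least the lowest gasPrice that actually made it into the block); it is not the actual simulation, and inclusion_rate and the toy data are hypothetical.

```python
from typing import Dict, List


def inclusion_rate(block_gas_prices: Dict[int, List[int]], bid_wei: int) -> float:
    """Fraction of non-empty blocks where bid_wei would have cleared.

    ASSUMPTION (illustrative only): a bid clears a block if it is at
    least as high as the lowest gasPrice among the block's real txs.
    """
    # Empty blocks carry no price signal, so they are skipped.
    non_empty = [prices for prices in block_gas_prices.values() if prices]
    if not non_empty:
        return 0.0
    hits = sum(1 for prices in non_empty if bid_wei >= min(prices))
    return hits / len(non_empty)


# Toy data: block number -> gasPrice of each tx in that block (made up).
toy_blocks = {100: [50, 60, 70], 101: [20, 30], 102: []}
print(inclusion_rate(toy_blocks, 40))
```

A rule like this needs every gasPrice in every block, which is why sampled or averaged data would not be enough.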

It doesn’t sound like there’s any metadata that would allow you to compute this more efficiently. However, you may be able to sample gas prices over block ranges. Individual blocks can be outliers, but the gas market itself doesn’t change dramatically from block to block, so you could compute a rough average gas price by fully sampling all the transactions in every 100th block or something similar, if that works for your purposes.

Unfortunately, sampling wouldn’t work because it would skew the analysis; it has to be all transactions.

It is possible to obtain this data from Google BigQuery, but I ran out of quota pretty quickly. Here is the query in case it’s useful to anyone:

SELECT block_number, gas_price FROM bigquery-public-data.crypto_ethereum.transactions ORDER BY block_number

Update: Transactions can be requested using Ethereum-ETL with this command:

ethereumetl export_blocks_and_transactions --start-block STARTBLOCK --end-block ENDBLOCK --provider-uri https://mainnet.infura.io/v3/[PROJECT_ID] --blocks-output blocks.csv --transactions-output transactions.csv

You can remove --blocks-output blocks.csv if you only want transactions.

transactions.csv has these columns: hash,nonce,block_hash,block_number,transaction_index,from_address,to_address,value,gas,gas_price,input,block_timestamp.
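Once the export is done, the gas prices can be pulled out of transactions.csv with a few lines of Python. This is only a sketch: it relies on the block_number and gas_price column names listed above, the helper name gas_prices_by_block is made up, and the sample rows are toy values rather than real chain data.

```python
import csv
import io
from collections import defaultdict


def gas_prices_by_block(csv_file):
    """Group gas_price (as int, in wei) by block_number from an open CSV file."""
    prices = defaultdict(list)
    for row in csv.DictReader(csv_file):
        prices[int(row["block_number"])].append(int(row["gas_price"]))
    return dict(prices)


# Toy stand-in for transactions.csv (values are made up; only the
# columns needed here are included).
toy_csv = io.StringIO(
    "hash,block_number,gas_price\n"
    "0x01,10,5000\n"
    "0x02,10,7000\n"
    "0x03,11,6000\n"
)
grouped = gas_prices_by_block(toy_csv)
print(grouped)

# With the real export, something like:
# with open("transactions.csv", newline="") as f:
#     grouped = gas_prices_by_block(f)
```

Reading row by row keeps memory proportional to the number of prices kept, not to the size of the other columns such as input.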

Infura could save a lot of bandwidth if the getTransactionByBlockNumberAndIndex method supported finer granularity, letting users specify the columns they need through an extra param (in my case, for example, I only want hash and gas_price).
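In the meantime, one thing that cuts the number of requests (though not the size of each tx object) is the standard eth_getBlockByNumber JSON-RPC method: passing true as its second param returns full transaction objects, so one call per block replaces one call per tx. A rough sketch of building the request and reading gasPrice from the response is below; the helper names are made up, the sample response is a trimmed toy object, and sending the request over HTTP is left out.

```python
import json


def block_request(block_number: int, request_id: int = 1) -> dict:
    """JSON-RPC payload asking for a block with full transaction objects."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBlockByNumber",
        # Second param True = return full tx objects, not just tx hashes.
        "params": [hex(block_number), True],
    }


def extract_gas_prices(response: dict) -> list:
    """Return (tx hash, gasPrice in wei) pairs from a block response."""
    block = response.get("result") or {}  # result is null for unknown blocks
    return [(tx["hash"], int(tx["gasPrice"], 16))
            for tx in block.get("transactions", [])]


# Toy response, trimmed to the fields used here (values are made up).
toy_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"number": "0xff",
               "transactions": [{"hash": "0x01", "gasPrice": "0x3b9aca00"}]},
}
print(json.dumps(block_request(255)))
print(extract_gas_prices(toy_response))
```

The payload would be POSTed to the same provider URI used in the Ethereum-ETL command above.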

Update: this EIP solves the issue generally